Software Engineer, Cloud Infrastructure

6 Months ago • 5 Years + • DevOps

Job Summary

Job Description

As a Software Engineer, Cloud Infrastructure at Scale, you'll design and build core cloud infrastructure platforms and systems, supporting orchestration, data abstraction, and data pipelines. You'll leverage AWS, Kubernetes, Docker, Terraform, Helm, and more, working closely with stakeholders and internal customers.
Must have:
  • Cloud Infrastructure
  • AWS Experience
  • Distributed Systems
  • Software Development
Good to have:
  • Azure & GCP
  • GPU-based Compute
  • Hyper-growth Startups
  • AI Technologies
Perks:
  • AI Race Exposure
  • World-class RLHF

Job Details

Software is eating the world, but AI is eating software. We live in unprecedented times: AI has the potential to exponentially augment human intelligence. Every person will have a personal tutor, coach, assistant, personal shopper, travel guide, and therapist throughout life. As the world adjusts to this new reality, leading platform companies are scrambling to build LLMs at billion scale, while large enterprises figure out how to add them to their products. To be safe, aligned, and actually useful, these models need human evaluation and reinforcement learning from human feedback (RLHF) during pre-training, fine-tuning, and production evaluations. This is the main innovation that has enabled ChatGPT to get such a large head start over the competition.

At Scale, our products include the Generative AI Data Engine, SGP, Donovan, and others that power the most advanced LLMs and generative models in the world through world-class RLHF, human data generation, model evaluation, safety, and alignment. The data we are producing is some of the most important work for how humanity will interact with AI.

At the foundation of these products is the Platform Engineering team. In this role, you will help lead the design and development of core cloud infrastructure platforms and systems, while supporting orchestration, data abstraction, data pipelines, identity & access management, and underlying infrastructure. You’ll also get widespread exposure to the forefront of the AI race as Scale sees it in enterprises, startups, governments, and large tech companies.

You will:

  • Own the underlying cloud infrastructure stack running on AWS leveraging Kubernetes, Docker, Terraform, Helm and other common tools and frameworks.
  • Drive the architecture, design, implementation and support of our foundational platforms and systems, working closely with stakeholders and internal customers to understand and refine requirements.
  • Collaborate with cross-functional teams to define, design, and deliver new features.
  • Proactively identify opportunities for, and drive improvements to, current infrastructure practices, including process enhancements, tool upgrades, and cost optimizations.
  • Present technical information to teams and stakeholders, providing guidance and insight on development processes and technologies.

Ideally you’d have:

  • 5+ years of full-time, post-graduation engineering experience, with a specialty in back-end systems.
  • Extensive experience supporting cloud-based infrastructure (AWS preferred).
  • Extensive experience in software development and a deep understanding of distributed systems, cloud platforms, and software development best practices.
  • A track record of leading successful projects with increasing scale and scope.
  • Excellent communication and collaboration skills, and the ability to explain complex technical concepts to non-technical stakeholders.
  • Advanced Linux troubleshooting skills, including diagnostic experience leveraging common logging & telemetry systems, IAM management, TCP/IP and OSI proficiency.
  • Strong knowledge of software engineering best practices and CI/CD tooling.

Nice to haves:

  • Experience with Azure and GCP, and GPU-based compute.
  • Experience scaling products at hyper-growth startups.
  • Excitement to work with AI technologies.





