Software Engineer, Cloud Infrastructure

6 Months ago • 5 Years + • DevOps

Job Summary

Job Description

As a Software Engineer, Cloud Infrastructure at Scale, you'll design and build core cloud infrastructure platforms and systems, supporting orchestration, data abstraction, and data pipelines. You'll leverage AWS, Kubernetes, Docker, Terraform, Helm, and more, working closely with stakeholders and internal customers.
Must have:
  • Cloud Infrastructure
  • AWS Experience
  • Distributed Systems
  • Software Development
Good to have:
  • Azure & GCP
  • GPU-based Compute
  • Hyper-growth Startups
  • AI Technologies
Perks:
  • AI Race Exposure
  • World-class RLHF

Job Details

Software is eating the world, but AI is eating software. We live in unprecedented times: AI has the potential to exponentially augment human intelligence. Every person will have a personal tutor, coach, assistant, personal shopper, travel guide, and therapist throughout life. As the world adjusts to this new reality, leading platform companies are scrambling to build LLMs at billion scale, while large enterprises figure out how to add them to their products. To make them safe, aligned, and actually useful, these models need human evaluation and reinforcement learning from human feedback (RLHF) during pre-training, fine-tuning, and production evaluations. This is the main innovation that has enabled ChatGPT to get such a large head start over the competition.

At Scale, our products include the Generative AI Data Engine, SGP, Donovan, and others that power the most advanced LLMs and generative models in the world through world-class RLHF, human data generation, model evaluation, safety, and alignment. Producing this data is some of the most important work shaping how humanity will interact with AI.

At the foundation of these products is the Platform Engineering team. In this role, you will help lead the design and development of core cloud infrastructure platforms and systems, while supporting orchestration, data abstraction, data pipelines, identity & access management, and underlying infrastructure. You’ll also get widespread exposure to the forefront of the AI race as Scale sees it in enterprises, startups, governments, and large tech companies.

You will:

  • Own the underlying cloud infrastructure stack running on AWS, leveraging Kubernetes, Docker, Terraform, Helm, and other common tools and frameworks.
  • Drive the architecture, design, implementation and support of our foundational platforms and systems, working closely with stakeholders and internal customers to understand and refine requirements.
  • Collaborate with cross-functional teams to define, design, and deliver new features.
  • Proactively identify opportunities for, and drive improvements to, current infrastructure practices, including process enhancements, tool upgrades, and cost optimizations.
  • Present technical information to teams and stakeholders, providing guidance and insight on development processes and technologies.

Ideally you’d have:

  • 5+ years of full-time, post-graduation engineering experience, with a specialty in back-end systems.
  • Extensive experience supporting cloud-based infrastructure (AWS preferred).
  • Extensive experience in software development and a deep understanding of distributed systems, cloud platforms, and software development best practices.
  • A track record of leading successful projects of increasing scale and scope.
  • Excellent communication and collaboration skills, and the ability to translate complex technical concepts for non-technical stakeholders.
  • Advanced Linux troubleshooting skills, including diagnostic experience with common logging and telemetry systems, IAM, and proficiency with TCP/IP and the OSI model.
  • Strong knowledge of software engineering best practices and CI/CD tooling.

Nice to haves:

  • Experience with Azure and GCP, and GPU-based compute.
  • Experience scaling products at hyper-growth startups.
  • Excitement to work with AI technologies.
