AI Infrastructure Engineer, ML Data Platform

Posted 1 day ago • 2+ years of experience • $188,000 - $225,600 per year

Job Summary

Job Description

As a Data Infrastructure Engineer on the AI Infrastructure team, you will design, build, and scale the data platform that powers all R&D and applied ML initiatives at Scale. You will collaborate closely with product engineering, platform engineering, and ML researchers to build robust, easy-to-use APIs and data pipelines. Your work will play a critical role in advancing frontier ML research, accelerating the data sales cycle, and improving data quality, all while optimizing infrastructure costs. You will also implement and maintain scalable data platforms for diverse R&D and applied ML workloads and participate in the team’s on-call process.
Must have:
  • 2+ years of experience in building large-scale data systems.
  • Expertise in modern data platform technologies.
  • Experience with containerization and deployment technologies.
  • Strong problem-solving skills in a dynamic environment.
Good to have:
  • Familiarity with ML development tools.
  • Experience with various storage systems.
  • Exposure to orchestration platforms.
  • Experience supporting post-training workflows.
  • Experience in a fast-moving startup environment.

Job Details

Scale’s AI Infrastructure team supports both R&D and applied Generative AI initiatives, driving breakthroughs in post-training research areas such as AI safety, agents, and the evaluation of state-of-the-art model performance.

As a Data Infrastructure Engineer on the AI Infrastructure team, you will design, build, and scale the data platform that powers all R&D and applied ML initiatives at Scale. Collaborating closely with product engineering, platform engineering, and ML researchers, you will build robust, easy-to-use APIs and data pipelines. Your work will play a critical role in advancing frontier ML research, accelerating the data sales cycle, and improving data quality, all while optimizing infrastructure costs.

You will:

  • Design, implement, and maintain scalable data platforms to support diverse R&D and applied ML workloads.
  • Partner with ML researchers, product engineers, and operations teams to align data infrastructure with organizational goals.
  • Collaborate with ML researchers to build data access tools that help advance the state of frontier post-training research.
  • Participate in our team’s on-call process to ensure the availability of our services.
  • Own projects end-to-end, from requirements and scoping through design and implementation, in a highly collaborative, cross-functional environment.

Ideally you'd have:

  • 2+ years of experience in building and operating large-scale distributed data systems that support ML workloads.
  • Expertise in modern data platform technologies.
  • Experience working with standard containerization and deployment technologies such as Kubernetes, Helm, Terraform, and Docker.
  • Strong problem-solving skills and the ability to work effectively in a fast-paced, dynamic environment.

Nice to haves:

  • Familiarity with ML development tools such as PyTorch, HuggingFace, or Weights & Biases.
  • Experience with a variety of storage systems: object (S3), document (MongoDB), relational (Postgres), and distributed (Redis, Elasticsearch).
  • Exposure to orchestration platforms like Temporal, Airflow, or AWS Step Functions.
  • Experience supporting post-training workflows such as evaluation, fine-tuning, and RLHF in LLM systems.
  • Experience working in a fast-moving startup or high-scale ML infra environment.

Similar Jobs

Scale AI - Growth Marketing Manager

Scale AI

(Remote)
1 Day ago
Toppan Merrill - Site Reliability Engineer

Toppan Merrill

Chennai, Tamil Nadu, India (On-Site)
7 Months ago
ASSIST Software - Azure DevOps Engineer

ASSIST Software

Suceava, Suceava County, Romania (Remote)
5 Months ago
Argus Labs - Site Reliability Engineer (APAC)

Argus Labs

Australia (Remote)
2 Weeks ago
Macrometa - Sr. Site Reliability Engineer

Macrometa

(Remote)
7 Hours ago

Similar Skill Jobs

Tide - Principal Cloud Engineer

Tide

(Remote)
9 Hours ago
NVIDIA - Senior Software Engineer - Conversational AI

NVIDIA

Pune, Maharashtra, India (On-Site)
1 Month ago
Relax Gaming - Director of Games/Head of Games

Relax Gaming

Skåne County, Sweden (Hybrid)
1 Month ago
Playgendary - DevOps (Cloud Engineer)

Playgendary

Limassol, Limassol, Cyprus (Remote)
2 Months ago
Gaming Innovation Group - Senior Platform DevOps Engineer

Gaming Innovation Group

St. Julian's, Malta (Hybrid)
1 Month ago
T systems - Java Backend Lead/Architect

T systems

Bengaluru, Karnataka, India (Hybrid)
1 Month ago
anavatio - DevOps Engineer

anavatio

Lorton, Virginia, United States (Hybrid)
4 Weeks ago
FICO - DevOps Engineering Enablement-Lead Engineer

FICO

Bengaluru, Karnataka, India (On-Site)
20 Hours ago
ARHS - DevSecOps Engineer (Automation Specialist)

ARHS

The Hague, South Holland, Netherlands (On-Site)
6 Months ago
Tatsu Works - Senior Fullstack Engineer

Tatsu Works

(Remote)
5 Months ago

Jobs in San Francisco, California, United States

Google - Customer Solutions Engineer

Google

Seattle, Washington, United States (On-Site)
2 Weeks ago
The Walt Disney Company - Security Training & Development Manager

The Walt Disney Company

Celebration, Florida, United States (On-Site)
3 Days ago
The Walt Disney Company - Nail Technician

The Walt Disney Company

Anaheim, California, United States (On-Site)
3 Days ago
Google - Program Manager I, Headcount Management, Google Cloud

Google

Addison, Texas, United States (On-Site)
2 Weeks ago
Warner Bros Games - Senior Artist, Character

Warner Bros Games

Salt Lake City, Utah, United States (Hybrid)
7 Months ago
Glean - Software Engineer, Machine Learning (Infrastructure)

Glean

Palo Alto, California, United States (Hybrid)
8 Hours ago
The Walt Disney Company - Maintenance Engineer

The Walt Disney Company

Anaheim, California, United States (On-Site)
2 Weeks ago
ByteDance - Engineering Manager Machine Learning Infrastructure

ByteDance

San Jose, California, United States (On-Site)
6 Months ago
Microsoft - Language Engineer

Microsoft

Mountain View, California, United States (Hybrid)
2 Weeks ago
Microsoft - Technical Program Manager, AI

Microsoft

Mountain View, California, United States (Hybrid)
2 Weeks ago
