AI Infrastructure Engineer, ML Data Platform

1 Month ago • 2 Years + • $188,000 PA - $225,600 PA

Job Summary

Job Description

As a Data Infrastructure Engineer on the AI Infrastructure team, you will design, build, and scale the data platform that powers all R&D and applied ML initiatives at Scale. You will collaborate closely with product engineering, platform engineering, and ML researchers to build robust and easy-to-use APIs and data pipelines. Your work will play a critical role in advancing frontier ML research, accelerating the data sales cycle, and improving data quality, all while optimizing infrastructure costs. You will design, implement, and maintain scalable data platforms to support diverse R&D and applied ML workloads and participate in the team’s on-call process.
Must have:
  • 2+ years of experience in building large-scale data systems.
  • Expertise in modern data platform technologies.
  • Experience with containerization and deployment technologies.
  • Strong problem-solving skills in a dynamic environment.
Good to have:
  • Familiarity with ML development tools.
  • Experience with various storage systems.
  • Exposure to orchestration platforms.
  • Experience supporting post-training workflows.
  • Experience in a fast-moving startup environment.

Job Details

Scale’s AI Infrastructure team supports both R&D and applied Generative AI initiatives, driving breakthroughs in areas of post-training research such as AI safety, agents, and evaluating state-of-the-art model performance.

As a Data Infrastructure Engineer on the AI Infrastructure team, you will design, build, and scale the data platform that powers all R&D and applied ML initiatives at Scale. Collaborating closely with product engineering, platform engineering, and ML researchers, you will build robust and easy-to-use APIs and data pipelines. Your work will play a critical role in advancing frontier ML research, accelerating the data sales cycle, and improving data quality, all while optimizing infrastructure costs.

You will:

  • Design, implement, and maintain scalable data platforms to support diverse R&D and applied ML workloads.
  • Partner with ML researchers, product engineers, and operations teams to align data infrastructure with organizational goals.
  • Collaborate with ML researchers to build data access tools that help advance the state of frontier post-training research.
  • Participate in our team’s on-call process to ensure the availability of our services.
  • Own projects end-to-end, from requirements and scoping through design and implementation, in a highly collaborative and cross-functional environment.

Ideally you'd have:

  • 2+ years of experience in building and operating large-scale distributed data systems that support ML workloads.
  • Expertise in modern data platform technologies.
  • Experience working with standard containerization & deployment technologies like Kubernetes, Helm, Terraform, Docker, etc.
  • Strong problem-solving skills and the ability to work effectively in a fast-paced, dynamic environment.

Nice to haves:

  • Familiarity with ML development tools such as PyTorch, Hugging Face, or Weights & Biases.
  • Experience with a variety of storage systems: object (S3), document (MongoDB), relational (Postgres), and distributed (Redis, Elasticsearch).
  • Exposure to orchestration platforms like Temporal, Airflow, or AWS Step Functions.
  • Experience supporting post-training workflows such as evaluation, fine-tuning, and RLHF in LLM systems.
  • Experience working in a fast-moving startup or high-scale ML infra environment.

Similar Jobs

Zscaler - Principal Information Security Engineer - Container Security

Zscaler

Bengaluru, Karnataka, India (Hybrid)
2 Weeks ago
Playgendary - DevOps (Cloud Engineer)

Playgendary

Limassol, Limassol, Cyprus (Remote)
3 Months ago
SurveyMonkey - Staff Site Reliability Engineer - Cloud Solutions Team

SurveyMonkey

Bengaluru, Karnataka, India (Hybrid)
1 Month ago
GoReel - Python Developer

GoReel

(Remote)
2 Months ago

Similar Skill Jobs

Synechron - Core Platform Python Engineer

Synechron

Weehawken, New Jersey, United States (On-Site)
1 Week ago
N-ix - Middle Node.JS Engineer

N-ix

Colombia (Remote)
1 Month ago
Better ME - Backend Engineer (Mobile Team)

Better ME

Ukraine (Remote)
3 Weeks ago
Enverus - Senior Software Engineer

Enverus

Brno, South Moravian Region, Czechia (On-Site)
1 Week ago
GoTo Group - Software Engineer - Data Science Platform

GoTo Group

Jakarta, Jakarta, Indonesia (On-Site)
7 Months ago
Inworld AI - Staff Platform Engineer - USA

Inworld AI

Mountain View, California, United States (On-Site)
6 Months ago
Hashlist - Senior Data Engineer

Hashlist

Pune, Maharashtra, India (Hybrid)
6 Months ago
Genies.io - Lead Security & Safety Engineer

Genies.io

Los Angeles, California, United States (On-Site)
2 Weeks ago
Zinnia - Senior Cloud Security Engineer

Zinnia

Noida, Uttar Pradesh, India (Hybrid)
7 Months ago

Jobs in San Francisco, California, United States

Glean - Solutions Engineer, SLED

Glean

Palo Alto, California, United States (Remote)
2 Weeks ago
nexon america - Director, Gameplay Engineering

nexon america

El Segundo, California, United States (Hybrid)
1 Month ago
Clearwater Analytics - Service Delivery Manager

Clearwater Analytics

Boise, Idaho, United States (On-Site)
3 Weeks ago
Electronic Arts - Experienced C++ Generalist Software Engineer - Madden

Electronic Arts

Orlando, Florida, United States (Hybrid)
1 Month ago
2K - Product Manager

2K

Austin, Texas, United States (On-Site)
1 Month ago
Samsung Semiconductor - IT Infrastructure Engineer Contractor

Samsung Semiconductor

San Jose, California, United States (Hybrid)
4 Months ago
Glocomms - AVP, Identity & Access Management Architect and Operations Lead

Glocomms

Orlando, Florida, United States (On-Site)
1 Month ago
Fliff Inc - Data Scientist

Fliff Inc

Austin, Texas, United States (On-Site)
10 Months ago
Apple - Machine Learning Engineer

Apple

New York, New York, United States (On-Site)
2 Weeks ago
zoox - Senior/Staff Software Engineer - Simulation Traffic & Behavior Modeling

zoox

Foster City, California, United States (Hybrid)
7 Months ago

About The Company

San Francisco, California, United States (On-Site)

Get notified when new jobs are added by Scale AI
