Principal Software Engineer, ML Infrastructure

1 Month ago • All levels • Devops • $373,000 PA - $448,000 PA

Job Summary

Job Description

Zoox is seeking a Principal Software Engineer, ML Infrastructure to shape and build next-generation ML infrastructure that accelerates the development and deployment of large-scale ML and Foundation models for its autonomous robotaxis. The role involves leading the design and development of data, compute, model training, and serving infrastructure, and collaborating with AI teams across Perception, Prediction, Planner, and Simulation. The engineer will build and operate data infrastructure for petabytes of sensor data, compute infrastructure for model training and validation across tens of thousands of GPUs, and the base layer of ML tools and frameworks. Key responsibilities include developing a strategic vision for ML Infrastructure, leading the design and implementation of infrastructure across the ML lifecycle, collaborating with cross-functional teams, and mentoring engineers.
Must have:
  • Experience building and managing large-scale ML infrastructure
  • Excellent leadership skills and ability to lead teams
  • Strong experience with training frameworks (PyTorch, JAX)
  • Experience with GPU-accelerated inference (TensorRT, Ray Serve)
  • Proficient in Python and/or C++
Good to have:
  • Experience enabling development/deployment of large-scale Foundation models
  • Experience with large-scale data infrastructure (Apache Spark)
  • Experience in the AV domain (Perception, Prediction, Planner)
Perks:
  • Paid time off (sick leave, vacation, bereavement)
  • Unpaid time off
  • Zoox Stock Appreciation Rights
  • Amazon Restricted Stock Units
  • Health insurance
  • Long-term care insurance
  • Long-term and short-term disability insurance
  • Life insurance

Job Details

Zoox is on a mission to reimagine transportation and build, from the ground up, autonomous robotaxis that are safe, reliable, clean, and enjoyable for everyone. We are still in the early stages of deploying our robotaxis, and it's a great time to join Zoox and make a significant impact on executing this mission. The ML Infrastructure team at Zoox plays a crucial role in enabling innovations in ML and CV and making autonomous driving as seamless as possible.

The Opportunity
We are seeking a deeply technical, influential, and hands-on Principal Software Engineer to shape and build our next-generation ML Infrastructure and significantly reduce the time to develop and deploy large-scale ML and Foundation models to our robotaxis. You will lead the design and development of our Data, Compute, Model Training, and Serving Infrastructure. You will work across all AI teams within Zoox, including Perception, Prediction, Planner, Simulation, and Collision Avoidance, and have the opportunity to significantly push the boundaries of how ML is practiced within Zoox.

We build and operate the data infrastructure responsible for ingesting PBs of sensor data and the systems used to assemble training datasets. We operate the compute infrastructure that powers Zoox’s model training, serving, and large-scale validation pipelines across tens of thousands of GPUs. We also operate the base layer of ML tools, deep learning frameworks, and inference systems used by our applied research teams for in- and off-vehicle ML use cases. You will lead a team of strong software engineers and act as a force multiplier for our teams.

In this role, you will
  • Vision: Develop and execute a strategic vision for ML Infrastructure that will unlock innovation in autonomous driving and enhance our rider experience. 
  • Technical acumen: Lead the design and implementation of cutting-edge infrastructure spanning all stages of an ML lifecycle from data preparation to training to evaluation, deployment, and serving. 
  • Partnership: Collaborate closely with cross-functional teams, including ML researchers, software engineers, data engineers, and hardware engineers, to define requirements and align on architectural decisions.
  • Mentorship: Enable the engineers in the team to grow their careers by providing technical guidance and mentorship.

Qualifications
  • Experience building and managing large-scale ML infrastructure that powers the development of large-scale ML models.
  • Excellent leadership skills with a demonstrated ability to lead high-performing engineering teams.
  • Strong experience with training frameworks like PyTorch, JAX, etc., leveraging GPUs efficiently for distributed model training.
  • Experience with GPU-accelerated inference using TensorRT, Ray Serve, or similar frameworks.
  • Proficient in Python and/or C++.

Bonus Qualifications
  • Experience enabling the development and deployment of large-scale Foundation models.
  • Experience working on large-scale data infrastructure and big data processing frameworks like Apache Spark.
  • Experience working in the AV domain supporting Perception, Prediction, Planner, and related teams.

Compensation
There are three major components to compensation for this position: salary, Amazon Restricted Stock Units (RSUs), and Zoox Stock Appreciation Rights. The salary will range from $373,000 to $448,000. A sign-on bonus may be part of a compensation package. Compensation will vary based on geographic location, job-related knowledge, skills, and experience.

Zoox also offers a comprehensive package of benefits including paid time off (e.g. sick leave, vacation, bereavement), unpaid time off, Zoox Stock Appreciation Rights, Amazon RSUs, health insurance, long-term care insurance, long-term and short-term disability insurance, and life insurance.

About The Company

Zoox is transforming mobility-as-a-service by developing a fully autonomous, purpose-built fleet designed for AI to drive and humans to enjoy.
