Engineering Manager, ML Training Platform

10 Months ago • 8 Years + • Research Development • $230,000 PA - $315,000 PA

Job Summary

Job Description

Zoox is seeking an Engineering Manager for their ML Training Platform to support autonomous driving innovations. This role involves managing a team of software engineers to build and operate the core ML platform for model training at scale, including deep learning frameworks and distributed infrastructure. Responsibilities include developing and executing a strategic vision for the platform, ensuring scalability and performance for Foundation and RL models, and leading the design, implementation, and operation of the ML training platform. The manager will also be responsible for hiring and mentoring a diverse engineering team, fostering innovation and collaboration, and partnering with cross-functional teams to define requirements and architectural decisions.
Must have:
  • 8+ years of total experience
  • 3+ years of engineering management experience
  • Excellent leadership skills
  • Experience enabling large-scale distributed model training
  • Experience with training frameworks (PyTorch, Hugging Face, Ray, DeepSpeed, JAX)
  • Experience building model lifecycle management tools
Good to have:
  • Experience with cost-efficient ML compute infrastructure
  • Experience leveraging GPUs, TPUs, or Trainium
  • Experience managing AWS costs for ML needs
Perks:
  • Salary range: $230,000 to $315,000
  • Sign-on bonus may be offered
  • Amazon Restricted Stock Units (RSUs)
  • Zoox Stock Appreciation Rights
  • Comprehensive benefits package (paid time off, health insurance, long-term care insurance, disability insurance, life insurance)

Job Details

Zoox is on a mission to reimagine transportation and build, from the ground up, autonomous robotaxis that are safe, reliable, clean, and enjoyable for everyone. We are still in the early stages of deploying our robotaxis on public roads, and it is a great time to join Zoox and have a significant impact in executing this mission. The ML Platform team at Zoox plays a crucial role in enabling innovations in ML and CV to make autonomous driving as seamless as possible.

The Opportunity
Are you excited to manage our ML Training Platform that enables autonomous driving? You will get to work across all ML teams within Zoox - Perception, Prediction, Planner, Simulation, Collision Avoidance, Data Science, etc., and have the opportunity to significantly push the boundaries of how ML is practiced within Zoox.
This team builds and operates the core part of the ML platform that powers model training at scale. We are responsible for developing and operating ML tools, deep learning frameworks, and distributed model training infrastructure to support foundational models and reinforcement learning. This team also owns the model repository and model lifecycle management tools used by our applied research teams for in- and off-vehicle ML use cases. You will lead a team of strong software engineers and act as a force multiplier for our internal customers. This team has a lot of growth opportunities as we expand our robotaxi deployments and venture into new ML domains. If you want to learn more about our stack behind autonomous driving, please look here.

In this role, you will

    • Vision: Develop and execute a strategic vision for our ML training platform, ensuring scalability, reliability, and performance to support large-scale Foundation and RL models.
    • Technical acumen: Lead the design, implementation, and operation of a robust and efficient ML training platform to enable the training, experimentation, validation, and monitoring of ML models.
    • Hiring: Attract, hire, and inspire a diverse world-class engineering team, fostering a culture of innovation, collaboration, and excellence.
    • Partnership: Collaborate closely with cross-functional teams, including ML researchers, software engineers, data engineers, and hardware engineers to define requirements and align on architectural decisions.
    • Mentorship: Enable the engineers in the team to grow their careers by providing the right opportunities along with clear and timely feedback.

Qualifications

    • 8+ years of total experience, including 3+ years of engineering management experience.
    • Excellent leadership skills with a demonstrated ability to build and manage high-performing engineering teams.
    • Experience enabling large-scale, cost-efficient distributed model training and ML compute infrastructure.
    • Experience with training frameworks such as PyTorch, Hugging Face, Ray, DeepSpeed, JAX, etc., leveraging GPUs, TPUs, or Trainium.
    • Experience building model lifecycle management tools and managing AWS costs for ML needs.

Compensation
There are three major components to compensation for this position: salary, Amazon Restricted Stock Units (RSUs), and Zoox Stock Appreciation Rights. The salary range for this position is $230,000 to $315,000. A sign-on bonus may be offered as part of the compensation package. Compensation will vary based on geographic location and level. Leveling, as well as positioning within a level, is determined by a range of factors, including, but not limited to, a candidate's relevant years of experience, domain knowledge, and interview performance. The salary range listed in this posting is representative of the range of levels Zoox is considering for this position. Zoox also offers a comprehensive package of benefits, including paid time off (e.g., sick leave, vacation, bereavement), unpaid time off, Zoox Stock Appreciation Rights, Amazon RSUs, health insurance, long-term care insurance, long-term and short-term disability insurance, and life insurance.

Vaccine Mandate
Employees working in this position will be required to have received a vaccine approved by the U.S. Food and Drug Administration and/or the World Health Organization. In addition, employees who are eligible for a COVID-19 booster vaccine (“Booster”) will be required to receive a Booster. Employees will be required to show proof of vaccination status upon receipt of a conditional offer of employment. That offer of employment will be conditioned upon, among other things, an Applicant’s ability to show proof of vaccination status. Please note the Company provides reasonable accommodations in accordance with applicable state, federal, and local laws.

About The Company

Zoox is transforming mobility-as-a-service by developing a fully autonomous, purpose-built fleet designed for AI to drive and humans to enjoy.

