Staff Software Engineer, ML Training and Inference Infrastructure

2 Months ago • All levels • Devops

Job Summary

Job Description

As a Staff Software Engineer, ML Training and Inference Infrastructure, you will be a member of the Perception team at Rivian, which develops advanced machine learning algorithms that directly impact safety-critical self-driving features of our category-defining vehicles. This role involves establishing a state-of-the-art ML infrastructure for training and inference of large autonomous driving models, and optimizing training and inference performance. Responsibilities include optimizing deep learning workloads on NVIDIA GPU systems, optimizing model inference latency, and designing, training, and deploying large deep learning models.
Must have:
  • Optimize deep learning training workloads on NVIDIA GPU systems.
  • Optimize model inference and pre/post-processing latency.
  • PhD in CS/CE/EE, or equivalent experience.
  • Deep knowledge of PyTorch.
  • Knowledge of model training frameworks.
  • In-depth knowledge of transformer architecture.
  • Experience with large-scale distributed model training.
  • Track record of model profiling and optimization.

Job Details

About Rivian

Rivian is on a mission to keep the world adventurous forever. This goes for the emissions-free Electric Adventure Vehicles we build, and the curious, courageous souls we seek to attract.

As a company, we constantly challenge what’s possible, never simply accepting what has always been done. We reframe old problems, seek new solutions and operate comfortably in areas that are unknown. Our backgrounds are diverse, but our team shares a love of the outdoors and a desire to protect it for future generations.


Role Summary

As a Staff Software Engineer, ML Training and Inference Infrastructure, you will be a member of the Perception team at Rivian, which develops advanced machine learning algorithms that directly impact safety-critical self-driving features of our category-defining vehicles.

We are looking for candidates with deep knowledge of, and strong enthusiasm for, establishing a state-of-the-art ML infrastructure for training and inference of large autonomous driving models, and for optimizing training and inference performance.


Responsibilities

  • Optimize the performance of deep learning training workloads on NVIDIA GPU systems at large scale
  • Optimize the latency of model inference and model pre- and post-processing on onboard systems
  • Design, train, and deploy large deep learning models that can leverage the vast amount of labeled and unlabeled data

Qualifications

  • PhD in CS/CE/EE, or equivalent industry experience
  • Deep knowledge of PyTorch
  • Knowledge of model training frameworks (e.g., PyTorch Lightning, Ray)
  • In-depth knowledge of transformer architecture and ways to accelerate the training and inference of transformer models
  • Experience performing large-scale distributed model training
  • A track record of profiling models and doing the detective work needed to improve model training and inference speed



Equal Opportunity

Rivian is an equal opportunity employer and complies with all applicable federal, state, and local fair employment practices laws. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, ancestry, sex, sexual orientation, gender, gender expression, gender identity, genetic information or characteristics, physical or mental disability, marital/domestic partner status, age, military/veteran status, medical condition, or any other characteristic protected by law.

Rivian is committed to ensuring that our hiring process is accessible for persons with disabilities. If you have a disability or limitation, such as those covered by the Americans with Disabilities Act, that requires accommodations to assist you in the search and application process, please email us at candidateaccommodations@rivian.com.

Candidate Data Privacy

Rivian may collect, use and disclose your personal information or personal data (within the meaning of the applicable data protection laws) when you apply for employment and/or participate in our recruitment processes (“Candidate Personal Data”). This data includes contact, demographic, communications, educational, professional, employment, social media/website, network/device, recruiting system usage/interaction, security and preference information. Rivian may use your Candidate Personal Data for the purposes of (i) tracking interactions with our recruiting system; (ii) carrying out, analyzing and improving our application and recruitment process, including assessing you and your application and conducting employment, background and reference checks; (iii) establishing an employment relationship or entering into an employment contract with you; (iv) complying with our legal, regulatory and corporate governance obligations; (v) recordkeeping; (vi) ensuring network and information security and preventing fraud; and (vii) as otherwise required or permitted by applicable law.

Rivian may share your Candidate Personal Data with (i) internal personnel who have a need to know such information in order to perform their duties, including individuals on our People Team, Finance, Legal, and the team(s) with the position(s) for which you are applying; (ii) Rivian affiliates; and (iii) Rivian’s service providers, including providers of background checks, staffing services, and cloud services.

Rivian may transfer or store internationally your Candidate Personal Data, including to or in the United States, Canada, the United Kingdom, and the European Union and in the cloud, and this data may be subject to the laws and accessible to the courts, law enforcement and national security authorities of such jurisdictions.

Please note that we are currently not accepting applications from third-party application services.
