Senior High-Performance LLM Training Engineer

2 Months ago • 8-10 Years • Full Stack Development • Research & Development • Artificial Intelligence • $184,000 PA - $356,500 PA

Job Summary

Job Description

NVIDIA seeks a Senior High-Performance LLM Training Engineer to optimize LLM training workloads on thousands of GPUs. Responsibilities include performance analysis, optimization of AI training on innovative hardware and software platforms (PyTorch, JAX), implementing production-quality software across NVIDIA's deep learning platform, contributing to the MLPerf Training benchmark, and building tools for workload analysis. The role involves working with cutting-edge neural networks and shaping hardware roadmaps for next-generation GPUs. This position requires deep learning, computer architecture, and programming expertise (C++, Python, CUDA).
Must have:
  • PhD/MS in CS/EE/CE & relevant experience
  • Deep learning & neural network expertise
  • GPU architecture knowledge
  • Performance analysis & tuning skills
  • C++, Python, CUDA programming
Perks:
  • Highly competitive salary
  • Comprehensive benefits package
  • Collaboration with top talent
  • Innovative work environment

Job Details

We are now looking for a Senior High-Performance LLM Training Engineer!

NVIDIA is seeking experienced engineers specializing in performance analysis and optimization to improve the efficiency of LLM training workloads, which are shaping the world's most advanced computing systems. This position focuses on optimizing NVIDIA’s LLM software stack in frameworks like PyTorch and JAX for high-performance training on thousands of GPUs, while also helping shape hardware roadmaps for the next generation of GPUs powering the AI revolution.

What you will be doing:

  • Understand, analyze, profile, and optimize AI training workloads on innovative hardware and software platforms.

  • Understand the big picture of training performance on GPUs, prioritizing and then solving problems across all state-of-the-art neural networks.

  • Implement production-quality software in multiple layers of NVIDIA's deep learning platform stack, from drivers to DL frameworks.

  • Build and support NVIDIA submissions to the MLPerf Training benchmark suite.

  • Implement key DL training workloads in NVIDIA's proprietary processor and system simulators to enable future architecture studies.

  • Build tools to automate workload analysis, workload optimization, and other critical workflows.

What we want to see:

  • PhD in Computer Science, Electrical Engineering, or Computer Engineering with 5+ years of meaningful work experience; or MS (or equivalent experience) with 8+ years of meaningful work experience.

  • Strong background in deep learning and neural networks, in particular training.

  • A deep background in computer architecture and familiarity with the fundamentals of GPU architecture.

  • Proven experience analyzing and tuning application performance, as well as processor- and system-level performance modeling.

  • Programming skills in C++, Python, and CUDA.

GPU computing is the most productive and pervasive platform for deep learning and AI. It begins with the most advanced GPUs and the systems and software we build on top of them. We integrate and optimize every deep learning framework. We work with the major systems companies and every major cloud service provider to make GPUs available in data centers and in the cloud. We craft computers and software to bring AI to edge devices, such as self-driving cars and autonomous robots. AI has the potential to spur a wave of social progress unmatched since the industrial revolution.

Widely considered to be one of tech's most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. Additionally, this opportunity offers you the ability to collaborate with some of the most forward-thinking and hard-working people in the world, shaping the future of AI in a creative and autonomous work environment that encourages innovation. If you're excited to work across the full hardware & software stack—from GPU architecture to application code—to achieve optimal performance, we want to hear from you!

#LI-Hybrid

The base salary range is 184,000 USD - 356,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.
