Senior AI Training Performance Engineer

2 Months ago • 5-13 Years • Artificial Intelligence

Job Summary

Job Description

NVIDIA seeks a Senior AI Training Performance Engineer to optimize AI training workloads. Responsibilities include analyzing, profiling, and optimizing AI and deep learning training workloads on cutting-edge hardware and software; understanding training performance on GPUs, solving problems across numerous neural networks; implementing software across NVIDIA's deep learning platform (drivers to DL frameworks); implementing key DL training workloads in NVIDIA's simulators for future architecture studies; and building tools to automate workload analysis and optimization. This role offers significant impact on hardware and software roadmaps within a rapidly growing AI company.
Must have:
  • PhD (or equivalent) in CS, EE, or CSEE with 5+ years of experience, or MS with 8+ years of experience
  • Deep learning & neural network training expertise
  • Strong computer architecture understanding (GPU architecture)
  • Application performance analysis and tuning
  • Processor & system-level performance modeling
  • C++, Python, CUDA programming skills

Job Details

We are now looking for a Senior AI Training Performance Engineer!

NVIDIA is seeking senior engineers who are obsessed with performance analysis and optimization to help us squeeze every last clock cycle out of AI training, one of the most important workloads in the world. If you are unafraid to work across all layers of the hardware/software stack, from GPU architecture to deep learning frameworks, to achieve peak performance, we want to hear from you! This role offers the opportunity to directly impact the hardware and software roadmap in a fast-growing technology company that leads the AI revolution while helping deep learning users around the globe enjoy ever-higher training speeds.

What you will be doing:

  • Understand, analyze, profile, and optimize AI and deep learning training workloads on state-of-the-art hardware and software platforms.

  • Understand the big picture of training performance on GPUs, prioritizing and then solving problems across many dozens of state-of-the-art neural networks.

  • Implement production-quality software in multiple layers of NVIDIA's deep learning platform stack, from drivers to DL frameworks.

  • Implement key DL training workloads in NVIDIA's proprietary processor and system simulators to enable future architecture studies.

  • Build tools to automate workload analysis, workload optimization, and other critical workflows.

What we want to see:

  • PhD (or equivalent experience) in CS, EE, or CSEE with 5+ years of relevant work experience, or MS with 8+ years of relevant work experience.

  • Strong background in deep learning and neural networks, in particular training.

  • Deep understanding of computer architecture, and familiarity with the fundamentals of GPU architecture.

  • Proven experience analyzing and tuning application performance.

  • Experience with processor and system-level performance modeling.

  • Programming skills in C++, Python, and CUDA.

Intelligent machines powered by AI computers that can learn, reason and interact with people are no longer science fiction. Today, a self-driving car powered by artificial intelligence can meander through a country road at night and find its way. An AI-powered robot can learn motor skills through trial and error. This is truly an extraordinary time. The era of AI has begun, and we are powering it. NVIDIA is increasingly known as the AI Computing company and is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people on the planet working for us. Are you passionate about performance? Are you interested in working on industry-leading Deep Learning products? Come, join our Deep Learning Architecture team, where you can help build real-time, cost-effective computing platforms driving our success in this exciting and rapidly growing field.

About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.

