Senior Deep Learning Systems Software Engineer - AI Infrastructure

3 Months ago • 5 Years + • Full Stack Development

Job Summary

Job Description

NVIDIA seeks a Senior Deep Learning Systems Software Engineer to optimize deep learning workloads on cutting-edge hardware and software. Responsibilities include analyzing, profiling, and optimizing workloads; building automation tools; collaborating with cross-functional teams; identifying and resolving performance bottlenecks; designing performance benchmarks; and providing guidance on cloud application optimization. The ideal candidate will have 5+ years of experience in application performance engineering, experience with large-scale GPU infrastructure, deep learning model architectures (PyTorch), application profiling tools (NVIDIA Nsight, Intel VTune), and strong programming skills (Python, C/C++). The role involves working across the hardware/software stack to achieve peak performance in deep learning training and inference.
Must have:
  • 5+ years application performance engineering experience
  • Large-scale multi-node GPU infrastructure experience
  • Deep learning model architectures & PyTorch expertise
  • Application profiling tools (NVIDIA Nsight, Intel VTune)
  • Strong understanding of computer architecture and GPU architecture
  • Proficiency in Python and C/C++
Good to have:
  • CUDA or OpenCL experience
  • NVIDIA server and software ecosystem understanding
  • Experience with large-scale distributed systems
  • Hands-on experience with NVIDIA GPUs, HPC storage, networking, and cloud computing
  • In-depth understanding of storage systems, Linux file systems, and RDMA networking

Job Details

NVIDIA is an industry leader with groundbreaking developments in High-Performance Computing, Artificial Intelligence and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions, from artificial intelligence to autonomous cars. NVIDIA is seeking senior engineers with a strong focus on performance analysis and optimization to help us squeeze every last clock cycle out of all facets of Deep Learning, including training and inference, some of the most important workloads in the world today. If you are unafraid to work across all layers of the hardware/software stack, from GPU architecture to Deep Learning frameworks, to achieve peak performance, we want to hear from you! This role offers an opportunity to directly impact the hardware and software roadmap in a fast-growing technology company that leads the AI revolution while helping deep learning users around the globe enjoy ever-higher training speeds.

What you'll be doing:

  • Understand, analyze, profile, and optimize deep learning workloads on state-of-the-art hardware and software platforms.

  • Build tools to automate workload analysis, workload optimization, and other critical workflows.

  • Collaborate with cross-functional teams to analyze and optimize cloud application performance on diverse GPU architectures.

  • Identify bottlenecks and inefficiencies in application code and propose optimizations to enhance GPU utilization.

  • Drive end-to-end platform optimization from the hardware level to the application and service levels.

  • Design and implement performance benchmarks and testing methodologies to evaluate application performance.

  • Provide guidance and recommendations on optimizing cloud-native applications for speed, scalability, and resource efficiency.

  • Share knowledge and best practices with domain expert teams as they transition applications to distributed environments.

What we need to see:

  • Master's degree in CS, EE, or CSEE, or equivalent experience

  • 5+ years of experience in application performance engineering

  • Experience using large-scale, multi-node GPU infrastructure on premises or in CSPs

  • Background in deep learning model architectures and experience with PyTorch and large-scale distributed training

  • Experience with application profiling tools such as NVIDIA Nsight and Intel VTune

  • Deep understanding of computer architecture and familiarity with the fundamentals of GPU architecture. Experience with NVIDIA's infrastructure and software stacks.

  • Proven experience analyzing, modeling and tuning DL application performance.

  • Proficiency in Python and C/C++ for analyzing and optimizing application code

Ways to stand out from the crowd:

  • Strong fundamentals in algorithms and GPU programming experience (CUDA or OpenCL)

  • Understanding of NVIDIA's server and software ecosystem

  • Hands-on experience in performance optimization and benchmarking on large-scale distributed systems

  • Hands-on experience with NVIDIA GPUs, HPC storage, networking, and cloud computing.

  • In-depth understanding of storage systems, Linux file systems, and RDMA networking

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you.

About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.
