Deep Learning Engineer, Datacenters

1 Month ago • 3 Years + • Research & Development

Job Summary

Job Description

NVIDIA's Deep Learning Engineer, Datacenters role focuses on optimizing next-generation systems and the deep learning software stack. Responsibilities include developing software infrastructure for analyzing deep learning applications, evolving cost-efficient datacenter architectures for LLMs, creating analysis and profiling tools (Python, bash, C++), analyzing system and software characteristics of DL applications, and developing methodologies to measure performance metrics. This role requires collaboration with various teams across NVIDIA, impacting the development of high-performance datacenters designed for the future of AI. The engineer will analyze how CPU, GPU, networking, and IO relate to deep learning architectures for various technologies.
Must have:
  • Bachelor's degree in EE or CS
  • 3+ years relevant experience
  • System software/Silicon architecture experience
  • C/C++ and Python programming
  • Strong analytical skills
Good to have:
  • GPU kernels (CUDA)
  • DL Frameworks (PyTorch, TensorFlow)
  • Containerization (Docker)
  • Datacenter Workload Managers (Slurm)
  • Performance modeling experience

Job Details

As NVIDIA makes inroads into the Datacenter business, our team plays a central role in getting the most out of our exponentially growing datacenter deployments as well as establishing a data-driven approach to hardware design and system software development. We collaborate with a broad cross-section of teams at NVIDIA, ranging from DL research teams to CUDA Kernel and DL Framework development teams to Silicon Architecture teams. As our team grows, and as we seek to identify and take advantage of long-term opportunities, our skillset needs are expanding as well.

Do you want to influence the development of high-performance Datacenters designed for the future of AI? Do you have an interest in system architecture and performance? In this role you will explore how CPU, GPU, networking, and IO relate to deep learning (DL) architectures for Natural Language Processing, Computer Vision, Autonomous Driving and other technologies. Come join our team, and bring your interests to help us optimize our next-generation systems and Deep Learning Software Stack.

What you'll be doing:

  • Help develop software infrastructure to characterize and analyze a broad range of Deep Learning applications.
  • Evolve cost-efficient datacenter architectures tailored to meet the needs of Large Language Models (LLMs).
  • Work with experts to help develop analysis and profiling tools in Python, bash and C++ to measure key performance metrics of DL workloads running on NVIDIA systems.
  • Analyze system and software characteristics of DL applications.
  • Develop analysis tools and methodologies to measure key performance metrics and to estimate potential for efficiency improvement.

What we need to see:

  • A Bachelor’s degree in Electrical Engineering or Computer Science with 3 years or more of relevant experience (Master’s or PhD degree preferred).
  • Experience in at least one of the following:
    • System Software: Operating Systems (Linux), Compilers, GPU kernels (CUDA), DL Frameworks (PyTorch, TensorFlow).
    • Silicon Architecture and Performance Modeling/Analysis: CPU, GPU, Memory or Network Architecture.
  • Experience programming in C/C++ and Python. Exposure to Containerization Platforms (Docker) and Datacenter Workload Managers (Slurm) is a plus.
  • Demonstrated ability to work in virtual environments, and a strong drive to own tasks from beginning to end. Prior experience with such environments will make you stand out.

Ways to stand out from the crowd:

  • Background with system software, Operating system intrinsics, GPU kernels (CUDA), or DL Frameworks (PyTorch, TensorFlow).
  • Experience with silicon performance monitoring or profiling tools (e.g. perf, gprof, nvidia-smi, DCGM).
  • In-depth performance modeling experience in any one of CPU, GPU, Memory or Network Architecture.
  • Exposure to Containerization Platforms (Docker) and Datacenter Workload Managers (Slurm).
  • Prior experience with multi-site teams or multi-functional teams.

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people on the planet working for us. If you're creative and autonomous, we want to hear from you!


About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.

