Data Center System Software Architect, DGX Cloud

1 Month ago • 10 Years + • DevOps • Research & Development • $184,000 PA - $425,500 PA

Job Summary

Job Description

NVIDIA seeks a Data Center System Software Architect for its DGX Cloud team. Responsibilities include leading the architecture, design, and implementation of next-generation DGX cloud clusters using cutting-edge technologies. This full-stack role encompasses hardware architecture, workload orchestration, and application performance tuning. The ideal candidate possesses 10+ years of experience in system software, strong programming skills (C, C++, Go, Rust), expertise in distributed systems, and excellent communication skills. The role involves collaborating with various engineering teams across NVIDIA to ensure seamless software integration, from hardware to AI training applications. The architect will provide solutions for complex problems and translate requirements into a vision, architecture, and roadmap.
Must have:
  • 10+ years system software experience
  • Strong programming skills (C, C++, Go, Rust)
  • Distributed systems expertise
  • Excellent communication skills
  • Data science/deep learning knowledge
Good to have:
  • TensorFlow/PyTorch experience
  • Docker, Kubernetes, Slurm experience
  • CUDA/NCCL programming
  • HPC programming (MPI, OpenACC)
  • DGX Cloud, NVIDIA AI Enterprise experience
Perks:
  • Equity
  • Benefits

Job Details

NVIDIA is hiring engineers to scale up its AI infrastructure. We expect you to have a strong programming background, a deep understanding of distributed systems, familiarity with software testing and deployment, and excellent communication and planning abilities. We also welcome out-of-the-box thinkers who can provide new ideas with a strong bias toward execution. Expect to be constantly challenged, improving, and evolving for the better. You and other engineers on this team will help advance NVIDIA's capacity to build and deploy leading infrastructure solutions for a broad range of AI-based applications that affect core data science. If you're creative, passionate about what you do, and love having fun, what are you waiting for? Apply today!

We’re looking for a highly motivated, creative engineer with strong experience in system software to join the DGX Cloud Software Team. You will lead the architecture, design, and implementation of our next-generation DGX Cloud clusters using the latest technologies. On this team, you will do full-stack deployment, including hardware architecture, workload orchestration, and application performance tuning. Are you ready to change the next generation of computing? Join us at the forefront of technological advancement.

What you’ll be doing:

  • Lead technical activities for data centers, with a focus on hybrid deployments between cloud and on-prem

  • Provide expertise in infrastructure workflows, including hardware, workload orchestration, and application tuning

  • Provide fast and creative solutions for complex problems and write effective, clear, and reliable architecture specifications

  • Translate requirements to vision, architecture and roadmap

  • Work with engineering teams across NVIDIA to ensure your software integrates seamlessly from the hardware all the way up to the AI training applications.

What we need to see:

  • Master's or PhD in Computer Science, Computer Engineering, Physics, or equivalent experience

  • 10+ years of experience in this field.

  • Data Sciences, Deep Learning, or Machine Learning coursework

  • Ability to shift seamlessly between Linux system environments and Python programming

  • Programming skills in one or more high-level languages (C, C++, Go, Rust, etc.)

  • System-level experience with both hardware and software

  • Motivated self-starter with an equal balance of strong problem-solving skills and customer-facing communication skills

  • Strong design, coding, analytical, debugging and problem-solving skills

  • Passion for continuous learning and knowledge transfer. Ability to work concurrently with multiple groups locally and abroad in the organization

Ways to stand out from the crowd:

  • Experience with GPU deep learning and data science. Experience using TensorFlow, PyTorch, or another DL framework. Experience working with Docker containers, Slurm, Terraform, and Kubernetes

  • CUDA programming and NCCL experience. HPC programming experience including MPI, OpenACC, or other parallel programming tools

  • Hands-on experience with DGX Cloud, NVIDIA AI Enterprise software, Base Command Manager, NeMo, and NVIDIA Inference Microservices

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people on the planet working for us. If you are creative and autonomous, we want to hear from you!

The base salary range is 184,000 USD - 425,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.


