Senior System Software Engineer – DC Platform Software Tools

3 Weeks ago • 10 Years + • Research & Development • $184,000 PA - $356,500 PA

Job Summary

Job Description

NVIDIA seeks a Senior System Software Engineer to design, develop, and deploy tools for managing large-scale AI data centers. The role focuses on creating user-friendly tools for the data center lifecycle (deployment, production, service, repair) for DGX, HGX, or MGX products. Responsibilities include gathering requirements from cross-functional teams and customers, creating solutions, ensuring proper tools for managing server software and firmware, and contributing to all phases of product development. The ideal candidate will have strong Python skills, experience with large-scale data center programming and debugging, and a proven track record in management solutions for large-scale clusters.
Must have:
  • 10+ years experience
  • Large-scale cluster management
  • Strong Python skills
  • Data center debugging experience
  • Excellent communication skills
Good to have:
  • Data center deployment experience
  • x86 or ARM architecture knowledge
  • Processor microarchitecture familiarity
  • Experience with code coverage tools
Perks:
  • Equity
  • Benefits

Job Details

We are looking for a Senior System Software Engineer – DC Platform SW Tools. NVIDIA’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI, the next era of computing, with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world. Today, we are increasingly known as “the AI computing company.” We're looking to grow our company and establish teams with the most thoughtful people in the world. NVIDIA Grace and GPU superchips provide the performance and productivity required for strong scaling of HPC and generative AI workloads. Scale-out is inherent to the design of this massive superchip.

We are looking for a Senior System Software Engineer to join our Data Center Platform Software Tools team. You will be responsible for the design, development, enhancement, and deployment of tools in large-scale AI data centers. The primary focus of these tools is to provide a simple user experience across the data center manageability life cycle, covering deployment, production, service, and repair workflows. You will work closely with cross-functional teams, including hardware engineers, system architects, software developers, and customers, to gather requirements, create solutions, and provide a simplified end-to-end manageability experience. Are you ready to change the next generation of computing? Join us at the forefront of technological advancement.

What you’ll be doing: 

  • Drive next-generation GPU server software manageability workflows for scaling AI infrastructure in data centers. This infrastructure includes DGX, HGX, or MGX products. You will ensure that the proper tools are built for managing server software and firmware across the data center lifecycle. 

  • Work with internal and external customers to understand requirements for various tools to improve debuggability, serviceability and runtime of data center firmware and software. 

  • Contribute to all phases of product development, from product definition, architecture, and design, through implementation, debugging, testing and early customer support. 

  • Maintain detailed documentation of tool designs, capabilities, and usage guidelines. Provide regular reports and technical insights to internal teams on the effectiveness and improvements of developed tools. 

  • Define KPIs for tools and work with stakeholders to improve them over time. 

 

What we need to see: 

  • BS, MS, or PhD in EE/CS or related field of education (or equivalent experience) with 10+ years of experience 

  • Proven record of working on management solutions for large-scale clusters in data centers. 

  • Strong and demonstrable skill in Python. 

  • Programming and debugging experience in large-scale data centers. 

  • Experience in SCM (e.g., Git, Perforce) and project management tools like Jira. 

  • Excellent written and oral communication skills, a strong work ethic, a deep sense of teamwork, a love of producing quality work, and a commitment to finishing your tasks every single day. 

  • You are a self-starter who loves finding creative solutions to complicated problems and is hands-on with coding. 

 

Ways to stand out from the crowd: 

  • Experience working on data center deployment and management projects. 

  • Hands-on experience with x86 or ARM system architecture. 

  • Familiarity with processor microarchitecture such as caches, pipelining, memory hierarchy, and instruction set architecture (ISA). 

  • Experience with code coverage and static analysis tools. 

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people on the planet working for us. If you're creative and autonomous, we want to hear from you! 

The base salary range is 184,000 USD - 356,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.
