Diagnostic Software Manager - Server

4 Days ago • 8 Years + • Research & Development

Job Summary

Job Description

NVIDIA seeks a Diag Software Manager - Server to lead a team of software engineers responsible for developing and improving system stress applications for its data center products. This involves collaborating with multiple teams (architecture, ASIC, systems engineering, operations) to create software that rigorously tests GPU servers in customer and partner environments. The manager will oversee multiple concurrent projects, prioritize tasks, manage engineers, recruit new talent, and develop long-term strategies for the team. Responsibilities include identifying and resolving hardware/software issues, driving feature development, multi-team debugging, and improving product quality and production efficiency. The role requires strong system design and programming skills (C/C++; Python is a plus), a deep understanding of computer architecture and operating systems, and experience working within large codebases.
Must have:
  • 8+ years system software experience
  • 4+ years team management
  • C/C++ programming skills
  • Deep understanding of computer architecture
  • Experience in feature development and debugging
Good to have:
  • GPU compute or server product knowledge (BMC, InfiniBand, PCIe, NVLink)
  • Experience with customer software teams
  • RAS software engineering experience
  • Python programming skills
Perks:
  • Competitive salary
  • Generous benefits package

Job Details

We seek a manager to lead all aspects of a team of software engineers tasked with improving and crafting a collection of system stress applications tailored for NVIDIA's forthcoming data center products, operational within customer and partner infrastructures. Our focus lies in crafting software that subjects GPU servers to the most thorough testing scenarios imaginable. Our team collaborates closely with architecture, ASIC, systems engineering, and operations teams to devise methodologies aimed at pushing every hardware component to its limits. Situated at the core of NVIDIA's data center enterprise, from GPU baseboards to standalone servers and entire clusters, we are responsible for developing the comprehensive suite of system stress applications. We partner with NVIDIA operations teams to find an efficient balance between product quality, test yield, and manufacturing efficiency. Wouldn't you want to be a key factor in NVIDIA's gross margin?

What you will be doing:

  • Collaborate with cross-functional teams on NPI projects, and improve and refine the software deployed on our customers' servers and environments so that hardware and software issues can be identified in detail.

  • As the manager, you will run multiple concurrent projects through active prioritization and communication.

  • On the engineer management side, we want the manager to continue to groom future technical leaders in the team and recruit new talent.

  • Constant development is another area of responsibility. We look for candidates who are proactive and who seek opportunities to improve NVIDIA product quality and production efficiency.

  • We also need our candidates to be reactive: able to drive root-cause analysis of critical issues and embrace corrective actions.

  • Finally, we need our leaders to develop long-range strategies for the team to prepare for new challenges and drive execution.

What we need to see:

  • Bachelor of Science in Computer Science, Computer Engineering, or Electrical Engineering (or equivalent experience).

  • 8+ years of overall system software experience, along with 4+ years of team management experience; a deep understanding of software development principles; and comfort working in a large codebase and a deep driver stack.

  • Good system design skills.

  • Good programming skills in C/C++; Python programming is a plus.

  • Solid understanding of computer architecture, operating systems, kernel drivers, and device programming.

  • Experience driving feature development and multi-team debugging.

Ways to stand out from the crowd:

  • Knowledge of GPU compute or server product technologies such as BMC (Baseboard Management Controller), InfiniBand, PCIe, and NVLink.

  • Extensive experience collaborating with customer software teams.

  • Strong experience engineering software with RAS (reliability, availability, and serviceability) in mind.

  • Comfortable with the unknown and with change.

With competitive salaries and a generous benefits package, NVIDIA is widely considered to be one of the most desirable employers in the world. We have some of the most brilliant and talented people in the world working for us. If you are creative, autonomous and love a challenge, we want to hear from you. We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

#LI-Hybrid 

About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.
