Senior Technical Program Manager - GPU Clusters

1 Week ago • 12+ Years • Operations • $192,000 - $304,750 per year

Job Summary

Job Description

NVIDIA's Hardware Infrastructure team seeks a Senior Technical Program Manager to lead the strategy and execution of programs supporting GPU infrastructure bringup, operations, and automation. This role involves partnering with internal and external teams to scale cluster operations, develop standardized methodologies, and meet challenging objectives. Responsibilities include shaping technical strategy, guiding engineering efforts using agile methodologies, using data-driven approaches to drive program success, and creating effective communication channels. The ideal candidate will have extensive experience in software engineering, technical program management, and large-scale HPC/AI infrastructure deployments, along with excellent communication skills.
Must have:
  • 12+ years of experience in software engineering/technical program management
  • Expertise in infrastructure software, production application development, and large-scale distributed computing
  • Experience managing large-scale HPC and/or AI infrastructure deployments
  • Exceptional communication and presentation skills
  • Knowledge of agile methodologies and project management tools
Good to have:
  • Experience bringing up new datacenter capacity across cloud service providers and on-premises locations
  • Experience migrating platforms and solutions from on-prem to cloud
  • Experience working with AI researchers and/or EDA developers
Perks:
  • Equity
  • Benefits

Job Details

Hardware Infrastructure is seeking a Senior Technical Program Manager to lead the strategy and execution of programs that support the bringup, operations and automation of GPU infrastructure. The GPU infrastructure we build and operate enables NVIDIA's most advanced AI and hardware researchers and engineers to create the future of computing. This fast-paced, evolving landscape requires a senior TPM leader who can guide engineering roadmaps to high-quality delivery on a strong foundation of operational excellence. This person will partner both internally within Hardware Infrastructure and externally with senior management and partner teams to scale the cluster operations charter. They will also develop and standardize the planning, reporting and execution methodologies and metrics needed to meet these challenging objectives.


What You'll Be Doing:

  • Engage with cross-company partners to shape the technical strategy, build programs and coordinate execution to meet key business objectives that make scaling bringups seamless, fast and efficient.

  • Nurture a culture of continuous improvement, finding new opportunities across tooling, automation and processes to scale cluster operations and management.

  • Guide a diverse set of engineering efforts using an agile program methodology across planning, prioritization, design, dependency management, implementation and execution.

  • Bring a data-first approach to programs (metrics, OKRs, KPIs) to measure program success effectively and identify areas for improvement.

  • Create effective communication channels that give audiences at every level insight into program status, risks and opportunities.

  • Act as an effective technical and non-technical liaison between developers, customers and partners, driving organizational alignment across a cross-functional, matrixed set of leads.


What We Need To See:

  • B.S. (or equivalent experience) in Computer Science or a related technical discipline

  • 12+ years of experience across software engineering and/or technical program management roles with demonstrated expertise and mastery of technical and management practices

  • Prior experience in infrastructure software, production application software development and large-scale distributed computing

  • Experience managing large-scale HPC and/or AI infrastructure deployments that stretch across hardware and software

  • Exceptional communication and presentation skills for diverse technical and non-technical audiences

  • Strong multitasking abilities with a focus on thoroughness and rapid context switching

  • Knowledge of agile methodologies and best-in-class project management tools

  • Proactive and enthusiastic in identifying and implementing positive changes in software engineering and release management within a fast-paced environment


Ways To Stand Out From The Crowd:

  • Prior experience bringing up new datacenter capacity across cloud service providers and on-premises locations

  • Prior experience migrating platforms and solutions from on-prem to cloud

  • Prior experience working with AI researchers and/or EDA developers

  • Experience with software development, release and support methodologies and DevOps


NVIDIA is a “learning machine” that constantly evolves by adapting to new opportunities that are hard to tackle, that only we can pursue, and that matter to the world. This is our life’s work: to amplify human creativity and intelligence. NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most brilliant and hardworking people in the world working for us. If you're creative, autonomous, and love a challenge, we want to hear from you. Come join our team and help build the real-time, efficient computing platform driving our success in this exciting and quickly growing field.

The base salary range is 192,000 USD - 304,750 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Similar Jobs

The Walt Disney Company - Intern, Digital & Social Media Marketing, Disney+ & Studios

The Walt Disney Company

Singapore, Singapore (On-Site)
1 Month ago
Cirrus Logic - Product Engineer - Quality (GM-50023425)

Cirrus Logic

Edinburgh, Scotland, United Kingdom (Hybrid)
5 Months ago
Microsoft - Member of Technical Staff, AI Multimodal

Microsoft

London, England, United Kingdom (On-Site)
3 Days ago
Mozilla - Staff Machine Learning Engineer, Gen AI

Mozilla

Netherlands (Remote)
5 Months ago
Dream Sports - Senior ML Scientist

Dream Sports

Mumbai, Maharashtra, India (On-Site)
4 Months ago
Nintendo - Manager, Fraud Operations

Nintendo

Redmond, Washington, United States (Hybrid)
4 Days ago
ByteDance - BNPL Operations Manager - Global Payment

ByteDance

Singapore (On-Site)
1 Month ago
Wargaming - Tactical Sourcing Supervisor

Wargaming

Prague, Prague, Czechia (Hybrid)
1 Month ago
The Walt Disney Company - Experience Planning and Delivery Manager

The Walt Disney Company

Lake Buena Vista, Florida, United States (On-Site)
3 Days ago
The Walt Disney Company - Music Coordinator

The Walt Disney Company

Minato City, Tokyo, Japan (On-Site)
3 Days ago


Similar Skill Jobs

Scientific Games - Package Assembly Technician II

Scientific Games

Solon, Ohio, United States (On-Site)
1 Month ago
ByteDance - Research Engineer - Foundation Model AI Platform - San Jose

ByteDance

San Jose, California, United States (On-Site)
4 Months ago
Push Gaming - Senior Game Developer

Push Gaming

Malta (Hybrid)
2 Days ago
The Walt Disney Company - Principal Software Engineer

The Walt Disney Company

New York, New York, United States (On-Site)
4 Months ago
NVIDIA - Senior Deep Learning Research Engineer, Advanced AI Systems

NVIDIA

Santa Clara, California, United States (On-Site)
2 Weeks ago
Voodoo - Product Lead - Paper.io 2

Voodoo

Barcelona, Catalonia, Spain (Remote)
1 Month ago
AI Fund - Senior Backend Engineer

AI Fund

Taipei City, Taiwan (Hybrid)
5 Months ago
Tesla - Senior Power Electronics Controls Engineer

Tesla

Baden-Württemberg, Germany (On-Site)
1 Month ago
Flow - Senior/Staff Web Engineer

Flow

Miami, Florida, United States (Hybrid)
5 Months ago


Jobs in Washington, United States

ByteDance - Tech Lead Machine Learning Engineer

ByteDance

Seattle, Washington, United States (On-Site)
3 Days ago
PENN Interactive - Senior Technical Product Manager, Data

PENN Interactive

Philadelphia, Pennsylvania, United States (Hybrid)
1 Month ago
Spellbrush - LLM Engineer

Spellbrush

San Francisco, California, United States (On-Site)
1 Week ago
ION - Senior Full-Stack Developer, New York

ION

New York, New York, United States (Hybrid)
5 Months ago
Tencent - Production Director

Tencent

Palo Alto, California, United States (On-Site)
4 Months ago
PlayStation Global - Staff Technical Program Manager

PlayStation Global

Carlsbad, California, United States (Hybrid)
2 Weeks ago
Gearbox Software - Senior Outsourcing Manager

Gearbox Software

Frisco, Texas, United States (On-Site)
3 Months ago
Illumination - Senior Campaign Manager, Marketing

Illumination

Santa Monica, California, United States (On-Site)
3 Weeks ago
The Walt Disney Company - Senior Software Engineer

The Walt Disney Company

New York, New York, United States (On-Site)
2 Months ago
ByteDance - Research Scientist in ML Systems

ByteDance

San Jose, California, United States (On-Site)
4 Months ago


Operations Jobs

ByteDance - Benefits Operations Specialist, APAC - Singapore

ByteDance

Singapore (On-Site)
4 Months ago
ION - Service Manager, Italy

ION

Italy (Hybrid)
5 Months ago
Sporty Group - ZA Risk, Payments and Operations Senior Associate

Sporty Group

South Africa (Remote)
1 Month ago
The Walt Disney Company - Entertainment Lead - 6-12 months contract

The Walt Disney Company

Hong Kong (On-Site)
4 Months ago
The Walt Disney Company - Merchandise Seller - Hercules and Lion King

The Walt Disney Company

London, England, United Kingdom (On-Site)
3 Days ago
Vizrt - Program Management Officer (PMO)

Vizrt

Lisbon, Lisbon, Portugal (On-Site)
1 Day ago
ByteDance - Senior Payroll Analyst

ByteDance

Bangkok, Bangkok, Thailand (On-Site)
3 Days ago
Sphere Entertainment Co - Senior Manager Food & Merchandise Operations

Sphere Entertainment Co

Las Vegas, Nevada, United States (On-Site)
4 Days ago
Overwolf - Production & Ops Manager

Overwolf

Tel Aviv-Yafo, Tel Aviv District, Israel (On-Site)
3 Weeks ago
OKX - Specialist, Customer Due Diligence Operations (KYC)

OKX

Kuala Lumpur, Federal Territory Of Kuala Lumpur, Malaysia (On-Site)
5 Months ago


About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.

