Senior Technical Program Manager - Infrastructure Capacity Management

3 Months ago • 10 Years + • Operations • Product Management • $192,000 PA - $304,750 PA

Job Summary

Job Description

NVIDIA's Hardware Infrastructure team seeks a Senior Technical Program Manager to lead the strategy and execution of capacity forecasting, planning, allocation, and management across internal clusters. This role involves working with multiple internal customer teams to build demand models for compute, storage, and network resources. The successful candidate will shape technical strategy, foster continuous improvement, and lead engineering efforts using agile methodologies. They will define strategies to increase resource efficiency, utilize data-driven approaches (metrics, OKRs, KPIs), and create effective communication channels. The role also requires strong cross-functional collaboration with developers, customers, and partners.
Must have:
  • 10+ years of experience in software engineering and/or technical program management
  • Experience forecasting and managing infrastructure resources
  • Experience leading programs that span multiple teams and 100+ engineers
  • Experience managing large-scale HPC/AI infrastructure deployments
  • Exceptional communication and presentation skills
  • Agile methodologies & project management tools expertise
Good to have:
  • Experience with cloud service providers
  • Experience working with AI researchers/EDA developers
  • Software development, release, and DevOps knowledge
Perks:
  • Highly competitive salaries
  • Comprehensive benefits package

Job Details

Hardware Infrastructure is seeking a Senior Technical Program Manager to lead the strategy and execution of programs that support capacity forecasting, planning, allocation and management across our internal clusters. The GPU infrastructure we build and operate enables NVIDIA's most advanced AI and hardware researchers and engineers to create the future of computing. The capacity management work spans compute, storage and network to ensure our infrastructure is functional, performant and reliable. This is a fast-paced, evolving landscape that requires a senior TPM leader to guide engineering roadmaps toward high-quality outcomes built on a strong foundation of operational excellence. They will partner both within Hardware Infrastructure and externally with senior management and partner teams to scale the capacity management lifecycle. They will also develop and standardize planning, reporting and execution methodologies and metrics to meet challenging objectives.

What You'll Be Doing:

  • Work across multiple internal customer teams to build robust demand models that give an accurate, comprehensive picture of capacity requirements across compute, storage and network

  • Play a key role in shaping the technical strategy and execution for how our internal serving platform meets internal customer needs

  • Nurture a culture of continuous improvement, finding new opportunities across tooling, automation and processes to scale overall capacity management

  • Take the lead in defining strategies that increase the efficiency and utilization of resources across internal clusters and minimize capacity waste

  • Guide a diverse set of engineering efforts using an agile program methodology across planning, prioritization, design, dependency management, implementation and execution

  • Bring a data-first approach to programs (metrics, OKRs, KPIs) to measure program success and identify areas for improvement

  • Create effective communication channels that give audiences at every level insight into program status, risks and opportunities

  • Act as an effective technical and non-technical liaison between developers, customers and partners to drive organizational alignment across a multi-functional, matrixed set of leads

What We Need To See:

  • B.S. (or equivalent experience) in Computer Science or a related technical field

  • 10+ years of experience across software engineering and/or technical program management roles with demonstrated expertise and mastery of technical and management practices

  • Prior experience developing processes and tools to forecast, allocate and manage infrastructure resources across a diverse and large portfolio ($billions)

  • Prior experience leading programs that span multiple teams and engineers (100+)

  • Experience managing large-scale HPC and/or AI infrastructure deployments that stretch across hardware and software

  • Exceptional communication and presentation skills for diverse technical and non-technical audiences

  • Strong multitasking abilities with a focus on thoroughness and rapid context switching

  • Knowledge of agile methodologies and best-in-class project management tools

  • Proactive and enthusiastic in identifying and implementing positive changes in software engineering and release management within a fast-paced environment

Ways To Stand Out From The Crowd:

  • Prior experience bringing up new datacenter capacity across cloud service providers and on-premises locations

  • Prior background in working with AI researchers and/or EDA developers

  • Knowledge of software development, release and support methodologies, and DevOps

NVIDIA offers highly competitive salaries and a comprehensive benefits package. We have some of the most forward-thinking and hardworking people in the world on our team and our collaborative talent continues to drive NVIDIA's growth. We are seeking creative and independent engineers with real passion for technology!

The base salary range is 192,000 USD - 304,750 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.
