Senior Technical Program Manager - GPU Clusters

2 Months ago • 12 Years + • Operations • $192,000 PA - $304,750 PA

Job Summary

Job Description

NVIDIA's Hardware Infrastructure team seeks a Senior Technical Program Manager to lead the strategy and execution of programs supporting GPU infrastructure bringup, operations, and automation. This role involves partnering with internal and external stakeholders to scale cluster operations, develop standardized methodologies, and guide engineering efforts using agile methodologies. Responsibilities include shaping technical strategy, fostering continuous improvement, managing large-scale HPC and AI infrastructure deployments, and effectively communicating program status and risks. The ideal candidate possesses strong technical and management skills, experience with large-scale distributed computing, and a data-driven approach to program management.
Must have:
  • 12+ years of experience in software engineering/technical program management
  • Experience with infrastructure software, production application development
  • Large-scale HPC/AI infrastructure deployment experience
  • Exceptional communication and presentation skills
  • Agile methodologies and project management tools knowledge
Good to have:
  • Experience bringing up new datacenter capacity (cloud/on-premise)
  • Experience migrating platforms from on-prem to cloud
  • Experience working with AI researchers/EDA developers
Perks:
  • Equity
  • Benefits

Job Details

Hardware Infrastructure is seeking a Senior Technical Program Manager to lead the strategy and execution of programs to support the bringup, operations and automation of GPU infrastructure. The GPU infrastructure we build and operate enables NVIDIA's most advanced AI and hardware researchers and engineers to create the future of computing. This is a fast-paced and evolving landscape that requires a senior TPM leader to guide engineering roadmaps toward high-quality outcomes built on a strong foundation of operational excellence. They will partner both internally within Hardware Infrastructure and externally with senior management and partner teams to scale the cluster operations charter. They will develop and standardize planning, reporting and execution methodologies and metrics to meet these challenging objectives.


What You'll Be Doing:

  • Engage with cross-company partners to shape the technical strategy, build programs and coordinate execution to meet key business objectives that support scaling bringups to be seamless, fast and efficient

  • Nurture a culture of continuous improvement, finding new opportunities across tooling, automation and processes to scale cluster operations and management

  • Guide a diverse set of engineering efforts using an agile program methodology across planning, prioritization, design, dependency management, implementation and execution

  • Bring a data-first approach to programs (metrics, OKRs, KPIs) to effectively measure program success and identify areas for improvement

  • Create effective communication channels that give audiences at every level insight into program status, risks and opportunities

  • Act as an effective technical and non-technical liaison between developers, customers and partners to drive organizational alignment across a multi-functional, matrixed set of leads


What We Need To See:

  • B.S. (or equivalent experience) in Computer Science or a related technical discipline

  • 12+ years of experience across software engineering and/or technical program management roles with demonstrated expertise and mastery of technical and management practices

  • Prior experience in infrastructure software, production application software development and large-scale distributed computing

  • Experience managing large-scale HPC and/or AI infrastructure deployments that span hardware and software

  • Exceptional communication and presentation skills for diverse technical and non-technical audiences

  • Strong multitasking abilities with a focus on thoroughness and rapid context switching

  • Knowledge of agile methodologies and best-in-class project management tools

  • Proactive and enthusiastic in identifying and implementing positive changes in software engineering and release management within a fast-paced environment


Ways To Stand Out From The Crowd:

  • Prior experience bringing up new datacenter capacity across cloud service providers and on-premise locations

  • Prior experience migrating platforms and solutions from on-prem to cloud

  • Prior experience in working with AI researchers and/or EDA developers

  • Experience with software development, release and support methodologies and DevOps


NVIDIA is a “learning machine” that constantly evolves by adapting to new opportunities that are hard to tackle, that only we can pursue, and that matter to the world. This is our life’s work: to amplify human creativity and intelligence. NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most brilliant and hardworking people in the world working for us. If you're creative, autonomous, and love a challenge, we want to hear from you. Come join our team and help build the real-time, efficient computing platform driving our success in this exciting and quickly growing field.

The base salary range is 192,000 USD - 304,750 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.
