HPC Operations Manager - Hardware Engineering

NVIDIA

Texas, United States (On-site)


Posted 6 months ago • 15+ years of experience • Software Development & Engineering • $272,000 - $425,500 per year

Job Summary

Job Description

NVIDIA seeks a highly motivated HPC Operations Manager to lead and mentor a multinational team in managing and evolving its global HPC clusters. Responsibilities include ensuring high reliability, developing critical metrics, identifying and resolving failures, evaluating new technologies, planning hardware deployments, collaborating with hardware engineering teams, managing the HPC scheduler (LSF), tracking software licenses, and communicating program status to senior management. The ideal candidate will have extensive experience in IT infrastructure management, Linux server administration, HPC schedulers, and hardware design workflows.

Must have:

15+ years overall experience
5+ years managing IT infrastructure teams
10+ years running Linux servers
HPC schedulers (IBM LSF preferred)
Knowledge of hardware design workflows

Good to have:

HPC storage (NetApp, Pure Storage, etc.)
InfiniBand (operations, debugging)
Software development (DevOps)
Relational databases, data lakes
FlexLM-based software license servers

Perks:

Equity
Benefits

5 skills required

problem-solving

game-texts

networking

linux

css

Job Details

Widely considered to be one of the technology world’s most desirable employers, NVIDIA is an industry leader with groundbreaking developments in High-Performance Computing, Artificial Intelligence, and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables outstanding creativity and discovery, and powers what were once science fiction inventions, from artificial intelligence to autonomous cars. We are now looking for a highly motivated HPC Operations Manager to join our multifaceted and innovative infrastructure team and help craft the global, dynamic HPC clusters used by NVIDIA’s hardware design teams. We are looking for leaders to help us grow and evolve a reliable computing environment that enables our hardware designers to build the next generation of GPUs and SoCs.

What You'll Be Doing:

A huge part of the day-to-day job is collaborating with partners to develop programs for storage, networking, and compute across our growing fleet of data centers.
Lead, cultivate, and mentor a multinational team of sysadmins and DevOps engineers in support of the chip design teams
Ensure the highest reliability of HPC clusters. Develop critical metrics and program schedules to measure program health, predictability, and achievements
Identify failures, lead retrospective analysis, and help develop improvement action plans. Build standard methodologies that cut through complexity, can be used across NVIDIA, and influence other partners toward continuous improvement
Evaluate the latest technologies (hardware and cloud computing) and recommend the future evolution of the infrastructure. Plan deployments and refreshes of hardware (compute, storage, and network equipment) and the associated software stack (e.g., the OS)
Work cross-functionally with hardware engineering leaders to support their future chip design needs, understand their workflow characteristics, and engineer an efficient HPC environment. Work with IT and engineering infrastructure teams on the different subsystems that make up the computing environment.
Lead all aspects of the HPC scheduler (LSF): set and adjust policy, ensure delivery of forecasted compute demand to each hardware division, and drive high utilization.
Track software license servers and drive efficient license utilization
Develop and manage program schedules, milestones, and deliverables. Adjust in the face of a highly fluid customer product roadmap.
Regularly communicate program status and key issues to senior management at NVIDIA’s headquarters. Accurately represent the importance of issues and call them out appropriately. Be the evangelist of data-driven project management

What We Need to See:

B.S. or M.S. in Computer Science, Computer Engineering, Information Science (or equivalent experience)
15+ years of overall experience
5+ years managing IT infrastructure teams of 10+ people
10+ years of experience running Linux servers, NFS storage, and Ethernet networks
Knowledge of HPC schedulers (IBM LSF preferred)
Knowledge of hardware design workflows (EDA tools and methodology)
Experience using project management and capacity planning software
Datacenter operations (rack and stack, maintenance)

Ways to stand out from the crowd:

HPC storage (e.g., NetApp, Pure Storage, Lustre, ZFS, Isilon)
InfiniBand (operations, debugging, performance tuning)
Software development, especially in a DevOps context
Knowledge of relational databases, data lakes, metrics/visualization/analytics platforms
Deploying and maintaining FlexLM-based software license servers
Established relationships with enterprise-level equipment suppliers

The base salary range is 272,000 USD - 425,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Similar Jobs

GDB developer

luxsoft

Poland (Remote)

• 3 Months ago

Senior DevOps Engineer

Polygon Labs

United States (Remote)

• 4 Months ago

External Auditor | Senior Associate 2 [tag02]

PwC

Curitiba, State Of Paraná, Brazil (On-Site)

• 11 Months ago

Junior QA Engineer - Computer Vision

Sword Health

Porto, Porto District, Portugal (Remote)

• 3 Months ago

IT Support Administrator

plarium

Kharkiv, Kharkiv Oblast, Ukraine (On-Site)

• 1 Month ago

Software Development Engineer

Divensi

Redmond, Washington, United States (On-Site)

• 8 Years ago

Member of Technical Staff – Web Foundations Lead

Microsoft

Mountain View, California, United States (Hybrid)

• 5 Months ago

Senior Software Engineer, Italy

Ion

Milan, Lombardy, Italy (On-Site)

• 10 Months ago

IT Operations Engineer

Jane Street

London, England, United Kingdom (On-Site)

• 3 Months ago

Principal Software Engineer

Rovio

Stockholm, Stockholm County, Sweden (On-Site)

• 3 Months ago


Similar Skill Jobs

Senior Online Programmer - Unannounced IP

Behaviour Interactive

Montreal, Quebec, Canada (Hybrid)

• 9 Months ago

Designated Support Engineer

Nice

Manila, Metro Manila, Philippines (Hybrid)

• 1 Month ago

Security & Fraud Prevention Case Manager

Interactive Brokers

Mumbai, Maharashtra, India (Hybrid)

• 1 Month ago

ASIC Verification Engineer - GPU

NVIDIA

Santa Clara, California, United States (Hybrid)

• 4 Months ago

GPU Compute Gaming Compiler Engineer - Sr Staff Manager

Qualcomm

Bengaluru, Karnataka, India (On-Site)

• 2 Months ago

Senior Rendering Engineer

Kabam

Vancouver, British Columbia, Canada (Hybrid)

• 4 Months ago

Engineer, Staff -Devops

Qualcomm

Hyderabad, Telangana, India (On-Site)

• 4 Months ago

Junior/Mid FX Artist

Milk visual effects

(On-Site)

• 9 Months ago

Site Reliability Engineer

Perplexity

San Francisco, California, United States (On-Site)

• 3 Months ago

Technical Artist

Deep Silver Fishlabs

Hamburg, Hamburg, Germany (Hybrid)

• 2 Months ago


Jobs in Austin, TX, USA

UI/UX Designer

Joyride Games

Austin, Texas, United States (Remote)

• 1 Year ago

Machine Learning Scientist, Scaling AI for Biology

bytedance

Seattle, Washington, United States (On-Site)

• 10 Months ago

Lead Product Design Engineer

Backbone

Atherton, California, United States (On-Site)

• 1 Year ago

Senior/Staff Software Engineer - Simulation Data Platform

zoox

Foster City, California, United States (Hybrid)

• 10 Months ago

Senior ML/AI Engineer

Yahoo

United States (Hybrid)

• 2 Months ago

Account Executive

Snap Mobile INC

Rochester, Michigan, United States (On-Site)

• 4 Months ago

Silicon Architect, Camera Hardware

Apple

Cupertino, California, United States (On-Site)

• 2 Months ago

Materials Manager 2nd Shift

Fox Factory

Gainesville, Georgia, United States (On-Site)

• 2 Months ago

Student Researcher (Doubao (Seed) - Foundation Model - Vision Generative AI)

bytedance

San Jose, California, United States (On-Site)

• 5 Months ago

Java Developer

Next Level Business Services

San Jose, California, United States (On-Site)

• 10 Months ago


Software Development & Engineering Jobs

Sustaining Engineering Manager - Connected Devices

GoMotive

Buffalo, New York, United States (On-Site)

• 4 Months ago

Signal Engineer IV

WebTech Corporation

Wayne, Pennsylvania, United States (On-Site)

• 3 Months ago

Senior Software Engineer

AGS - American Gaming Systems

Israel (On-Site)

• 5 Months ago

Senior Software Engineer, Italy

Ion

Rome, Lazio, Italy (On-Site)

• 10 Months ago

PCB Layout Engineer

Capgemini

Pune, Maharashtra, India (On-Site)

• 3 Months ago

Senior Voice Engineer / Senior VoIP Engineer

Qualcomm

Hyderabad, Telangana, India (On-Site)

• 3 Months ago

Lead Module Design Engineer

Coherent corp.

Fremont, California, United States (On-Site)

• 4 Months ago

Lead PCB Engineer

Apexon

Houston, Texas, United States (On-Site)

• 2 Months ago

Engineer - FOSS (Free and Open-Source Software)

Qualcomm

Hyderabad, Telangana, India (On-Site)

• 3 Months ago

Data Engineering Intern

Tencent

(On-Site)

• 6 Months ago


About The Company

NVIDIA

76 Active Jobs

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.


System Design Power Validation Engineer

OEM Account Manager

System Debug Lead Engineer

Network Site Reliability Engineer

ASIC Engineer

Senior ASIC Design Engineer

Physical Design CAD Team Manager

Engineering Farm Engineer

Senior Mixed Signal Design Verification Engineer

Senior Solutions Architect, Cloud Infrastructure and DevOps
