HPC Lab Manager

1 Month ago • 3 Years + • Administrative

Job Summary

Job Description

NVIDIA seeks an HPC Lab Manager to join its Networking Cloud Solutions team. The role involves planning and building complex HPC clusters and supercomputers in various data centers and labs. Responsibilities include rack stacking, cable management, ensuring power and cooling efficiency, daily data center operations, and troubleshooting hardware and software issues (network, cabling, bare metal, operating systems). The manager will also support research and development activities and work with scientific researchers, developers, and customers. The ideal candidate will have strong Linux troubleshooting skills and experience managing large data centers.
Must have:
  • MCSE or MCITP/CCNA
  • 3+ years lab management experience
  • Linux troubleshooting expertise
  • Knowledge of core services (DHCP, DNS, etc.)
  • Teamwork and service-oriented approach
Good to have:
  • Bash/Python scripting
  • Configuration management tools (Ansible, Puppet)
  • CI/CD & job schedulers (Jenkins, SLURM)
  • Virtualization (KVM, VMware, Hyper-V)
  • L2 & L3 network protocols

Job Details

NVIDIA is looking for an HPC Lab Manager to join the Networking Cloud Solutions HPC/AI Infrastructure team. We are focused on building supercomputers and HPC clusters based on groundbreaking technologies. As lab manager, you will be a key player working with the most exciting computing hardware and software, contributing to the latest breakthroughs in artificial intelligence and GPU computing. Take part in building large-scale compute and deep learning software and hardware platforms, and work with and support many scientific researchers, developers, and customers to craft improved workflows and develop new, leading, differentiated solutions.

What you will be doing:

  • Plan and build complex clusters and supercomputers in various data centers and labs

  • Perform rack stacking and cable management to ensure efficient use of space and easy maintenance

  • Ensure power and cooling efficiency in data centers and labs while optimizing rack space utilization

  • Handle daily operation and support of data centers and labs

  • Install a variety of infrastructure and solutions: cloud, VMs, storage, network, HPC, and AI

  • Troubleshoot network, optical cabling, bare-metal, and operating system issues

  • Support Research & Development activities

What we need to see:

  • MCSE or MCITP/CCNA certification

  • 3+ years of experience as lab manager

  • Experience in supporting large and complex data centers

  • Proven hands-on experience in Linux troubleshooting, with strong problem identification and resolution skills

  • In-depth knowledge of Linux and Windows core services: DHCP, DNS, NIS, AD, etc.

  • Team player, service-oriented, and organized

Ways to stand out from the crowd:

  • Scripting experience in Bash and/or Python

  • Experience with widely used configuration management tools (e.g. Ansible, Puppet)

  • Experience with CI tools and job schedulers (e.g. Jenkins, SLURM)

  • Virtualization: KVM / VMware / Hyper-V

  • Experience with L2 & L3 network protocols

NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you!

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Similar Jobs

Google - CPU Design Verification Engineer, Google Cloud

Google

(On-Site)
2 Months ago
Thatgamecompany - Senior Backend Engineer - China

Thatgamecompany

Shanghai, Shanghai, China (On-Site)
5 Months ago
CommerceIQ - DevOps Engineer-III

CommerceIQ

Bengaluru, Karnataka, India (Hybrid)
5 Months ago
Omind - Senior DevOps Engineer

Omind

Bengaluru, Karnataka, India (On-Site)
4 Months ago
IGT - Security Architect

IGT

London, England, United Kingdom (On-Site)
2 Months ago
Sphere Entertainment Co - Uniform Attendant

Sphere Entertainment Co

Las Vegas, Nevada, United States (On-Site)
4 Weeks ago
Enphase Energy - Sr. Engineer - Oracle APEX Developer

Enphase Energy

Bengaluru, Karnataka, India (On-Site)
2 Months ago
Notion - Enterprise Technical Support, German, EMEA

Notion

Dublin, County Dublin, Ireland (On-Site)
4 Months ago
PwC - I. Administrative (Hsinchu) - Specialist (Audit Assistant)

PwC

Hsinchu, Hsinchu City, Taiwan (On-Site)
4 Months ago
Rockstar Games - Admin Assistant (Night Shift)

Rockstar Games

Bengaluru, Karnataka, India (On-Site)
4 Days ago

Similar Skill Jobs

Gaming Innovation Group - System Administrator

Gaming Innovation Group

Sliema, Malta (Hybrid)
2 Weeks ago
Riot Games - Principal Software Engineer - VALORANT, Foundations, Build Platforms

Riot Games

Los Angeles, California, United States (On-Site)
5 Months ago
Google - Senior Cyber Security Consultant, Google Public Sector

Google

Reston, Virginia, United States (On-Site)
1 Month ago
Salt AI - Sr. QA Automation Engineer

Salt AI

Los Angeles, California, United States (Remote)
7 Months ago
NOVOMATIC - QA Engineer (Embedded Systems)

NOVOMATIC

Lesser Poland Voivodeship, Poland (Hybrid)
1 Day ago
NVIDIA - Senior Software Engineer, Linux Kernel Upstream

NVIDIA

Ra'anana, Center District, Israel (Hybrid)
3 Weeks ago
SuperPlay - DEVOPS ENGINEER

SuperPlay

Tel Aviv-Yafo, Tel Aviv District, Israel (On-Site)
3 Months ago
NVIDIA - System Linux Administrator

NVIDIA

Yokne'am Illit, North District, Israel (On-Site)
1 Month ago
Pattern® - Senior Site Reliability Engineer

Pattern®

Pune, Maharashtra, India (On-Site)
5 Months ago
Grid Dynamics - DevOps Engineer

Grid Dynamics

Tamil Nadu, India (On-Site)
4 Months ago

Jobs in Yokne'am Illit, North District, Israel

SuperPlay - Senior 2D Illustrator

SuperPlay

Tel Aviv District, Israel (On-Site)
2 Months ago
NVIDIA - Firmware Manager

NVIDIA

Yokne'am Illit, North District, Israel (On-Site)
1 Month ago
Unity - Senior ML Engineer

Unity

Tel Aviv-Yafo, Tel Aviv District, Israel (On-Site)
3 Months ago
Varonis - Data Engineer

Varonis

Herzliya, Tel Aviv District, Israel (Hybrid)
1 Month ago
Playtika - Technical Product Manager

Playtika

Israel (On-Site)
2 Months ago
Intel Corporation - SW Embedded (Kernel) Team Leader

Intel Corporation

Haifa District, Israel (Hybrid)
1 Month ago
Ludeo - Senior Back End Developer

Ludeo

Tel Aviv-Yafo, Tel Aviv District, Israel (On-Site)
6 Days ago
SuperPlay - Community & Social Media Manager

SuperPlay

Tel Aviv District, Israel (On-Site)
1 Month ago
PAPAYA - Monetization Illustrator

PAPAYA

Tel Aviv-Yafo, Tel Aviv District, Israel (On-Site)
1 Month ago
SuperPlay - Bookkeeper

SuperPlay

Tel Aviv District, Israel (On-Site)
5 Days ago

Administrative Jobs

Penumbra - IT Systems Engineer, Operations

Penumbra

Alameda, California, United States (On-Site)
4 Months ago
Dream Games - Senior IT/AV Specialist

Dream Games

London, England, United Kingdom (On-Site)
6 Days ago
Luxoft - Senior Linux Python Developer

Luxoft

(Remote)
3 Months ago
Rockstar Games - NOC Supervisor

Rockstar Games

India (On-Site)
1 Month ago
PhonePe - SRE - Systems

PhonePe

Bengaluru, Karnataka, India (On-Site)
3 Months ago
Axon - Customer Service Representative (Onsite)

Axon

Scottsdale, Arizona, United States (On-Site)
3 Days ago
Next Level Business Services - IBM Tivoli Administrator

Next Level Business Services

Florence, Kentucky, United States (On-Site)
4 Months ago
SmileGate - Group Personal Information Protection Manager

SmileGate

Seongnam-si, Gyeonggi-do, South Korea (On-Site)
1 Month ago
Anthology Inc - Implementation Consultant - SIS

Anthology Inc

Colombia (Remote)
1 Month ago
Sinch - Technical Specialist (VSS and OCS)

Sinch

Uttar Pradesh, India (Hybrid)
2 Months ago

About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.


Yokne'am Illit, North District, Israel (On-Site)
