HPC Lab Manager

2 Months ago • 3 Years + • Administrative

Job Summary

Job Description

NVIDIA seeks an HPC Lab Manager to join its Networking Cloud Solutions team. The role involves planning and building complex HPC clusters and supercomputers in various data centers and labs. Responsibilities include rack stacking, cable management, ensuring power and cooling efficiency, daily data center operations, and troubleshooting hardware and software issues (network, cabling, bare metal, operating systems). The manager will also support research and development activities and work with scientific researchers, developers, and customers. The ideal candidate will possess strong Linux troubleshooting skills and experience in managing large data centers.
Must have:
  • MCSE or MCITP/CCNA
  • 3+ years lab management experience
  • Linux troubleshooting expertise
  • Knowledge of core services (DHCP, DNS, etc.)
  • Teamwork and service-oriented approach
Good to have:
  • Bash/Python scripting
  • Configuration management tools (Ansible, Puppet)
  • CI/CD & job schedulers (Jenkins, SLURM)
  • Virtualization (KVM, VMware, Hyper-V)
  • L2 & L3 network protocols

Job Details

NVIDIA is looking for an HPC Lab Manager to join the Networking Cloud Solutions HPC/AI Infrastructure team. We are focused on building supercomputers and HPC clusters based on groundbreaking technologies. As lab manager, you will be a key player working with the most exciting computing hardware and software, contributing to the latest breakthroughs in artificial intelligence and GPU computing. You will take part in building large-scale compute and deep learning software and hardware platforms, and you will work with and support many scientific researchers, developers, and customers to craft improved workflows and develop new, differentiated solutions.

What you will be doing:

  • Plan and build complex clusters and supercomputers in various data centers and labs

  • Rack, stack, and manage cabling to ensure efficient use of space and easy maintenance

  • Ensure power and cooling efficiency in data centers and labs while optimizing rack space utilization

  • Run daily operations and support for data centers and labs

  • Install a variety of infrastructure and solutions: cloud, VMs, storage, network, HPC, and AI

  • Troubleshoot network, optical cabling, bare metal, and operating system issues

  • Support Research & Development activities

What we need to see:

  • MCSE or MCITP/CCNA certification

  • 3+ years of experience as a lab manager

  • Experience in supporting large and complex data centers

  • Proven hands-on experience in Linux troubleshooting, with strong problem identification and resolution skills

  • In-depth knowledge of Linux and Windows core services: DHCP, DNS, NIS, AD, etc.

  • Team player, service-oriented, and organized

Ways to stand out from the crowd:

  • Scripting experience in Bash and/or Python

  • Experience with widely used configuration management tools (e.g. Ansible, Puppet)

  • Experience with CI and job scheduler tools (e.g. Jenkins, SLURM)

  • Virtualization: KVM / VMware / Hyper-V

  • Experience with L2 & L3 network protocols

NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you!

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.

