HPC Lab Manager

4 Months ago • 3 Years + • Administrative

Job Summary

Job Description

NVIDIA seeks an HPC Lab Manager to join its Networking Cloud Solutions team. The role involves planning and building complex HPC clusters and supercomputers in various data centers and labs. Responsibilities include rack stacking, cable management, ensuring power and cooling efficiency, daily data center operations, and troubleshooting hardware and software issues (network, cabling, bare metal, operating systems). The manager will also support research and development activities and work with scientific researchers, developers, and customers. The ideal candidate will possess strong Linux troubleshooting skills and experience in managing large data centers.
Must have:
  • MCSE or MCITP/CCNA
  • 3+ years lab management experience
  • Linux troubleshooting expertise
  • Knowledge of core services (DHCP, DNS, etc.)
  • Teamwork and service-oriented approach
Good to have:
  • Bash/Python scripting
  • Configuration management tools (Ansible, Puppet)
  • CI/CD & job schedulers (Jenkins, SLURM)
  • Virtualization (KVM, VMware, Hyper-V)
  • L2 & L3 network protocols

Job Details

NVIDIA is looking for an HPC Lab Manager to join the Networking Cloud Solutions HPC/AI Infrastructure team. We are focused on building supercomputers and HPC clusters based on groundbreaking technologies. As lab manager, you will be a key player working with the most exciting computing hardware and software, contributing to the latest breakthroughs in artificial intelligence and GPU computing. You will take part in building large-scale compute and Deep Learning software and hardware platforms, and work with and support many scientific researchers, developers, and customers to craft improved workflows and develop new, leading differentiated solutions.

What you will be doing:

  • Plan and build complex clusters and supercomputers in various data centers and labs

  • Handle rack stacking and cable management to ensure efficient use of space and easy maintenance

  • Ensure power and cooling efficiency in data centers and labs while optimizing rack space utilization

  • Run daily operations and support for data centers and labs

  • Install a variety of infrastructure and solutions: Cloud, VMs, Storage, Network, HPC, and AI

  • Troubleshoot network, optical cabling, bare metal, and operating system issues

  • Support Research & Development activities

  • Support Research & Development activities

What we need to see:

  • MCSE or MCITP/CCNA certification

  • 3+ years of experience as a lab manager

  • Experience in supporting large and complex data centers

  • Proven hands-on experience in Linux troubleshooting, with strong problem identification and resolution skills

  • In-depth knowledge of Linux & Windows core services: DHCP, DNS, NIS, AD, etc.

  • Team player, service-oriented, and organized

Ways to stand out from the crowd:

  • Scripting experience in Bash and/or Python

  • Experience with widely used configuration management tools (e.g. Ansible, Puppet)

  • Experience with CI tools and well-known job schedulers (e.g. Jenkins, SLURM)

  • Virtualization: KVM / VMware / Hyper-V

  • Experience with L2 & L3 network protocols

NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you!

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.
