Senior HPC Cluster Engineer

NVIDIA

North District, Israel (On-site)

6 Months ago • 5 Years + • Research Development

Job Summary

Job Description

As a Senior HPC Cluster Engineer at NVIDIA, you'll lead the design and implementation of cutting-edge GPU compute clusters for deep learning, HPC, and computationally intensive workloads. Responsibilities include building and improving the GPU-accelerated computing ecosystem, developing large-scale automation solutions, maintaining and building deep learning clusters, supporting researchers, performing performance analysis and optimization, and conducting root cause analysis. You'll also be involved in strategic challenges related to compute, networking, storage, resource utilization, cloud strategy, capacity modeling, and growth planning.

Must have:

5+ years experience designing/operating large-scale compute infrastructure
Experience analyzing and tuning HPC workload performance
Knowledge of cluster management tools (Ansible, Puppet, Salt)
Experience with HPC schedulers (SLURM, LSF)
Understanding of container technologies (Docker, Singularity)
Proficient in Linux (CentOS/RHEL or Ubuntu), Python, bash scripting
Experience with MPI-based HPC workflows

Good to have:

Understanding of MLPerf benchmarking
Familiarity with InfiniBand, IBOP, RDMA
Understanding of Lustre/GPFS for HPC
Background in SDN and HPC cluster networking
Familiarity with PyTorch and TensorFlow

Perks:

Highly competitive salaries
Comprehensive benefits package

11 skills required for this role

bash

tensorflow

puppet

deep-learning

python

docker

pytorch

linux

ansible

networking

performance-analysis

Job Details

NVIDIA has continuously reinvented itself over two decades. Our invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI — the next era of computing. NVIDIA is a “learning machine” that constantly evolves by adapting to new opportunities that are hard to solve, that only we can tackle, and that matter to the world. This is our life’s work, to amplify human imagination and intelligence. Make the choice to join us today!

As a member of the GPU/HPC Infrastructure team, you will provide leadership in the design and implementation of groundbreaking GPU compute clusters that run demanding deep learning, high-performance computing, and computationally intensive workloads. We seek an expert to identify architectural changes and/or completely new approaches for our GPU compute clusters. As an expert, you will help us with the strategic challenges we encounter, including compute, networking, and storage design for large-scale, high-performance workloads; effective resource utilization in a heterogeneous compute environment; evolving our private/public cloud strategy; capacity modeling; and growth planning across our global computing environment.

What you'll be doing:

Building and improving our ecosystem around GPU-accelerated computing including developing large scale automation solutions
Maintaining and building deep learning clusters at scale
Supporting our researchers to run their flows on our clusters including performance analysis and optimizations of deep learning workflows
Performing root cause analysis and suggesting corrective action for problems at both large and small scale
Finding and fixing problems before they occur

What we need to see:

Bachelor’s degree in Computer Science, Electrical Engineering or related field or equivalent experience.
Minimum 5 years of experience designing and operating large scale compute infrastructure.
Experience analyzing and tuning performance for a variety of HPC workloads.
Working knowledge of cluster configuration management tools such as Ansible, Puppet, Salt.
Experience with HPC cluster job schedulers such as SLURM, LSF
In-depth understanding of container technologies like Docker, Singularity, Shifter, Charliecloud
Proficiency in CentOS/RHEL and/or Ubuntu Linux distributions, along with Python programming and bash scripting
Experience with HPC workflows that use MPI

Ways to stand out from the crowd:

Understanding of MLPerf benchmarking
Familiarity with InfiniBand with IBOP and RDMA
Understanding of fast, distributed storage systems like Lustre and GPFS for HPC workloads.
Background with Software Defined Networking and HPC cluster networking
Familiarity with deep learning frameworks like PyTorch and TensorFlow

NVIDIA offers highly competitive salaries and a comprehensive benefits package. We have some of the most brilliant and talented people in the world working for us and, due to unprecedented growth, our world-class engineering teams are growing fast. If you're a creative and autonomous engineer with real passion for technology, we want to hear from you.

About The Company

NVIDIA

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.
