Senior AI-HPC Storage Engineer

NVIDIA

Massachusetts, United States (On-site)

4 Months ago • 8 Years + • Research Development • $184,000 PA - $356,500 PA

Job Summary

Job Description

As a Senior AI-HPC Storage Engineer at NVIDIA, you'll lead the design and implementation of cutting-edge storage solutions for demanding AI/HPC workloads. Responsibilities include researching and implementing distributed storage services, designing on-prem and cloud-based AI/HPC infrastructure, developing automation tools, and collaborating with teams to optimize workflows. You'll perform performance analysis, root cause analysis, and contribute to the evolution of NVIDIA's global computing environment's storage strategy. The role requires expertise in parallel file systems (Lustre, GPFS), cloud environments (AWS, Azure, GCP), and AI/HPC cluster management.

Must have:

8+ years large-scale storage infrastructure experience
AI/HPC workload performance analysis & tuning
Lustre/GPFS experience
Proficient in Linux, Python, Bash scripting
Cloud storage experience (AWS, Azure, GCP)
Experience with SLURM/LSF
Docker, Kubernetes experience

Good to have:

NVIDIA GPU, CUDA, NCCL, MLPerf experience
Machine learning/deep learning knowledge
InfiniBand, IB/RDMA experience
SDN and AI/HPC cluster networking
PyTorch/TensorFlow familiarity

Perks:

Highly competitive salary
Comprehensive benefits package
Equity

16 skills required

performance-analysis

game-texts

cuda

networking

linux

aws

azure

pytorch

deep-learning

docker

python

algorithms

bash

tensorflow

css

machine-learning

Job Details

NVIDIA has continuously reinvented itself over two decades. Our invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI — the next era of computing. NVIDIA is a “learning machine” that constantly evolves by adapting to new opportunities that are hard to solve, that only we can address, and that matter to the world. This is our life’s work, to amplify human creativity and intelligence. Make the choice to join us today!

As a member of the GPU AI/HPC Infrastructure team, you will provide leadership in the design and implementation of groundbreaking fast storage solutions that enable demanding deep learning, high performance computing, and other computationally intensive workloads. We seek an expert to identify architectural changes encompassing file, block, and object storage to meet the requirements of an expanding cloud infrastructure. As an expert, you will help us with the strategic challenges we encounter in next-gen storage design for large-scale, high-performance workloads, including evolving our private/public cloud strategy, capacity modelling, and growth planning across our global computing environment.

What you'll be doing:

Research and implementation of distributed storage services.
Design and implement an on-prem AI/HPC infrastructure supplemented with cloud computing to support the growing needs of NVIDIA.
Design and implement scalable and efficient next-gen storage solutions tailored for data-intensive applications, optimizing performance and cost-effectiveness.
Develop tooling to automate management of large-scale infrastructure environments, to automate operational monitoring and alerting, and to enable self-service consumption of resources.
Document general procedures and practices, and perform technology evaluations, related to distributed file systems.
Collaborate across teams to better understand developers' workflows and gather their infrastructure requirements.
Influence and guide methodologies for building, testing, and deploying applications to ensure optimal performance and resource utilization.
Support our researchers in running their workflows on our clusters, including performance analysis and optimization of deep learning workflows.
Perform root cause analysis and suggest corrective actions for problems at both large and small scales.

What we need to see:

Bachelor’s degree in Computer Science, Electrical Engineering or related field or equivalent experience.
8+ years of experience designing and operating large scale storage infrastructure.
Experience analyzing and tuning performance for a variety of AI/HPC workloads.
Experience with one or more parallel or distributed filesystems such as Lustre or GPFS is a must.
Proficient in CentOS/RHEL and/or Ubuntu Linux distros, including Python programming and Bash scripting.
Experience with the architecture, design, and operation of storage solutions on any of the leading cloud environments (AWS, Azure, or GCP).
Experience with AI/HPC cluster job schedulers such as SLURM or LSF.
In-depth understanding of container technologies like Docker and Enroot.
Experience with AI/HPC workflows that use MPI

Ways to stand out from the crowd:

Experience with NVIDIA GPUs, CUDA programming, NCCL, and MLPerf benchmarking
Experience with Machine Learning and Deep Learning concepts, algorithms and models
Familiarity with InfiniBand, including IPoIB and RDMA
Background with Software Defined Networking and AI/HPC cluster networking
Familiarity with deep learning frameworks like PyTorch and TensorFlow

NVIDIA offers highly competitive salaries and a comprehensive benefits package. We have some of the most resourceful and talented people in the world working for us and, due to unprecedented growth, our extraordinary engineering teams are growing fast. If you're a creative and autonomous engineer with real passion for technology, we want to hear from you.

The base salary range is 184,000 USD - 356,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

About The Company

NVIDIA

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.
