Senior Manager, Storage Production Engineering

NVIDIA

California, United States (On-site)


5 months ago • 10+ years • DevOps • $272,000 - $425,500 per annum

Job Summary

Job Description

As a Senior Manager of Storage Production Engineering at NVIDIA, you'll lead a team responsible for designing, building, and maintaining large-scale storage infrastructure for GPU cloud services, AI/ML workloads, and high-throughput computing. This involves overseeing the deployment and optimization of distributed storage, parallel file systems, and object storage platforms. You will collaborate with various teams, drive automation and operational excellence, implement high-availability strategies, and mentor a team of engineers. The role demands expertise in scalable storage architectures, storage networking protocols, automation tools, and monitoring systems. You'll also be responsible for capacity planning, performance tuning, and troubleshooting large-scale storage systems.

Must have:

Lead and mentor storage engineering team
Design & deploy large-scale storage systems
Expertise in parallel & distributed storage
Strong automation & infrastructure-as-code skills
Capacity planning, performance tuning, troubleshooting

Good to have:

AI/ML workload storage experience
Hybrid/multi-cloud storage solutions
Software-defined storage (SDS) experience
Kubernetes-based storage orchestration
Experience driving cross-functional initiatives

Perks:

Equity
Benefits

15 skills required


kubernetes

puppet

azure

aws

terraform

incident-response

ansible

prometheus

scalability

innovation

cost-optimization

networking

cross-functional

problem-solving

team-management

Job Details

As a Senior Manager, Storage Production Engineering, you will lead a team responsible for designing, building, and maintaining large-scale, high-performance storage infrastructure to support NVIDIA’s GPU cloud services, AI/ML workloads, and high-throughput computing environments. This role requires a deep understanding of storage architectures, scalability challenges, and performance optimization techniques, along with strong leadership and strategic planning abilities.

You will drive the evolution of distributed storage systems, object storage, and parallel file systems to meet the growing demands of NVIDIA’s compute and AI workloads. In this role, you will collaborate closely with engineering, infrastructure, and operations teams to ensure the reliability, scalability, and efficiency of our storage solutions. You will also be responsible for building and mentoring a world-class team of storage production engineers, driving automation and operational excellence, and defining long-term strategies for storage infrastructure.

What You Will Be Doing:

Lead and mentor a team of highly skilled Storage Production Engineers, fostering a culture of innovation, collaboration, and technical excellence.
Oversee the design, deployment, and optimization of large-scale storage systems, including distributed storage, parallel file systems, and object storage platforms.
Partner with cross-functional teams to drive storage automation, monitoring, and predictive analytics to enhance reliability and efficiency.
Establish best practices for capacity planning, data lifecycle management, and cost optimization for storage infrastructure.
Implement high-availability and disaster recovery strategies, ensuring minimal downtime and data loss across mission-critical storage environments.
Drive the adoption of modern storage architectures, including NVMe over Fabrics (NVMe-oF), RDMA, high-speed interconnects, and cloud-based storage solutions.
Lead incident response and root cause analysis efforts, implementing proactive measures to enhance system stability and resilience.
Work closely with engineering, DevOps, and AI/ML teams to optimize data pipelines, storage access patterns, and workflow performance.
Advocate for continuous improvements in automation, operational efficiency, and performance tuning within the storage infrastructure.

What We Need To See:

BS/MS in Computer Science, Storage Systems, or a related technical field (or equivalent experience).
10+ years of overall experience in large-scale storage architecture, production engineering, or infrastructure roles.
5+ years of management experience, leading high-performing storage, infrastructure, or site reliability engineering teams.
Proven expertise in scalable storage architectures, including parallel file systems (Lustre, GPFS), distributed storage (Ceph, MinIO), and enterprise-scale object storage (S3, NetApp, Pure Storage, etc.).
Strong background in block, file, and object storage technologies, including their performance tuning, high-availability strategies, and data protection mechanisms.
Experience with storage networking protocols, such as NFS, SMB, iSCSI, Fibre Channel, RDMA, and NVMe-oF.
Hands-on experience with automation and infrastructure as code using Terraform, Ansible, Puppet, or similar tools.
Deep understanding of capacity planning, performance tuning, and troubleshooting large-scale storage systems.
Expertise in monitoring and observability tools such as Prometheus, InfluxDB, and the Elastic Stack for storage infrastructure.

Ways to Stand Out from the crowd:

Experience in designing and scaling storage infrastructure for AI/ML workloads and high-performance computing (HPC).
Familiarity with hybrid cloud and multi-cloud storage solutions, including AWS S3, Azure Blob, and Google Cloud Storage.
Proven ability to drive cross-functional initiatives, aligning storage strategies with broader business and engineering objectives.
Experience with software-defined storage (SDS), cloud-native storage, and Kubernetes-based storage orchestration.
Passion for mentoring engineers, fostering career growth, and creating a high-performance team culture.

At NVIDIA, you’ll be at the forefront of innovative storage technologies, working on high-performance storage solutions that power the next generation of AI, HPC, and cloud computing. NVIDIA is leading groundbreaking developments in Artificial Intelligence, High-Performance Computing, and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. We have some of the most forward-thinking and hardworking people on the planet working for us. If you're creative, passionate, and self-motivated, we want to hear from you!

The base salary range is 272,000 USD - 425,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.





About The Company

NVIDIA

76 Active Jobs

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.

