Senior Manager, Storage Production Engineering

3 Days ago • 10 Years + • DevOps • $272,000 PA - $425,500 PA

Job Summary

Job Description

As a Senior Manager of Storage Production Engineering at NVIDIA, you'll lead a team responsible for designing, building, and maintaining large-scale storage infrastructure for GPU cloud services, AI/ML workloads, and high-throughput computing. This involves overseeing the deployment and optimization of distributed storage, parallel file systems, and object storage platforms. You will collaborate with various teams, drive automation and operational excellence, implement high-availability strategies, and mentor a team of engineers. The role demands expertise in scalable storage architectures, storage networking protocols, automation tools, and monitoring systems. You'll also be responsible for capacity planning, performance tuning, and troubleshooting large-scale storage systems.
Must have:
  • Lead and mentor storage engineering team
  • Design & deploy large-scale storage systems
  • Expertise in parallel & distributed storage
  • Strong automation & infrastructure-as-code skills
  • Capacity planning, performance tuning, troubleshooting
Good to have:
  • AI/ML workload storage experience
  • Hybrid/multi-cloud storage solutions
  • Software-defined storage (SDS) experience
  • Kubernetes-based storage orchestration
  • Experience driving cross-functional initiatives
Perks:
  • Equity
  • Benefits

Job Details

As a Senior Manager, Storage Production Engineering, you will lead a team responsible for designing, building, and maintaining large-scale, high-performance storage infrastructure to support NVIDIA’s GPU cloud services, AI/ML workloads, and high-throughput computing environments. This role requires a deep understanding of storage architectures, scalability challenges, and performance optimization techniques, along with strong leadership and strategic planning abilities.

You will drive the evolution of distributed storage systems, object storage, and parallel file systems to meet the growing demands of NVIDIA’s compute and AI workloads. In this role, you will collaborate closely with engineering, infrastructure, and operations teams to ensure the reliability, scalability, and efficiency of our storage solutions. You will also be responsible for building and mentoring a world-class team of storage production engineers, driving automation and operational excellence, and defining long-term strategies for storage infrastructure.

What You Will Be Doing:

  • Lead and mentor a team of highly skilled Storage Production Engineers, fostering a culture of innovation, collaboration, and technical excellence.

  • Oversee the design, deployment, and optimization of large-scale storage systems, including distributed storage, parallel file systems, and object storage platforms.

  • Partner with cross-functional teams to drive storage automation, monitoring, and predictive analytics to enhance reliability and efficiency.

  • Establish best practices for capacity planning, data lifecycle management, and cost optimization for storage infrastructure.

  • Implement high-availability and disaster recovery strategies, ensuring minimal downtime and data loss across mission-critical storage environments.

  • Drive the adoption of modern storage architectures, including NVMe over Fabrics (NVMe-oF), RDMA, high-speed interconnects, and cloud-based storage solutions.

  • Lead incident response and root cause analysis efforts, implementing proactive measures to enhance system stability and resilience.

  • Work closely with engineering, DevOps, and AI/ML teams to optimize data pipelines, storage access patterns, and workflow performance.

  • Advocate for continuous improvements in automation, operational efficiency, and performance tuning within the storage infrastructure.

What We Need To See:

  • BS/MS in Computer Science, Storage Systems, or a related technical field (or equivalent experience).

  • 10+ years of overall experience in large-scale storage architecture, production engineering, or infrastructure roles.

  • 5+ years of management experience, leading high-performing storage, infrastructure, or site reliability engineering teams.

  • Proven expertise in scalable storage architectures, including parallel file systems (Lustre, GPFS), distributed storage (Ceph, MinIO), and enterprise-scale object storage (S3, NetApp, Pure Storage, etc.).

  • Strong background in block, file, and object storage technologies, including their performance tuning, high-availability strategies, and data protection mechanisms.

  • Experience with storage networking protocols, such as NFS, SMB, iSCSI, Fibre Channel, RDMA, and NVMe-oF.

  • Hands-on experience with automation and infrastructure as code using Terraform, Ansible, Puppet, or similar tools.

  • Deep understanding of capacity planning, performance tuning, and troubleshooting large-scale storage systems.

  • Expertise in monitoring and observability tools like Prometheus, InfluxDB, and the Elastic Stack for storage infrastructure.

Ways to Stand Out from the Crowd:

  • Experience in designing and scaling storage infrastructure for AI/ML workloads and high-performance computing (HPC).

  • Familiarity with hybrid cloud and multi-cloud storage solutions, including AWS S3, Azure Blob, and Google Cloud Storage.

  • Proven ability to drive cross-functional initiatives, aligning storage strategies with broader business and engineering objectives.

  • Experience with software-defined storage (SDS), cloud-native storage, and Kubernetes-based storage orchestration.

  • Passion for mentoring engineers, fostering career growth, and creating a high-performance team culture.

At NVIDIA, you’ll be at the forefront of innovative storage technologies, working on high-performance storage solutions that power the next generation of AI, HPC, and cloud computing. NVIDIA is leading groundbreaking developments in Artificial Intelligence, High-Performance Computing, and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. We have some of the most forward-thinking and hardworking people on the planet working for us. If you're creative, passionate, and self-motivated, we want to hear from you!

The base salary range is 272,000 USD - 425,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Similar Jobs

ARHS - Application Engineer/Administrator

ARHS

The Hague, South Holland, Netherlands (On-Site)
5 Months ago
Onward Search - DevOps Engineer

Onward Search

Irvine, California, United States (Hybrid)
1 Month ago
Appier - Software Engineer, Site Reliability Engineering

Appier

Taipei City, Taiwan (On-Site)
4 Months ago
Activision - Associate Software Engineer - Demonware (Vancouver)

Activision

Vancouver, British Columbia, Canada (On-Site)
4 Days ago
Egnyte - DevOps Engineer

Egnyte

India (Remote)
1 Month ago
Nagarro - Principal Engineer, Machine Learning (Python)

Nagarro

Gurugram, Haryana, India (On-Site)
3 Months ago
Alp Consulting - Unity 3D developer

Alp Consulting

Bengaluru, Karnataka, India (Hybrid)
11 Months ago
Nielsen Holdings - Software Engineer - Bigdata (Java/Scala/Python, SQL, AWS)

Nielsen Holdings

Bengaluru, Karnataka, India (Hybrid)
5 Months ago
Moon Active - DevOps Team Leader

Moon Active

Tel Aviv-Yafo, Tel Aviv District, Israel (On-Site)
3 Months ago

Similar Skill Jobs

Appier - Software Engineer, Site Reliability Engineering

Appier

Taipei City, Taiwan (On-Site)
4 Months ago
Hitachi - Azure Developer

Hitachi

Hyderabad, Telangana, India (Remote)
5 Months ago
EXUSIA - Ab Initio CoE Administrator

EXUSIA

India (Remote)
5 Months ago
Cargo Studio - Lead DevOps Engineer

Cargo Studio

(On-Site)
1 Month ago
Velotio Technologies - Senior DevOps Engineer (AWS)

Velotio Technologies

Maharashtra, India (Remote)
1 Week ago
Metyis - Lead Devops Engineer

Metyis

Bengaluru, Karnataka, India (On-Site)
4 Months ago
ByteDance - Security Systems Engineer, Fleet Management

ByteDance

Singapore (On-Site)
2 Months ago
Rackspace Technology - Customer Data Engineer II

Rackspace Technology

India (Remote)
1 Week ago
CrazyLabs - DevOps Engineer

CrazyLabs

Skopje, Greater Skopje, North Macedonia (On-Site)
2 Months ago
Activision - Associate Software Engineer - Demonware (Vancouver)

Activision

Vancouver, British Columbia, Canada (On-Site)
4 Days ago

Jobs in Santa Clara, California, United States

ByteDance - AI Product Manager

ByteDance

San Jose, California, United States (On-Site)
1 Week ago
Illumination - Strategy & Business Development Intern, MBA – Summer 2025

Illumination

Santa Monica, California, United States (Hybrid)
1 Month ago
Nintendo - Sr. Engineer, ML (NTD)

Nintendo

Redmond, Washington, United States (On-Site)
1 Month ago
Google - Staff Software Engineer, Infrastructure, Google Cloud

Google

Cambridge, Massachusetts, United States (On-Site)
4 Months ago
Logitech - Public Sector Account Manager

Logitech

Minnesota, United States (Remote)
2 Months ago
On Location - Senior Manager of Technical Project Management

On Location

Austin, Texas, United States (On-Site)
1 Week ago
Nielsen Holdings - Field Sales Representative

Nielsen Holdings

West Mifflin, Pennsylvania, United States (Hybrid)
3 Days ago
Zoox - Staff Autonomy Integration Manager

Zoox

Foster City, California, United States (Hybrid)
5 Months ago
Onward Search - Digital Project Manager

Onward Search

Santa Monica, California, United States (Remote)
1 Month ago
Inkitt - Senior Marketing Manager

Inkitt

San Francisco, California, United States (Hybrid)
2 Months ago

DevOps Jobs

Saviynt - Software Architect - Privilege Access Management

Saviynt

United States (Remote)
5 Months ago
Info Stretch - Lead Data Engineer

Info Stretch

Bengaluru, Karnataka, India (On-Site)
5 Months ago
Omnissa - Member of Technical Staff (C++ Windows)

Omnissa

Chennai, Tamil Nadu, India (On-Site)
5 Months ago
Zeta - Data Reliability Engineer II

Zeta

Hyderabad, Telangana, India (On-Site)
5 Months ago
Bounteous - Senior Cloud Engineer - BOT

Bounteous

India (Remote)
5 Months ago
Zoox - Staff/Senior Staff Software Platform Engineer

Zoox

Foster City, California, United States (Hybrid)
5 Months ago
Samsung Semiconductor - Staff DevOps Engineer

Samsung Semiconductor

San Jose, California, United States (Hybrid)
2 Months ago
Playtech - Integration Engineer

Playtech

Kyiv, Kyiv City, Ukraine (On-Site)
4 Days ago
Turbulent - Senior DevOps Engineer

Turbulent

Montreal, Quebec, Canada (On-Site)
2 Weeks ago
Luxoft - DevOps Engineer with Azure

Luxoft

Pune, Maharashtra, India (On-Site)
3 Months ago

About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.

