Senior Storage and Data Production Engineer

NVIDIA

California, United States (On-site)

5 months ago • 5+ years • DevOps • $148,000 - $287,500 per year

Job Summary

Job Description

The Senior Storage and Data Production Engineer at NVIDIA designs, implements, and supports large-scale storage clusters, ensuring scalability, high availability, and data integrity. Responsibilities include developing monitoring systems, optimizing storage architectures for AI/ML workloads, improving the lifecycle of storage services, and maintaining infrastructure. The role requires expertise in various storage technologies, networking protocols, and automation tools. The engineer will optimize storage efficiency, ensure data security, and participate in on-call rotations. This position necessitates strong problem-solving skills, collaboration, and adaptability to emerging technologies.

Must have:

Experience with high-performance storage solutions
Understanding of block, file, and object storage
Expertise in algorithms and data structures
Experience with automation tools (Ansible, Chef, etc.)
Experience with monitoring tools (InfluxDB, Prometheus, etc.)
Proficient in C/C++, Java, Python, Go, etc.

Good to have:

Deep understanding of distributed storage architectures
Experience with Kubernetes, OpenStack, or hybrid cloud
Ability to design automated storage migration strategies
Strong debugging and problem-solving skills

Perks:

Equity
Benefits

15 skills required for this role

kubernetes

data-analytics

networking

problem-solving

java

ruby

unity

algorithms

ci-cd

github

puppet

containers

chef

data-structures

python

Job Details

Production Engineering is a team that designs, builds, and maintains large-scale production systems with high efficiency and availability. Its work spans software and systems engineering practices, storage, data management, and services. Production Engineers have expertise across domains such as storage architecture, high-performance distributed storage, data management, systems, networking, coding, database management, capacity planning, and continuous delivery and deployment, as well as open-source cloud-enabling technologies like Kubernetes, containers, and virtualization. Their responsibilities include ensuring reliable, scalable, high-performance storage solutions, optimizing data placement and access patterns, managing large-scale distributed storage systems, and ensuring low-latency data access for high-performance computing (HPC) and AI/ML workloads.

Production Engineers at NVIDIA ensure that our internal and external-facing GPU cloud services deliver the reliability and uptime promised to users, while enabling developers to change the existing system through careful preparation and planning, with an eye on capacity, latency, and performance. This role also requires an approach focused on automating storage operations, improving data access efficiency, and optimizing storage performance. Much of our software development focuses on eliminating manual work through automation, performance tuning, and improving the efficiency of storage and production systems.

What You Will Be Doing:

Design, implement, and support large-scale storage clusters, ensuring scalability, high availability, and data integrity.
Develop and maintain storage monitoring, logging, and alerting systems to ensure proactive detection and resolution of performance issues.
Work with AI/ML workloads to optimize storage architectures for low-latency access, efficient caching, and high-throughput performance.
Improve the lifecycle of storage services, from inception and design to deployment, operation, and continuous optimization.
Support storage services before they launch through activities such as system design consulting, developing automation frameworks, capacity management, and launch reviews.
Maintain storage infrastructure once live by monitoring availability, latency, and system health, using predictive analytics and AI-driven automation.
Optimize storage efficiency through compression, deduplication, tiering strategies, and intelligent workload placement.
Scale storage systems sustainably using AI/ML-driven automation, policy-based tiering, and dynamic data migration techniques.
Ensure data security and compliance by implementing encryption, access controls, and auditing mechanisms for storage systems.
Practice sustainable incident response and blameless postmortems.
Be part of an on-call rotation to support storage and production systems.

What We Need To See:

BS degree or equivalent experience in Computer Science, Storage Systems, or a related technical field (e.g., physics, mathematics), and 5+ years of practical experience.
Experience with high-performance storage solutions, including parallel file systems (Lustre, GPFS), distributed storage (Ceph, MinIO), and enterprise-scale object storage (S3, NetApp, Pure Storage, etc.).
Solid understanding of block, file, and object storage technologies, including their performance characteristics and standard methodologies.
Experience with storage networking protocols such as NFS, SMB, iSCSI, Fibre Channel, RDMA, and NVMe over Fabrics.
Expertise in algorithms, data structures, complexity analysis, software design, and maintaining large-scale Linux-based storage systems.
Experience in one or more of the following: C/C++, Java, Python, Go, Perl, or Ruby for storage automation, monitoring, and performance tuning.
Hands-on experience with infrastructure configuration management tools like Ansible, Chef, Puppet, and Terraform for automating storage deployments.
Experience with observability and tracing tools like InfluxDB, Prometheus, and the Elastic stack for monitoring storage system health.

Ways to stand out from the crowd:

Deep understanding of large-scale distributed storage architectures, replication strategies, and erasure coding techniques.
Proven experience in capacity planning, performance tuning, and troubleshooting high-throughput storage systems.
Experience with Git, code review, pipelines, and CI/CD for handling infrastructure as code.
Interest in analyzing and improving distributed storage system performance at scale.
Strong debugging skills with a systematic problem-solving approach to identify complex storage issues.
Experience using or running private and public cloud storage solutions based on Kubernetes, OpenStack, or hybrid cloud architectures.
Ability to design and implement automated storage migration, backup, and disaster recovery strategies.
Thrive in collaborative environments and enjoy working with various teams to optimize storage performance.
Flexible in adapting to different working styles and emerging storage technologies.

At NVIDIA, you’ll be at the forefront of innovative storage technologies, working on high-performance storage solutions that power the next generation of AI, HPC, and cloud computing. NVIDIA is leading groundbreaking developments in Artificial Intelligence, High-Performance Computing, and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. We have some of the most forward-thinking and hardworking people on the planet working for us. If you’re creative, passionate, and self-motivated, we want to hear from you!

The base salary range is 148,000 USD - 287,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

About The Company

NVIDIA

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.
