Senior System Reliability Engineer

3 Months ago • 6-8 Years • Devops • $140,000 PA - $264,500 PA

Job Summary

Job Description

NVIDIA seeks a Senior System Reliability Engineer to contribute to the reliability of its GPU servers and high-performance computing systems. Responsibilities include establishing and maintaining product reliability standards, participating in design reviews, working with suppliers and partners, defining reliability plans, performing testing and failure analysis, and correlating test results with field performance. This role requires expertise in hardware reliability engineering for electronics and server systems, including graphics cards, servers, racks, and clusters, across the entire product lifecycle. The ideal candidate will have extensive experience with PCIe peripherals, graphics cards, and servers, strong statistical analysis skills, and excellent communication abilities.
Must have:
  • Hardware Reliability Engineering Expertise
  • Experience with PCIe peripherals, graphics cards, and servers
  • Strong statistical analysis skills
  • Excellent communication skills
  • Design for Reliability (DfR) methods
  • Failure analysis and recommendations
Good to have:
  • MS or PhD in relevant field
Perks:
  • Competitive salary
  • Generous benefits package

Job Details

NVIDIA has continuously reinvented itself over two decades. Our invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing, with the GPU acting as the brains of computers, robots, and self-driving cars that can perceive and understand the world. Today, we are increasingly known as “the AI computing company.” We're looking to grow our company and build our teams with the most thoughtful people in the world. Join us at the forefront of technological advancement.

GPU servers are one of the fastest-growing segments for NVIDIA and the artificial intelligence industry. As computational power increases with every GPU generation, developing efficient and reliable systems is imperative. We are looking for a System Reliability Engineer to join NVIDIA's existing Reliability Engineering team. The role works across NVIDIA's diverse system product range, specifically graphics and high-performance computing printed circuit boards and data center servers.


What you'll be doing:

  • Provide expertise in Hardware Reliability Engineering for Electronics/Server Systems (graphics cards, servers, racks, clusters) from the concept phase through end of life.

  • Establish, deliver, and maintain product reliability standards and metrics for NVIDIA's new system technologies, using existing tools and processes or developing new ones as required.

  • Participate in product and engineering design reviews, assess the reliability budget of products/designs, and inspire changes that enhance product reliability.

  • Work with all pertinent engineering groups, suppliers, and partners to ensure the desired reliability is achieved using Design for Reliability (DfR) methods, including Failure Mode and Effects Analysis (FMEA) and Design of Experiments (DoE).

  • Define and implement Reliability Plans & Specifications.

  • Provide reliability predictions, along with test plans and methods to assess and drive product reliability to the desired levels.

  • Perform and lead appropriate testing with associated failure analysis and recommendations for improving designs and manufacturing.

  • Develop and present methods of correlating reliability test results with actual field performance.


What we need to see:

  • BS (or equivalent experience) in Engineering, Materials Science, Physics, or a related field; MS or PhD preferred.

  • 6+ years in a hardware validation/reliability environment related to PCIe peripherals, graphics cards, and servers.

  • Understanding of power supplies, memory, high-speed I/O, PCI Express, Ethernet, and I2C.

  • Hands-on experience with theoretical and practical reliability concepts as they relate to high-tech electronic enterprise and consumer products.

  • Strong command of statistical concepts, models, and analysis, and of how they relate to product reliability and life analysis.

  • Good verbal and written skills, as well as the ability to communicate at a high level.

  • Self-motivated, independent, and committed to getting things done.

  • Good project management skills and ability to balance multiple simultaneous projects during development and production stages.

With competitive salaries and a generous benefits package, we are widely considered to be one of the technology world’s most desirable employers. If you're a creative and autonomous engineer with a real passion for technology, we want to hear from you. Come build the future with us!

The base salary range is 140,000 USD - 264,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Similar Jobs

CME Group - Site Reliability Engineer II - Reliability Engineering & Operations

CME Group

Bengaluru, Karnataka, India (On-Site)
2 Weeks ago
good job games - Growth Manager - New Grad

good job games

İstanbul, Türkiye (On-Site)
10 Months ago
Nexon - Manager, CRM

Nexon

El Segundo, California, United States (Hybrid)
2 Months ago
Riot Games - Manager, Software Engineering (Tools)

Riot Games

Los Angeles, California, United States (On-Site)
3 Months ago
lifechruh - Digital Marketing Manager

lifechruh

Edmond, Oklahoma, United States (On-Site)
2 Months ago
ARHS - DevOps - AWS Cloud Engineer

ARHS

Brussels, Brussels, Belgium (On-Site)
1 Week ago
C3 IoT - AI Solution Architect / Senior AI Solution Architect (Post-Sales)

C3 IoT

New York, New York, United States (On-Site)
3 Weeks ago
Intel - Sr. Infrastructure Engineer - Windows OS

Intel

Hillsboro, Oregon, United States (On-Site)
2 Months ago
extreme network - Solutions Architect - Remote Eastcoast

extreme network

North Carolina, United States (Remote)
1 Month ago
Nasdaq - Senior DevOps Engineer (AWS, Terraform, Kubernetes)

Nasdaq

Vilnius, Vilnius County, Lithuania (Hybrid)
1 Week ago

Similar Skill Jobs

Philips - Firmware Engineer

Philips

Pune, Maharashtra, India (On-Site)
1 Month ago
Alpha Sense - Manager, Customer Operations

Alpha Sense

Chicago, Illinois, United States (On-Site)
2 Months ago
Expedia - Senior Software Development Engineer

Expedia

Seattle, Washington, United States (On-Site)
1 Year ago
Salesforce - Account Solution Engineer - Mulesoft

Salesforce

Oslo, Oslo, Norway (Hybrid)
1 Week ago
bytedance - Software Engineer, Architecture and Infrastructure

bytedance

San Jose, California, United States (On-Site)
9 Months ago
Neolytix - Senior Associate - Organic SEO Specialist

Neolytix

Gurugram, Haryana, India (Remote)
1 Month ago
Techland - COO Personal Assistant

Techland

Wrocław, Lower Silesian Voivodeship, Poland (On-Site)
5 Months ago
Power Integrations - Technician, Supervisor Prototype Manufacturing

Power Integrations

Biel/Bienne, Canton Of Bern, Switzerland (On-Site)
6 Months ago
Shield AI - Sr Manager of Deployed Operations (R3698)

Shield AI

Dallas, Texas, United States (On-Site)
1 Week ago
Lilt - Voice Talent Required - Hungarian

Lilt

Hungary (Remote)
1 Week ago

Jobs in Santa Clara, California, United States

Next Level Business Services - Information Management Architect (Full Time)

Next Level Business Services

Milford, Ohio, United States (On-Site)
9 Months ago
Axon - Threat Intelligence Analyst

Axon

Scottsdale, Arizona, United States (On-Site)
1 Month ago
Palo Alto Networks - Sr. Revenue Analyst

Palo Alto Networks

Santa Clara, California, United States (On-Site)
1 Month ago
Zscaler - Senior Sales Engineer

Zscaler

Houston, Texas, United States (Remote)
1 Month ago
Philips - R&D Transducers Leader

Philips

Reedsville, Pennsylvania, United States (On-Site)
1 Month ago
Side - Games Producer

Side

United States (Remote)
2 Months ago
Shield AI - Software Engineering Manager, Test

Shield AI

San Diego, California, United States (Hybrid)
1 Week ago
DMG - Staff Engineer

DMG

Cincinnati, Ohio, United States (On-Site)
2 Months ago
Univision - Activations Technician-Seasonal

Univision

Houston, Texas, United States (On-Site)
2 Weeks ago
quience - Americas Sourcing Manager - Furniture

quience

United States (Remote)
1 Month ago

Devops Jobs

kaizen gaming - Senior Site Reliability | DevOps Engineer

kaizen gaming

Athens, Greece (Hybrid)
1 Month ago
Nagarro - Associate Principal Engineer, Cloud

Nagarro

Hyderabad, Telangana, India (On-Site)
9 Months ago
DevRev - Solutions Engineer

DevRev

Mumbai, Maharashtra, India (On-Site)
3 Months ago
Fearless - Software Engineer II (Cloud Solution Architect) Navy NIWC

Fearless

Charleston, South Carolina, United States (On-Site)
1 Week ago
Gravity CO - Cloud System Engineer Recruitment

Gravity CO

Seoul, South Korea (On-Site)
1 Month ago
luxsoft - Solution Architect

luxsoft

Ukrainka, Kyiv Oblast, Ukraine (Remote)
1 Month ago
Thales - GCP Cloud Architect

Thales

Vélizy-Villacoublay, Île-de-France, France (Hybrid)
2 Months ago
bytedance - Global Head of Solution Architect, SealSuite

bytedance

Singapore (On-Site)
6 Months ago
Argus - Senior Software Engineer (Infrastructure/Backend)

Argus

Indonesia (Remote)
4 Months ago
Rush street interactive - Senior Full-Stack Automation Engineer

Rush street interactive

Estonia (Hybrid)
4 Months ago

About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.

Taipei City, Taiwan (On-Site)

Beijing, Beijing, China (On-Site)

Santa Clara, California, United States (On-Site)

Santa Clara, California, United States (Hybrid)

Bengaluru, Karnataka, India (Hybrid)

Yokne'am Illit, North District, Israel (On-Site)

Yokne'am Illit, North District, Israel (On-Site)

Yokne'am Illit, North District, Israel (On-Site)

Dubai, Dubai, United Arab Emirates (On-Site)

Beijing, Beijing, China (On-Site)
