Senior System Reliability Engineer

5 Months ago • 6-8 Years • $140,000 PA - $264,500 PA
Devops

Job Description

NVIDIA seeks a Senior System Reliability Engineer to contribute to the reliability of its GPU servers and high-performance computing systems. Responsibilities include establishing and maintaining product reliability standards, participating in design reviews, working with suppliers and partners, defining reliability plans, performing testing and failure analysis, and correlating test results with field performance. This role requires expertise in hardware reliability engineering for electronics and server systems, including graphics cards, servers, racks, and clusters, across the entire product lifecycle. The ideal candidate will have extensive experience with PCIe peripherals, graphics cards, and servers, strong statistical analysis skills, and excellent communication abilities.
Good To Have:
  • MS or PhD in relevant field
Must Have:
  • Hardware Reliability Engineering Expertise
  • Experience with PCIe peripherals, graphics cards, servers
  • Strong statistical analysis skills
  • Excellent communication skills
  • Design for Reliability (DfR) methods
  • Failure analysis and recommendations
Perks:
  • Competitive salary
  • Generous benefits package

NVIDIA has continuously reinvented itself over two decades. Our invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing, with the GPU acting as the brains of computers, robots, and self-driving cars that can perceive and understand the world. Today, we are increasingly known as “the AI computing company.” We're looking to grow our company and build our teams with the most thoughtful people in the world. Join us at the forefront of technological advancement.

GPU Servers are one of the fastest-growing segments for NVIDIA and the Artificial Intelligence industry. As computational power increases with every GPU generation, developing efficient and reliable systems is imperative. We are looking for a System Reliability Engineer to join NVIDIA's existing Reliability Engineering team, which is involved in NVIDIA's diverse system product range, specifically Graphics and High-Performance Computing printed circuit boards and Data Center Servers.


What you'll be doing:

  • Provide expertise in Hardware Reliability Engineering for Electronics/Server Systems (graphics cards, servers, racks, clusters) from the Concept phase to the End-of-Life phase.

  • Establish, deliver and maintain product reliability standards and metrics for NVIDIA's new system technologies, using existing tools and processes or developing new ones as required.

  • Participate in product and engineering design reviews, assess the reliability budget of products/designs, and inspire changes that enhance product reliability.

  • Interface with all pertinent engineering groups, suppliers, and partners to ensure the desired reliability is achieved using Design for Reliability (DfR) methods, including FMEA and DoE approaches.

  • Define and implement Reliability Plans & Specifications.

  • Provide reliability predictions, along with test plans and methods to assess and drive product reliability to the desired levels.

  • Perform and lead appropriate testing with associated failure analysis and recommendations for improving designs and manufacturing.

  • Develop and present methods of correlating reliability test results with actual field performance.


What we need to see:

  • BS (or equivalent experience) in Engineering, Materials Science, Physics, or a related field; MS or PhD preferred.

  • 6+ years in a hardware validation/reliability environment related to PCIe peripherals, graphics cards and servers.

  • Understanding of power supply, memory, high-speed I/O, PCI Express, Ethernet and I2C.

  • Hands-on experience with theoretical and practical reliability concepts as they relate to high-tech electronic enterprise and consumer products.

  • Strong command and understanding of statistical concepts/models/analysis and how they relate to product reliability and life analysis.

  • Good verbal and writing skills as well as the ability to communicate at a high level.

  • Self-motivating, independent, and committed to getting things done.

  • Good project management skills and ability to balance multiple simultaneous projects during development and production stages.

With competitive salaries and a generous benefits package, we are widely considered to be one of the technology world’s most desirable employers. If you're a creative and autonomous engineer with a real passion for technology, we want to hear from you. Come build the future with us!

The base salary range is 140,000 USD - 264,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
