Senior Site Reliability Engineer - Infrastructure

1 Month ago • 5 Years + • DevOps • $148,000 PA - $287,500 PA

Job Summary

Job Description

As a Senior Site Reliability Engineer at NVIDIA, you will collaborate with various teams to enhance the infrastructure environment supporting chip development. Responsibilities include developing automation for scalable infrastructure, implementing infrastructure innovations using broad IT skills (network architecture, storage, virtualization), working closely with EDA teams to translate their requirements into infrastructure solutions, and investigating and debugging complex issues. You will contribute to improving the chip development process, enhancing overall quality, and accelerating time to market for next-generation chips. The role requires strong UNIX systems programming, automation skills (Ansible, Jenkins, Python), experience with distributed UNIX systems, and excellent communication skills.
Must have:
  • Automation workflows (Ansible, Jenkins)
  • UNIX systems programming & automation
  • Experience with architectural decisions (storage, networking, compute)
  • Understanding of distributed UNIX systems
  • Strong debugging skills in UNIX environment
  • 5+ years of experience in a large distributed UNIX environment
Good to have:
  • Experience with job schedulers (LSF, SLURM)
  • Perl experience
  • Deep understanding of distributed system principles
  • Chip design workflow experience
  • Experience balancing security and productivity
Perks:
  • Equity
  • Benefits

Job Details

NVIDIA has continuously reinvented itself over two decades. Our invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI — the next era of computing. NVIDIA is a “learning machine” that constantly evolves by seeking new opportunities that are hard to solve, that only we can address, and that matter to the world. This is our life’s work, to amplify human creativity and intelligence. Make the choice to join us today!


As an SRE, or an engineer with equivalent experience, you'll collaborate with various teams to improve our infrastructure environment within NVIDIA's Hardware Infrastructure team. You will enable our engineers to have the best environment on the planet to make the most innovative chips in the world. You will work with your team of EDA and software experts to build new infrastructure in an agile environment. You will continuously innovate and improve scalable, reliable, high-performance systems and tools to enable the next generation of chips!


What you’ll be doing:

  • Develop automation in order to scale infrastructure easily and reliably.

  • Use broad IT infrastructure skills to implement infrastructure innovations which accelerate chip development.

  • Design and implement network architecture, storage solutions, virtualization, and services specific to EDA workflows.

  • Work closely with EDA teams to understand their requirements and translate them into infrastructure solutions.

  • Work in a diverse team performing fast-paced investigations to empower engineers to develop at the speed of light.

  • Collaborate to improve how our chip development process utilizes our infrastructure.

  • Directly contribute to the overall quality and improve time to market for our next generation chips.


What we need to see:

  • Experience with automation workflows such as Ansible and Jenkins.

  • UNIX systems programming and automation using industry-standard languages, and familiarity with API calls. Python experience preferred.

  • Authoritative-level command of UNIX and UNIX CLI utilities such as sed, awk, and grep.

  • Hands-on experience making architectural decisions about the technologies (storage, networking, compute) our chip engineers depend on.

  • Understanding of distributed UNIX system concepts such as NFS, autofs, DNS, LDAP and/or NIS.

  • Excellent planning and communication skills and a passion for improving the productivity and efficiency of other specialists.

  • Strong experience investigating and debugging complex, multi-discipline problems in a UNIX environment.

  • 5+ years of experience in a large, distributed UNIX environment.

  • History of using data analysis principles and influencing data-driven decisions.

  • MS (preferred) or BS in Computer Science or a similar field, or equivalent experience.


Ways to stand out from the crowd:

  • Extensive knowledge of job schedulers (in particular IBM Spectrum LSF and/or SLURM).

  • Experience with Perl.

  • Deep understanding of distributed system principles.

  • Experience with chip design workflows, such as front end verification, back end workflows, or mixed signal workflows.

  • Experience in crafting solutions that balance security and productivity for the end user.

The base salary range is 148,000 USD - 287,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.

