
Senior Site Reliability Engineer

20 Hours ago • 10 Years + • DevOps • $168,000 PA - $322,000 PA

Job Summary

Job Description

NVIDIA seeks a Senior Site Reliability Engineer to guarantee the smooth operation of its cutting-edge technologies. Responsibilities include owning solution implementation, collaborating with cross-functional teams, automating provisioning and management, improving service resiliency, detecting and resolving performance issues, conducting capacity planning, participating in incident reviews, and delivering SRE solutions in a multi-cloud hybrid environment (AWS, GCP, and on-prem). The role demands ensuring high uptime and QoS for internal customers and participation in on-call rotations.
Must have:
  • 10+ years of experience in building and supporting critical services
  • Kubernetes administration, CI/CD, IaC proficiency
  • Linux OS and TCP/IP expertise
  • Experience with at least one major cloud provider (AWS, GCP, Azure)
  • 5+ years of coding/scripting in at least two high-level languages (such as Python, Go, Ruby, or Groovy)
  • Excellent debugging and communication skills
Good to have:
  • Linux certification
  • Large-scale Kubernetes deployment experience
  • Modern container networking and storage architecture skills
  • Cloud certifications
  • Slurm/LSF environment experience
Perks:
  • Equity
  • Benefits

Job Details

Join our team in Santa Clara, CA, USA as a Senior Site Reliability Engineer. At NVIDIA, you'll be part of the team shaping the future of computing and guaranteeing the smooth operation of our brand-new technologies. Our mission is to leverage AI's power to build outstanding and pioneering solutions that have a significant impact on the world.

What you'll be doing:

  • Own the solutions you build, collaborating with cross-functional teams to successfully implement them.

  • Collaborate with various teams in a fast-paced environment to ensure seamless project completion.

  • Continuously improve solution provisioning and management through automation.

  • Identify areas to improve service resiliency using industry-standard practices.

  • Detect performance issues and recommend solutions to maintain world-class service quality.

  • Conduct capacity management and planning to meet ongoing operational needs.

  • Participate in incident reviews, assist in root cause identification, and write RCA reports.

  • Deliver SRE solutions in a globally distributed, multi-cloud hybrid environment - AWS, GCP, and On-prem.

  • Ensure the highest level of uptime and Quality of Service (QoS) for internal customers through operational excellence.

  • Participate in the team's on-call rotation.

What we need to see:

  • B.S. degree in Computer Science or related technical field (or equivalent experience) with over 10 years in building and supporting critical services.

  • Proficiency in Kubernetes administration, modern CI/CD techniques and Infrastructure as Code (IaC).

  • Deep understanding of Linux operating systems and TCP/IP fundamentals.

  • Expertise with at least one major cloud service provider - AWS, GCP, Azure.

  • Demonstrated proficiency with end-to-end SRE capabilities and observability.

  • Proficient in monitoring, metrics gathering, APM, container management, and log collection tools.

  • 5+ years of coding/scripting experience in at least two high-level programming languages such as Python, Go, Ruby, or Groovy.

  • Creative problem solver with excellent debugging skills and great communication and documentation abilities.

Ways to stand out from the crowd:

  • Linux certification from a well-known vendor - RedHat, Oracle, etc.

  • Prior experience managing large-scale Kubernetes deployment in production.

  • Strong skills in modern container networking and storage architecture.

  • Well-known Cloud Certification(s).

  • Hands-on experience working with Slurm/LSF environments.

The base salary range is 168,000 USD - 322,000 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.





About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.


