Senior Software Engineer - HPC

2 Months ago • 10+ Years • DevOps • $184,000 PA - $356,500 PA

Job Summary

Job Description

NVIDIA seeks a Senior Software Engineer for its HPC infrastructure team. Responsibilities include designing highly available and scalable systems, evaluating new technologies, improving infrastructure provisioning and management using automation, supporting a multi-cloud environment (AWS, GCP, on-prem), collaborating with cross-functional teams, ensuring high uptime and QoS, and participating in on-call rotations. The ideal candidate has 10+ years of experience in large engineering projects, proficiency in at least two programming languages (Golang, Java, C/C++, Scala, Python, Elixir), cloud computing expertise, and strong CI/CD skills.
Must have:
  • 10+ years of experience in large engineering projects
  • Proficiency in at least two programming languages
  • Cloud computing expertise, hands-on with at least one of GCP, AWS, or Azure
  • Strong CI/CD, GitOps, and IaC skills
  • Ability to design highly available and scalable systems
Good to have:
  • Experience building solutions for HPC clusters (Slurm or Kubernetes)
  • Strong understanding of Linux and TCP/IP
Perks:
  • Equity
  • Benefits

Job Details

NVIDIA has continuously reinvented itself over two decades. Our invention of the GPU in 1999 fueled the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI and enabled the next era of computing. NVIDIA is a “learning machine” that constantly evolves by adapting to new opportunities that are hard to address, that matter to the world, and that only we can address. This is our life’s work: to amplify human imagination and intelligence, and to expand what is possible. We’re seeking strategic, bold, hard-working, and creative individuals who are passionate about helping us tackle challenges no one else can solve. Make the choice to join us today.
 

We are looking for a Senior Software Engineer to join our mission to continue improving our HPC infrastructure. Our team builds and operates sophisticated infrastructure that enables business-critical services and AI applications. You will work with a team of passionate and skilled engineers who continuously build better tools to create and manage this infrastructure. The ideal candidate is strong in software development and in designing and building reliable distributed systems, and can implement a well-thought-out, long-term maintenance strategy.


What you’ll be doing:

  • Design highly available and scalable systems to meet the demands of our HPC clusters

  • Evaluate new and innovative technologies as the landscape evolves

  • Continuously improve infrastructure provisioning and management using automation

  • Support a globally distributed, multi-cloud hybrid environment: AWS, GCP, and on-prem

  • Build strong cross-functional relationships and align with partners across various business units

  • Ensure the highest level of uptime and Quality of Service (QoS) for our users through operational excellence

  • Participate in the team's on-call rotation and serve as a point of contact for service incidents


What we need to see:

  • 10+ years of experience in design, implementation, and delivery of large engineering projects

  • Comfortable with at least two of the following programming languages: Golang, Java, C/C++, Scala, Python, Elixir.

  • Understands the scalability and performance challenges of server-side code. Able to design and build horizontally scalable, resilient systems that perform under load.

  • Versatile technologist with experience across the full software development lifecycle, from inception and design to deployment, operation, and iterative development.

  • Proficient in cloud computing and hands-on with at least one cloud platform: GCP, AWS, or Azure.

  • Proficient in modern CI/CD techniques, GitOps, and Infrastructure as Code (IaC)

  • Strong work ethic and a passion for problem solving

  • B.S. degree in Computer Science or related technical field (or equivalent experience)

  • Detail-oriented, with great communication and collaboration skills


Ways to stand out from the crowd:

  • Prior experience building solutions for HPC clusters based on Slurm or Kubernetes

  • Strong understanding of the Linux operating system and TCP/IP fundamentals

The base salary range is 184,000 USD - 356,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Similar Jobs

Red Point Labs - Java Backend Developer (Remote OK)

Red Point Labs

Argentina (Remote)
10 Months ago
The Walt Disney Company - Principal Machine Learning Engineer

The Walt Disney Company

Santa Monica, California, United States (On-Site)
1 Month ago
Zeta - Software Development Engineer in Test I / II

Zeta

Hyderabad, Telangana, India (On-Site)
5 Months ago
Just Play GmbH - Backend Engineer

Just Play GmbH

Berlin, Berlin, Germany (Hybrid)
5 Days ago
Next Level Business Services - Java Developer (Full Time)

Next Level Business Services

Littleton, Colorado, United States (On-Site)
5 Months ago
PwC - Power BI Developer| Senior Associate [tag01]

PwC

Barueri, São Paulo, Brazil (On-Site)
3 Months ago
ByteDance - Site Reliability Engineer (Cloud Native Platform) - Traffic Infrastructure

ByteDance

San Jose, California, United States (On-Site)
2 Months ago
ByteDance - Backend Software Engineer - Foundational Technology

ByteDance

Singapore (On-Site)
1 Week ago
Netflix - Distributed Systems Engineer (L5) - Compute Abstractions

Netflix

United States (Remote)
3 Months ago


Similar Skill Jobs

Nielsen Holdings - Senior Software Engineer - Bigdata ( Java/Scala , Spark, SQL , AWS)

Nielsen Holdings

Bengaluru, Karnataka, India (Hybrid)
5 Months ago
Blind Squirrel Games - Sr. Generalist Engineer, Austin

Blind Squirrel Games

Austin, Texas, United States (Hybrid)
1 Week ago
Everyday Health Group - Principal Software Engineer - Android

Everyday Health Group

Massachusetts, United States (Remote)
2 Months ago
PwC - Senior Associate

PwC

Bhopal, Madhya Pradesh, India (On-Site)
6 Months ago
Next Level Business Services - Azure Services developer

Next Level Business Services

Redmond, Washington, United States (On-Site)
5 Months ago
Limit Break - Lead Engineer (Unity) (Japan)

Limit Break

Tokyo, Japan (On-Site)
8 Months ago
C1X Inc - Senior QA Engineer

C1X Inc

Chennai, Tamil Nadu, India (On-Site)
5 Months ago
Thatgamecompany - Product Data Scientist

Thatgamecompany

United States (Remote)
6 Days ago
Logitech - Sr. System Engineer (Atlassian Platforms)

Logitech

Cork, County Cork, Ireland (Hybrid)
5 Months ago
Rackspace Technology - Oracle Business Systems Analyst II

Rackspace Technology

Gurugram, Haryana, India (Remote)
6 Days ago


Jobs in Santa Clara, California, United States

Playtech - Live Dealer - Swing Shift

Playtech

Atlantic City, New Jersey, United States (On-Site)
5 Days ago
ByteDance - Software Engineer Intern (Recommendation Infrastructure)

ByteDance

San Jose, California, United States (On-Site)
6 Days ago
ByteDance - Software Engineer, SRE - Platform Services

ByteDance

San Jose, California, United States (On-Site)
1 Month ago
Electronic Arts - Account Specialist, Advertising & Sponsorships - American Football

Electronic Arts

New York, United States (Hybrid)
2 Weeks ago
Smarsh - Sales Development Representative I

Smarsh

New York, New York, United States (Hybrid)
5 Months ago
Hawk Eye Innovations - College Sports Systems Technician

Hawk Eye Innovations

Wisconsin, United States (On-Site)
5 Days ago
Keywords Studios (Player Support) - Regional Service Delivery Manager

Keywords Studios (Player Support)

United States (On-Site)
2 Weeks ago
holospark - Gameplay Engineer

holospark

Bellevue, Washington, United States (On-Site)
3 Months ago
Penumbra - US Field Reimbursement Manager

Penumbra

Alameda, California, United States (Hybrid)
5 Months ago
Thatgamecompany - General Art

Thatgamecompany

United States (Remote)
6 Days ago


DevOps Jobs

ION - Cloud Engineer Kubernetes

ION

Italy (Hybrid)
5 Months ago
Jagex - Senior DevOps Engineer, Cloud Platform

Jagex

Cambridge, England, United Kingdom (Hybrid)
6 Days ago
NVIDIA - Senior DevOps Engineer

NVIDIA

Ra'anana, Center District, Israel (On-Site)
2 Months ago
Interactive Brokers - Senior Systems Engineer- Microsoft M365/Active Directory

Interactive Brokers

Chicago, Illinois, United States (Hybrid)
5 Months ago
Qatar Airways - DevOps Engineer

Qatar Airways

Ahmedabad, Gujarat, India (On-Site)
6 Months ago
Axon - Manager, Site Reliability Engineering (Observability)

Axon

Seattle, Washington, United States (Remote)
1 Month ago
Axinous - Principal Software Development Engineer

Axinous

(Remote)
1 Month ago
NVIDIA - Software Manager, Golang Kubernetes

NVIDIA

Tel Aviv-Yafo, Tel Aviv District, Israel (On-Site)
1 Month ago
CharacterAI - Staff Software Engineer, Site Reliability (SRE)

CharacterAI

Menlo Park, California, United States (On-Site)
6 Days ago
NVIDIA - Senior DevOps Engineer

NVIDIA

Yokne'am Illit, North District, Israel (On-Site)
2 Months ago


About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.


