Senior Software Engineer - HPC

3 Months ago • 10 Years + • DevOps • $184,000 PA - $356,500 PA

Job Summary

Job Description

NVIDIA seeks a Senior Software Engineer for its HPC infrastructure team. Responsibilities include designing highly available and scalable systems, evaluating new technologies, improving infrastructure provisioning and management using automation, supporting a multi-cloud environment (AWS, GCP, on-prem), collaborating with cross-functional teams, ensuring high uptime and QoS, and participating in on-call rotations. The ideal candidate has 10+ years of experience in large engineering projects, proficiency in at least two programming languages (Golang, Java, C/C++, Scala, Python, Elixir), cloud computing expertise, and strong CI/CD skills.
Must have:
  • 10+ years experience in large engineering projects
  • Proficiency in at least two programming languages
  • Cloud computing expertise (GCP, AWS, Azure)
  • Strong CI/CD, GitOps, and IaC skills
  • Design highly available and scalable systems
  • Experience with HPC clusters (Slurm or Kubernetes)
Good to have:
  • Strong understanding of Linux and TCP/IP
Perks:
  • Equity
  • Benefits

Job Details

NVIDIA has continuously reinvented itself over two decades. Our invention of the GPU in 1999 fueled the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI and enabled the next era of computing. NVIDIA is a “learning machine” that constantly evolves by adapting to new opportunities that are hard to address, that matter to the world, and that only we can address. This is our life’s work, to amplify human imagination and intelligence, and expand what is possible. We’re seeking strategic, bold, hard-working, and creative individuals who are passionate about helping us tackle challenges no one else can solve. Make the choice to join us today.
 

We are looking for a Senior Software Engineer to join our mission to continue improving our HPC infrastructure. Our team builds and operates sophisticated infrastructure to enable business-critical services and AI applications. You will be working with a team of passionate and skilled engineers who are continuously working to provide better tools to build and manage this infrastructure. The ideal candidate is strong in software development, can design and build reliable distributed systems, and is able to implement a well-thought-out, long-term maintenance strategy.


What you’ll be doing:

  • Design highly available and scalable systems to meet the demands of our HPC clusters

  • Evaluate new and innovative technologies as the landscape evolves

  • Continuously improve infrastructure provisioning and management using automation

  • Support a globally distributed, multi-cloud hybrid environment - AWS, GCP and On-prem

  • Build strong cross-functional relationships and align with partners across various business units

  • Ensure the highest level of uptime and Quality of Service (QoS) for our users through operational excellence

  • Participate in the team's on-call rotation and be a contact for service incidents


What we need to see:

  • 10+ years of experience in design, implementation, and delivery of large engineering projects

  • Comfortable with at least two of the following programming languages: Golang, Java, C/C++, Scala, Python, Elixir.

  • Understands scalability challenges and performance of server-side code. Able to design and develop horizontally scalable, resilient systems that perform well under load.

  • Versatile technologist with experience in full software development lifecycle – from inception and design to deployment, operation, and iterative development.

  • Proficient in cloud computing, with hands-on experience in at least one cloud platform: GCP, AWS, or Azure.

  • Proficient in modern CI/CD techniques, GitOps, and Infrastructure as Code (IaC)

  • Strong work ethic and a passion for problem solving

  • B.S. degree in Computer Science or related technical field (or equivalent experience)

  • Detail oriented with great communication and collaboration skills


Ways to stand out from the crowd:

  • Prior experience building solutions for HPC clusters based on Slurm or Kubernetes

  • Strong understanding of the Linux operating system and TCP/IP fundamentals

The base salary range is 184,000 USD - 356,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.
