Senior Software Engineer - HPC

1 Month ago • 10 Years + • DevOps • $184,000 PA - $356,500 PA

Job Summary

Job Description

NVIDIA seeks a Senior Software Engineer for its HPC infrastructure team. Responsibilities include designing highly available and scalable systems, evaluating new technologies, improving infrastructure provisioning and management using automation, supporting a multi-cloud environment (AWS, GCP, on-prem), collaborating with cross-functional teams, ensuring high uptime and QoS, and participating in on-call rotations. The ideal candidate has 10+ years of experience in large engineering projects, proficiency in at least two programming languages (Golang, Java, C/C++, Scala, Python, Elixir), cloud computing expertise, and strong CI/CD skills.
Must have:
  • 10+ years experience in large engineering projects
  • Proficiency in at least two programming languages
  • Cloud computing expertise (GCP, AWS, Azure)
  • Strong CI/CD, GitOps, and IaC skills
  • Ability to design highly available and scalable systems
  • Experience with HPC clusters (Slurm or Kubernetes)
Good to have:
  • Strong understanding of Linux and TCP/IP
Perks:
  • Equity
  • Benefits

Job Details

NVIDIA has continuously reinvented itself over two decades. Our invention of the GPU in 1999 fueled the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI and enabled the next era of computing. NVIDIA is a “learning machine” that constantly evolves by adapting to new opportunities that are hard to address, that matter to the world, and that only we can address. This is our life’s work: to amplify human imagination and intelligence, and expand what is possible. We’re seeking strategic, bold, hard-working, and creative individuals who are passionate about helping us tackle challenges no one else can solve. Make the choice to join us today.
 

We are looking for a Senior Software Engineer to join our mission to continue improving our HPC infrastructure. Our team builds and operates sophisticated infrastructure to enable business-critical services and AI applications. You will work with a team of passionate and skilled engineers who continuously strive to provide better tools to build and manage this infrastructure. The ideal candidate is strong in software development, skilled at designing and building reliable distributed systems, and able to implement a well-thought-out, long-term maintenance strategy.


What you’ll be doing:

  • Design highly available and scalable systems to meet the demands of our HPC clusters

  • Evaluate new and innovative technologies as the landscape evolves

  • Continuously improve infrastructure provisioning and management using automation

  • Support a globally distributed, hybrid multi-cloud environment: AWS, GCP, and on-prem

  • Build strong cross-functional relationships and align with partners across various business units

  • Ensure the highest level of uptime and Quality of Service (QoS) for our users through operational excellence

  • Participate in the team's on-call rotation and serve as a contact for service incidents


What we need to see:

  • 10+ years of experience in design, implementation, and delivery of large engineering projects

  • Comfortable with at least two of the following programming languages: Golang, Java, C/C++, Scala, Python, Elixir

  • Understands the scalability challenges and performance characteristics of server-side code. Able to design and develop horizontally scalable, resilient systems that perform well under load.

  • Versatile technologist with experience across the full software development lifecycle, from inception and design to deployment, operation, and iterative development

  • Proficient in cloud computing, with hands-on experience in at least one cloud platform: GCP, AWS, or Azure

  • Proficient in modern CI/CD techniques, GitOps, and Infrastructure as Code (IaC)

  • Strong work ethic and a passion for problem solving

  • B.S. degree in Computer Science or related technical field (or equivalent experience)

  • Detail-oriented, with great communication and collaboration skills


Ways to stand out from the crowd:

  • Prior experience building solutions for HPC clusters based on Slurm or Kubernetes

  • Strong understanding of the Linux operating system and TCP/IP fundamentals

The base salary range is 184,000 USD - 356,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.


Santa Clara, California, United States (On-Site)

