Software Platform Support Engineer - GPU Cloud

2 Months ago • 2 Years + • DevOps • $76,000 PA - $172,500 PA

Job Summary

Job Description

NVIDIA's DGX Cloud team seeks passionate Software Platform Support Engineers to provide Tier 1 support for complex cloud platforms. Responsibilities include partnering with internal teams, troubleshooting issues, creating documentation (knowledge base articles, how-to guides), building support tooling, understanding user workloads, collaborating with engineering on solutions, and participating in on-call rotations. The role requires expertise in cloud deployments, Linux, Kubernetes, and data storage technologies, alongside strong troubleshooting and communication skills. Experience with SLURM, HPC, and machine learning, along with a customer-centric approach, is highly valued.
Must have:
  • 2+ years supporting distributed software systems
  • 2+ years supporting end-user software platforms
  • Linux experience
  • Kubernetes expertise
  • Cloud platform (AWS, Azure, OCI, GCP) knowledge
  • Data storage technology understanding
  • Troubleshooting and communication skills
Good to have:
  • SLURM or HPC experience
  • Machine Learning/AI experience
  • Strong organizational skills
Perks:
  • Equity
  • Benefits

Job Details

The NVIDIA DGX Cloud organization is looking for passionate software support engineers to partner closely with our internal customers and support them on our internal platforms. This partnership requires you to gain a deep understanding of customer needs and how their applications work, assist them in troubleshooting issues, and create documentation that makes it easier for users to troubleshoot issues themselves. The support you provide will help our users have a better experience and help shape our platform.

 

We expect you to have knowledge of supporting cloud-based deployments across compute, storage and networking environments. 

 

What will you be doing:

  • Partner with multiple internal teams to provide Tier 1 support for complex cloud platforms

  • Triage customer issues, investigate their root causes, and escalate as needed

  • File bugs and report issues while working closely with the Site Reliability team

  • Build tooling to improve the customer support process and visibility

  • Document best practices and solutions, and write knowledge base articles, how-to guides, and blog posts

  • Deeply understand user workloads and use cases 

  • Partner with multiple internal teams to give feedback to engineering teams and develop solutions to aid in their success

  • Be part of an on-call rotation to support production systems

 

What we need to see:

  • BS/MS degree in Computer Science or a related area (or equivalent experience)

  • 2+ years of experience supporting distributed software systems

  • 2+ years of experience supporting end-user software platforms

  • 2+ years of experience with Linux

  • Experience with Kubernetes as well as experience with AWS, Azure, OCI, and GCP 

  • Background in infrastructure, networking, storage, and DevOps scripting/tooling

  • Understanding of data storage technologies (databases, file, block, blob)

  • Willingness to become an expert in DGX Cloud

  • Customer service/support experience

  • Willingness to work up and down the stack as well as across multiple teams 

  • Strong skills in troubleshooting with outstanding communication skills 

 

Ways to stand out from the crowd:

  • Previous SLURM or HPC experience

  • Machine Learning and/or AI experience (self-taught is great!)

  • A strong drive to work with internal customers and make them successful

  • A drive to improve process with strong organizational skills  

 

NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions from artificial intelligence to autonomous cars. NVIDIA is looking for phenomenal people like you to help us accelerate the next wave of artificial intelligence.

The base salary range is 76,000 USD - 172,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.
