Senior Solution Engineer, Mission Control

1 Month ago • 5 Years + • Artificial Intelligence • Research & Development • $136,000 PA - $264,500 PA

Job Summary

Job Description

NVIDIA seeks a Senior Solution Engineer for its Mission Control team, focusing on automating AI Factory operations. The role involves direct customer interaction, troubleshooting software issues, resolving customer problems, and collaborating with engineering teams. Responsibilities include providing technical support, creating support tools, owning customer issues from start to finish, and documenting interactions. Expertise in Linux and container technologies (Kubernetes) and experience with distributed GPU-accelerated workloads are crucial. The position requires strong problem-solving, communication, and organizational skills, along with proficiency in Python and experience with various AI/ML tools and frameworks.
Must have:
  • 5+ years of AI/ML engineering experience
  • Linux expertise for AI/ML workloads
  • Kubernetes experience on compute clusters
  • Excellent communication and problem-solving skills
  • Python proficiency, custom tool development
Good to have:
  • Experience with Chatbots, RAG pipelines, vector databases
  • Distributed training/inference workloads
  • GPU accelerated/cloud/virtualized environment experience
  • Docker/Kubernetes/Slurm experience
  • Experience with PyTorch or TensorFlow
  • C/C++ development experience
Perks:
  • Equity
  • Benefits

Job Details

NVIDIA is looking for an engineer who wants the buzz of direct customer interaction and the reward of contributing to software and products. We want the right person to join our team of Solution Engineers working on NVIDIA Mission Control, which automates the operations of AI Factories. We need an expert engineer to triage customer software issues and resolve customer problems. You must have excellent problem-solving abilities and communication experience and be able to work on multiple projects and tasks. You must be strong in Linux, have solid programming skills, and possess experience working with containers and related technologies such as Kubernetes. Experience analyzing the performance of distributed GPU-accelerated workloads is a plus.

What you'll be doing:

  • Provide direct support to our NVIDIA Enterprise customers: answer questions and reproduce or resolve customer issues.

  • Work with engineering teams on customer issues, providing logs, reproduction information, and other triage information.

  • Create/update product and/or support tools.

  • Own and drive customer issues from inception to resolution.

  • Document customer interactions and enhance our knowledge base.

  • Work with the latest hardware (e.g., GPUs, AI accelerators, high-speed interconnects) and software technologies such as parallel filesystems (e.g., Lustre, GPFS, WekaIO), Jupyter, Spark, Kubernetes, Ceph, and various ML frameworks and tools.

  • Occasionally work on weekends and holidays to support customers.

What we need to see:

  • Minimum of a BS in Computer Science, Electrical Engineering, or equivalent experience.

  • At least 5 years of engineering experience with a proven track record in AI/ML-focused projects or enterprise-grade solutions.

  • Expertise analyzing, optimizing, and customizing Linux environments for AI/ML workloads.

  • Strong container orchestration/job scheduling experience on compute clusters, especially with Kubernetes.

  • Professional-level communication experience, with the ability to adjust to the technical level of the audience and to stay calm and focused in negative situations.

  • Excellent follow-up and organizational skills, with a love for solving problems.

  • Proficient in Python programming with the ability to develop scripts and build custom tools. Experience with parallel programming or GPU acceleration (e.g., CUDA) is highly desirable.
     

Ways to stand out from the crowd:

  • Experience with Chatbots, RAG pipelines, vector databases, distributed training or inference workloads

  • Experience developing in GPU accelerated / cloud / virtualized environments

  • Experience with containerized solutions and job scheduling, including knowledge of Docker, Kubernetes, or Slurm, or experience analyzing the software performance of distributed workloads

  • Experience with common deep learning frameworks such as PyTorch or TensorFlow

  • Experience developing with C/C++

The base salary range is 136,000 USD - 264,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.