Senior Solution Engineer, Mission Control

2 Months ago • 5 Years + • Artificial Intelligence • Research & Development • $136,000 PA - $264,500 PA

Job Summary

Job Description

NVIDIA seeks a Senior Solution Engineer to join its Mission Control team, focusing on automating AI Factory operations. This role involves direct customer interaction, troubleshooting software issues, collaborating with engineering teams, creating support tools, and driving issue resolution. The ideal candidate will possess strong Linux, containerization (Kubernetes), and programming (Python) skills, expertise in analyzing distributed GPU-accelerated workloads, and excellent communication abilities. Responsibilities include providing direct customer support, working with engineering teams on issue triage, developing and updating tools, and documenting customer interactions. Experience with parallel filesystems (Lustre, GPFS, WekaIO), Jupyter, ML frameworks, Spark, Ceph, and various hardware (GPUs, AI accelerators) is beneficial. Occasional weekend/holiday work may be required.
Must have:
  • 5+ years AI/ML experience
  • Linux expertise
  • Kubernetes experience
  • Python proficiency
  • Excellent communication
  • Problem-solving skills
Good to have:
  • Chatbot experience
  • RAG pipelines
  • Vector databases
  • Distributed training
  • PyTorch/TensorFlow
  • C/C++ development
  • CUDA experience
Perks:
  • Equity
  • Benefits

Job Details

NVIDIA is looking for an engineer who wants the buzz of direct customer interaction and the reward of contributing to software and products. We want the right person to join our team of Solution Engineers working on NVIDIA Mission Control, which automates the operations of AI Factories. We need an expert engineer to triage customer software issues and resolve customer problems. You must have excellent problem-solving abilities and communication experience and be able to work on multiple projects and tasks. You must be strong in Linux, have solid programming skills, and possess experience working with containers and related technologies such as Kubernetes. Experience analyzing the performance of distributed GPU-accelerated workloads is a plus.

What you'll be doing:

  • Provide direct support to our NVIDIA Enterprise customers: answer questions and reproduce or resolve customer issues.

  • Work with engineering teams on customer issues, providing logs, reproduction information, and other triage information.

  • Create/update product and/or support tools.

  • Own and drive customer issues from inception to resolution.

  • Document customer interactions and enhance our knowledge base.

  • Work with the latest hardware (e.g. GPUs, AI accelerators, high-speed interconnects) and software technologies, such as parallel filesystems (e.g. Lustre, GPFS, WekaIO), Jupyter, Spark, Kubernetes, Ceph, and various ML frameworks and tools.

  • Occasionally work on weekends and holidays to support customers.

What we need to see:

  • Minimum of a BS in Computer Science, Electrical Engineering, or equivalent experience.

  • At least 5 years of engineering experience with a proven track record in AI/ML-focused projects or enterprise-grade solutions.

  • Expertise analyzing, optimizing, and customizing Linux environments for AI/ML workloads.

  • Strong container orchestration/job scheduling experience on compute clusters, especially with Kubernetes.

  • Professional-level communication skills, with the ability to adjust to the technical level of the audience and to stay calm and focused in negative situations.

  • Excellent follow-up and organizational skills, with a love for solving problems.

  • Proficient in Python programming with the ability to develop scripts and build custom tools. Experience with parallel programming or GPU acceleration (e.g., CUDA) is highly desirable.
     

Ways to stand out from the crowd:

  • Experience with chatbots, RAG pipelines, vector databases, or distributed training or inference workloads

  • Experience developing in GPU accelerated / cloud / virtualized environments

  • Experience with containerized solutions or job scheduling (Docker, Kubernetes, and/or Slurm), or experience analyzing the software performance of distributed workloads

  • Experience with common deep learning frameworks such as PyTorch or TensorFlow

  • Experience developing with C/C++

The base salary range is 136,000 USD - 264,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.
