Senior Solution Engineer, Mission Control

1 Month ago • 5 Years + • Artificial Intelligence • Research & Development • $136,000 PA - $264,500 PA

Job Summary

Job Description

NVIDIA seeks a Senior Solution Engineer to join its Mission Control team, focusing on automating AI Factory operations. This role involves direct customer interaction, troubleshooting software issues, collaborating with engineering teams, creating support tools, and driving issue resolution. The ideal candidate will possess strong Linux, containerization (Kubernetes), and programming (Python) skills, expertise in analyzing distributed GPU-accelerated workloads, and excellent communication abilities. Responsibilities include providing direct customer support, working with engineering teams on issue triage, developing and updating tools, and documenting customer interactions. Experience with parallel filesystems (Lustre, GPFS, WekaIO), Jupyter, ML frameworks, Spark, Ceph, and various hardware (GPUs, AI accelerators) is beneficial. Occasional weekend/holiday work may be required.
Must have:
  • 5+ years AI/ML experience
  • Linux expertise
  • Kubernetes experience
  • Python proficiency
  • Excellent communication
  • Problem-solving skills
Good to have:
  • Chatbot experience
  • RAG pipelines
  • Vector databases
  • Distributed training
  • PyTorch/TensorFlow
  • C/C++ development
  • CUDA experience
Perks:
  • Equity
  • Benefits

Job Details

NVIDIA is looking for an engineer who wants the buzz of direct customer interaction and the reward of contributing to software and products. We want the right person to join our team of Solution Engineers working on NVIDIA Mission Control, which automates the operations of AI Factories. We need an expert engineer to triage customer software issues and resolve customer problems. You must have excellent problem-solving abilities and communication experience and be able to work on multiple projects and tasks. You must be strong in Linux, have solid programming skills, and possess experience working with containers and related technologies such as Kubernetes. Experience analyzing the performance of distributed GPU-accelerated workloads is a plus.

What you'll be doing:

  • Provide direct support to our NVIDIA Enterprise customers: answer questions and reproduce or resolve customer issues.

  • Work with engineering teams on customer issues, providing logs, reproduction information, and other triage information.

  • Create/update product and/or support tools.

  • Own and drive customer issues from inception to resolution.

  • Document customer interactions and enhance our knowledge base.

  • Work with the latest hardware (e.g. GPUs, AI accelerators, high-speed interconnects) and software technologies such as parallel filesystems (e.g. Lustre, GPFS, WekaIO), Jupyter, Spark, Kubernetes, Ceph, and various ML frameworks and tools.

  • Occasional work on weekends and holidays to support customers

What we need to see:

  • Minimum of a BS in Computer Science, Electrical Engineering, or equivalent experience.

  • At least 5 years of engineering experience with a proven track record in AI/ML-focused projects or enterprise-grade solutions.

  • Expertise analyzing, optimizing, and customizing Linux environments for AI/ML workloads.

  • Strong container orchestration/job scheduling experience on compute clusters, especially with Kubernetes

  • Professional-level communication experience, able to adjust to the technical level of the audience, and stay calm and focused in negative situations.

  • Excellent follow-up and organizational skills, with a love for solving problems.

  • Proficient in Python programming with the ability to develop scripts and build custom tools. Experience with parallel programming or GPU acceleration (e.g., CUDA) is highly desirable.
     

Ways to stand out from the crowd:

  • Experience with Chatbots, RAG pipelines, vector databases, distributed training or inference workloads

  • Experience developing in GPU accelerated / cloud / virtualized environments

  • Containerized solutions/job scheduling experience with knowledge of Docker, Kubernetes, or Slurm, or experience analyzing software performance of distributed workloads

  • Experience with common deep learning frameworks such as PyTorch or TensorFlow

  • Experience developing with C/C++

The base salary range is 136,000 USD - 264,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.
