Technical Support Engineer, Linux and HPC Admin

2 Months ago • 5 Years + • Administrative

Job Summary

Job Description

NVIDIA seeks a Technical Support Engineer specializing in Linux and HPC administration for its Base Command Manager (BCM) product. The role provides technical support to both internal and external customers using Linux-based cluster management software. Responsibilities include collaborating with the development team, troubleshooting issues, serving as a subject-matter expert, conducting research and development, and ensuring BCM best practices are communicated. The ideal candidate will have in-depth Linux knowledge, experience with HPC and system administration, and excellent communication skills. Experience with parallel filesystems (Lustre, GPFS, WekaIO), Jupyter, ML frameworks, Spark, Kubernetes, and Ceph is highly desirable.
Must have:
  • 5+ years HPC support experience
  • In-depth Linux knowledge
  • Excellent communication skills
  • Customer-facing experience
  • Research & problem-solving skills
Good to have:
  • BCM/Bright Cluster Manager experience
  • Experience with parallel filesystems
  • Familiarity with Jupyter, ML frameworks, Spark, and Kubernetes
  • Experience with Ceph

Job Details

NVIDIA has been redefining computer graphics, PC gaming, and accelerated computing for over 25 years. It’s a unique legacy of innovation fueled by great technology—and dynamic people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. NVIDIANS immerse themselves in a diverse, supportive environment that encourages everyone to do their best work. Join the team and see how you can make a lasting impact on the world.

NVIDIA Base Command Manager powers thousands of clusters worldwide, ranging from a few nodes to several thousand, and streamlines cluster provisioning, workload management, and infrastructure monitoring. It provides all the tools you need to deploy and run an AI data center. We take great pride in providing excellent, comprehensive support to our customers! The Technical Support Engineer in this role will contribute significantly to the success of both external customers running their clusters with NVIDIA solutions and internal clusters used for research, operations, and next-generation projects.

What you’ll be doing:

  • Support our internal and external customers using our Linux-based cluster management software product, ensuring everyone receives the help they require to support their clusters.

  • Collaborate with the development team: collect the correct information and escalate issues to the appropriate developers.

  • Become and serve as a subject-matter expert in several areas.

  • Perform research and development tasks for customers or for internal use by our development team.

  • Participate in proactive discussions with internal stakeholders to ensure BCM best practices are widely communicated.

  • Work with the latest hardware (e.g. GPUs, AI accelerators, high-speed interconnects) and software technologies such as parallel filesystems (e.g. Lustre, GPFS, WekaIO), Jupyter, Spark, Kubernetes, Ceph, and various ML frameworks and tools.

What we need to see:

  • BS degree in Electrical Engineering or a related field, or equivalent experience.

  • 5 years of relevant, aligned experience providing support in the HPC realm, ideally in a customer-facing role.

  • Proven research skills and interest in assisting customers to achieve their goals.

  • Experience in a technical customer-facing role.

  • Eagerness to learn and become an authority on our product.

  • Excellent written communication skills, with the ability to distill complex technical information into consumable summaries.

  • In-depth knowledge of Linux.

  • Familiarity with typical Linux installations and their most common software elements.

Ways to stand out from the crowd:

  • Experience with high-performance computing and system administration would be an asset.

  • Previous experience as a system admin running BCM/Bright Cluster Manager/Base Command Manager clusters is a definite plus.

About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.
