Technical Support Engineer, Linux and HPC Admin

1 Month ago • 5 Years + • Administrative

Job Summary

Job Description

NVIDIA seeks a Technical Support Engineer specializing in Linux and HPC administration for its Base Command Manager (BCM) product. This role involves providing technical support to both internal and external customers using Linux-based cluster management software. Responsibilities include collaborating with the development team, troubleshooting issues, serving as a subject matter expert, conducting research and development, and ensuring BCM best practices are communicated. The ideal candidate will have in-depth Linux knowledge, experience with HPC and system administration, and excellent communication skills. Experience with parallel filesystems (Lustre, GPFS, WekaIO), Jupyter, ML frameworks, Spark, Kubernetes, and Ceph is highly desirable.
Must have:
  • 5+ years HPC support experience
  • In-depth Linux knowledge
  • Excellent communication skills
  • Customer-facing experience
  • Research & problem-solving skills
Good to have:
  • BCM/Bright Cluster Manager experience
  • Experience with parallel filesystems
  • Familiarity with ML frameworks, Spark, and Kubernetes
  • Experience with Ceph

Job Details

NVIDIA has been redefining computer graphics, PC gaming, and accelerated computing for over 25 years. It’s a unique legacy of innovation fueled by great technology and dynamic people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing, an era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. NVIDIANs immerse themselves in a diverse, supportive environment that encourages everyone to do their best work. Join the team and see how you can make a lasting impact on the world.

NVIDIA Base Command Manager powers thousands of clusters worldwide, ranging from a few nodes to several thousand, and streamlines cluster provisioning, workload management, and infrastructure monitoring. It provides all the tools you need to deploy and run an AI data center. We take great pride in providing excellent, comprehensive support to our customers! The Technical Support Engineer in this role will contribute significantly to the overall success of both external customers running their clusters with NVIDIA solutions and internal clusters used for research, operations, and next-generation projects.

What you’ll be doing:

  • Support our internal and external customers using our Linux-based cluster management software product, ensuring everyone receives the help they require to support their clusters.

  • Collaborate with the development team by collecting the correct information and escalating issues to the appropriate developers.

  • Become and serve as a subject-matter expert in several areas.

  • Carry out research and development tasks for customers or for internal use by our development team.

  • Participate in proactive discussions with internal stakeholders to ensure BCM best practices are widely communicated.

  • Work with the latest hardware (e.g. GPUs, AI accelerators, high-speed interconnects) and software technologies such as parallel filesystems (e.g. Lustre, GPFS, WekaIO), Jupyter, and various ML frameworks and tools, Spark, Kubernetes, and Ceph.

What we need to see:

  • BS degree or equivalent experience in Electrical Engineering or a related field.

  • 5 years of relevant, aligned experience providing support in the HPC realm, ideally in a customer-facing role.

  • Proven research skills and interest in assisting customers to achieve their goals.

  • Experience in a technical customer-facing role.

  • Eagerness to learn and become an authority on our product.

  • Excellent written communication skills with the ability to distill complex technical information into consumable summaries.

  • In-depth knowledge of Linux.

  • Familiarity with typical Linux installations and their most common software elements.

Ways to stand out from the crowd:

  • Experience with high-performance computing and system administration.

  • Previous experience as a system admin running BCM/Bright Cluster Manager/Base Command Manager clusters is a definite plus.

About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.