Technical Support Engineer, Linux and HPC Admin

1 Month ago • 5 Years + • Administrative

Job Summary

Job Description

NVIDIA seeks a Technical Support Engineer specializing in Linux and HPC administration for its Base Command Manager (BCM) product. The role provides technical support to internal and external customers who use BCM to manage clusters ranging from a few nodes to thousands. Responsibilities include troubleshooting issues, collaborating with the development team, becoming a subject matter expert, conducting research and development tasks, and promoting best practices. The ideal candidate will have 5+ years of experience in HPC support, strong Linux expertise, and familiarity with parallel filesystems, ML frameworks, and related technologies. The position is remote in New Zealand or Australia.
Must have:
  • 5+ years HPC support experience
  • Strong Linux knowledge
  • Customer-facing experience
  • Research and problem-solving skills
  • Excellent communication skills
Good to have:
  • BCM/Bright Cluster Manager experience
  • Experience with parallel filesystems (Lustre, GPFS, WekaIO)
  • Familiarity with ML frameworks and related tools (e.g. Spark, Kubernetes)
  • Experience with Ceph

Job Details

NVIDIA has been redefining computer graphics, PC gaming, and accelerated computing for over 25 years. It’s a unique legacy of innovation fueled by great technology—and dynamic people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. NVIDIANS immerse themselves in a diverse, supportive environment that encourages everyone to do their best work. Join the team and see how you can make a lasting impact on the world.

NVIDIA Base Command Manager powers thousands of clusters worldwide, ranging from a few nodes to several thousand, and streamlines cluster provisioning, workload management, and infrastructure monitoring. It provides all the tools you need to deploy and run an AI data center. We take great pride in providing excellent, comprehensive support to our customers! The Technical Support Engineer in this role will contribute significantly to the success of both external customers running their clusters with NVIDIA solutions and internal clusters used for research, operations, and next-generation projects.

What you’ll be doing:

  • Support our internal and external customers using our Linux-based cluster management software product, ensuring everyone receives the help they require to support their clusters.

  • Collaborate with the development team by collecting the correct information and escalating issues to the appropriate developers.

  • Become and serve as a subject-matter expert in several areas.

  • Carry out research and development tasks for customers or for internal use by our development team.

  • Participate in proactive discussions with internal stakeholders to ensure BCM best practices are widely communicated.

  • Work with the latest hardware (e.g. GPUs, AI accelerators, high-speed interconnects) and software technologies, including parallel filesystems (e.g. Lustre, GPFS, WekaIO), Jupyter, Spark, Kubernetes, Ceph, and various ML frameworks and tools.

What we need to see:

  • BS degree in Electrical Engineering or a related field, or equivalent experience.

  • 5 years of relevant, aligned experience providing support in the HPC realm, ideally in a customer-facing role.

  • Proven research skills and interest in assisting customers to achieve their goals.

  • Experience in a technical customer-facing role.

  • Eagerness to learn and become an authority on our product.

  • Excellent written communication skills, with the ability to distill complex technical information into consumable summaries.

  • In-depth knowledge of Linux.

  • Familiarity with typical Linux installations and their most common software elements.

Ways to stand out from the crowd:

  • Experience with high-performance computing and system administration would be an asset.

  • Previous experience as a system admin running BCM/Bright Cluster Manager/Base Command Manager clusters is a definite plus.

About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.

