Technical Support Engineer, Linux and HPC Admin

3 Months ago • 5 Years + • Administrative

Job Summary

Job Description

NVIDIA seeks a Technical Support Engineer specializing in Linux and HPC administration for its Base Command Manager (BCM) product. The role provides technical support to internal and external customers who use BCM to manage clusters ranging from a few nodes to thousands. Responsibilities include troubleshooting issues, collaborating with the development team, becoming a subject-matter expert, carrying out research and development tasks, and promoting best practices. The ideal candidate will have 5+ years of experience in HPC support, strong Linux expertise, and familiarity with parallel filesystems, ML frameworks, and related technologies. The position is remote in New Zealand or Australia.
Must have:
  • 5+ years HPC support experience
  • Strong Linux knowledge
  • Customer-facing experience
  • Research and problem-solving skills
  • Excellent communication skills
Good to have:
  • BCM/Bright Cluster Manager experience
  • Experience with parallel filesystems (Lustre, GPFS, WekaIO)
  • Familiarity with ML frameworks and tools such as Spark and Kubernetes
  • Experience with Ceph

Job Details

NVIDIA has been redefining computer graphics, PC gaming, and accelerated computing for over 25 years. It’s a unique legacy of innovation fueled by great technology and dynamic people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing, an era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. NVIDIANs immerse themselves in a diverse, supportive environment that encourages everyone to do their best work. Join the team and see how you can make a lasting impact on the world.

NVIDIA Base Command Manager powers thousands of clusters worldwide, ranging from a few nodes to several thousand, and streamlines cluster provisioning, workload management, and infrastructure monitoring. It provides all the tools you need to deploy and run an AI data center. We take great pride in providing excellent, comprehensive support to our customers! The Technical Support Engineer in this role will contribute significantly to the success of both external customers running their clusters with NVIDIA solutions and internal clusters used for research, operations, and next-generation projects.

What you’ll be doing:

  • Support our internal and external customers using our Linux-based cluster management software product, ensuring everyone receives the help they require to support their clusters.

  • Collaborate with the development team: collect the correct information and escalate issues to the appropriate developers.

  • Become and serve as a subject-matter expert in several areas.

  • Carry out research and development tasks for customers or for internal use by our development team.

  • Participate in proactive discussions with internal stakeholders to ensure BCM best practices are widely communicated.

  • Work with the latest hardware (e.g. GPUs, AI accelerators, high-speed interconnects) and software technologies, including parallel filesystems (e.g. Lustre, GPFS, WekaIO), Jupyter, various ML frameworks and tools, Spark, Kubernetes, and Ceph.

What we need to see:

  • BS degree or equivalent experience in Electrical Engineering or a related field.

  • 5 years of relevant, aligned experience providing support in the HPC realm, ideally in a customer-facing role.

  • Proven research skills and interest in assisting customers to achieve their goals.

  • Experience in a technical customer-facing role.

  • Eagerness to learn and become an authority on our product.

  • Excellent written communication skills, with the ability to distill complex technical information into consumable summaries.

  • In-depth knowledge of Linux.

  • Familiarity with typical Linux installations and their most common software elements.

Ways to stand out from the crowd:

  • Experience with high-performance computing and system administration would be an asset.

  • Previous experience as a system administrator running Base Command Manager (BCM) or Bright Cluster Manager clusters is a definite plus.

About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.