Senior Solutions Architect, InfiniBand and Networking Ethernet

6 Minutes ago • 8 Years + • Network Engineering • DevOps

Job Summary

Job Description

NVIDIA seeks a Senior Networking (ETH/IB) Solutions Architect to design and implement large-scale networking projects for AI/HPC infrastructure. Responsibilities include supporting operational reliability, focusing on performance, monitoring, and alerting of AI clusters. The role involves the entire service lifecycle, from design and deployment to operation and refinement, and requires excellent customer interaction skills. This includes working with customers, partners, and internal teams to analyze, define, and implement solutions. Strong automation skills using tools like Ansible, Salt, and Python are essential.
Must have:
  • 8+ years networking experience (LAN, InfiniBand)
  • Linux system administration/DevOps expertise
  • Automation skills (Ansible, Salt, Python)
  • Customer-focused leadership
  • Strong communication skills
Good to have:
  • Linux or Networking Certifications
  • HPC architecture knowledge
  • Experience with Slurm/PBS
  • Python or Bash scripting
  • GPU/MPI experience
  • BCM (Base Command Manager) knowledge

Job Details

NVIDIA is the world leader in computer graphics, artificial intelligence, and accelerated computing. For over 25 years, we have been at the forefront of research and engineering around the greatest advances in technology. Our history of innovation drives us to solve the world's hardest problems.

NVIDIA is looking for a Senior Networking (ETH/IB) Solutions Architect to join its NVIDIA Infrastructure Specialist Team. Academic and commercial groups around the world are using NVIDIA products to revolutionize deep learning and data analytics, and to power data centers. Join the team building many of the largest and fastest AI/HPC systems in the world! We are looking for someone who can work on a dynamic, customer-focused team that requires excellent interpersonal skills. This role interacts with customers, partners, and internal teams to analyze, define, and implement large-scale networking projects. The scope of these efforts includes a combination of networking, system design, and automation, and serving as NVIDIA's face to the customer!

What you'll be doing:

  • Primary responsibilities will include building AI/HPC infrastructure for new and existing customers.

  • Support operational and reliability aspects of large-scale AI clusters, focusing on performance at scale, real-time monitoring, logging, and alerting.

  • Engage in and improve the whole lifecycle of services—from inception and design through deployment, operation, and refinement.

  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.

  • Provide feedback to internal teams such as opening bugs, documenting workarounds, and suggesting improvements.

What we need to see:

  • BS/MS/PhD or equivalent experience in Computer Science, Data Science, Electrical/Computer Engineering, Physics, Mathematics, or other Engineering fields, with at least 8 years of work or research experience in networking fundamentals, the TCP/IP stack, and data center architecture.

  • 8+ years of experience configuring, testing, validating, and resolving issues in LAN and InfiniBand networking, including the use of validation tools for InfiniBand health and performance in medium- to large-scale HPC/AI network environments.

  • Knowledge of and experience with Linux system administration/DevOps, process management, package management, task scheduling, kernel management, boot procedures, troubleshooting, performance reporting/optimization/logging, and network routing/advanced networking (tuning and monitoring).

  • Driven focus on customer needs and satisfaction. Self-motivated, with excellent leadership skills, including when working directly with customers.

  • Extensive knowledge of automation, delivering fully automated network provisioning solutions using Ansible, Salt, and Python.

  • Strong written, verbal, and listening skills in English are essential.

Ways to stand out from the crowd:

  • Linux or Networking Certifications.

  • Experience with high-performance computing architectures. Understanding of how job schedulers (Slurm, PBS) work.

  • Proven knowledge of Python or Bash. Delivery experience as an Infrastructure Specialist.

  • Knowledge of cluster management technologies (bonus credit for BCM (Base Command Manager)).

  • Experience with GPU (Graphics Processing Unit) focused hardware/software.

  • Experience with MPI (Message Passing Interface).

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking individuals in the world working for us. If you're creative and autonomous, we want to hear from you.

About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.

