Solutions Architect, Infrastructure - Research Computing

2 Months ago • 5 Years + • DevOps • $148,000 PA - $235,750 PA

Job Summary

Job Description

NVIDIA seeks a Solutions Architect for its higher education and research team. This role involves designing, building, and optimizing university-level research computing infrastructures using GPU-accelerated workflows. Responsibilities include working with universities to optimize hardware utilization with tools like NVIDIA Base Command, Kubernetes, Slurm, and Jupyter; implementing system monitoring and telemetry; and creating documentation like white papers and training materials. The ideal candidate possesses a strong background in deploying AI workloads, cluster orchestration, and system performance optimization, along with experience with various containerization and monitoring tools.
Must have:
  • MS/PhD in relevant field or equiv. exp.
  • 5+ years relevant experience
  • GPU-accelerated computing infrastructure design & deployment
  • Cluster orchestration (Slurm, Kubernetes)
  • Container tools (Docker, Singularity)
  • System monitoring & optimization expertise
Good to have:
  • LLM training/inference workflow deployment
  • Experience with academic research computing customers
  • High-performance parallel file systems knowledge
  • OpenMPI and NCCL knowledge
  • Debugging and profiling tool experience
Perks:
  • Competitive salary
  • Comprehensive benefits package
  • Excellent engineering work culture
  • Equity

Job Details

Are you an experienced systems architect with an interest in advancing artificial intelligence (AI) and high-performance computing (HPC) in academic and research environments? We are looking for a Solutions Architect to join the higher education and research team! In this role you will work with universities and research institutions to optimize the design and deployment of AI infrastructure. Our team applies expertise in accelerated software and hardware systems to help enable groundbreaking advancements in AI, deep learning, and scientific research. This role requires a strong background in building and deploying research computing clusters, deploying AI workloads, and optimizing system performance at scale.

What you’ll be doing:

  • Serve as a technical advisor for the design, build-out, and optimization of university-level research computing infrastructures that include GPU-accelerated scientific workflows.

  • Work with university research computing teams to optimize hardware utilization with software orchestration tools such as NVIDIA Base Command, Kubernetes, Slurm, and Jupyter Notebook environments.

  • Implement systems monitoring and telemetry tools to help optimize resource utilization and track the most demanding application workloads at research computing centers.

  • Document what you learn. This can include building targeted training, writing whitepapers, blogs, and wiki articles, and working through hard problems with a customer on a whiteboard.

  • Provide customer requirements and feedback to product and engineering teams.

What we need to see:

  • MS or PhD in Engineering, Mathematics, Physical Sciences, or Computer Science (or equivalent experience).

  • 5+ years of relevant work experience.

  • Strong experience in designing and deploying GPU-accelerated computing infrastructure.

  • In-depth knowledge of cluster orchestration and job scheduling technologies, e.g., Slurm, Kubernetes, Ansible, and/or Open OnDemand, and experience with container tools (Docker, Singularity, Enroot/Pyxis), including at-scale deployment of containerized environments.

  • Expertise in systems monitoring, telemetry, and systems performance optimization of research computing environments. Familiarity with tools like Prometheus, Grafana or NVIDIA DCGM.

  • Understanding of datacenter networking technologies (InfiniBand, Ethernet, OFED) and experience with network configuration.

  • Familiarity with power and cooling systems architecture for data center infrastructure.

Ways to stand out from the crowd:

  • Experience in deploying LLM training and inference workflows in a research computing environment.

  • Experience working with technical computing customers in the academic research computing space.

  • Practical knowledge of high-performance parallel file systems.

  • Applications and systems-level knowledge of OpenMPI and NCCL.

  • Experience with debugging and profiling tools, e.g., Nsight Systems, Nsight Compute, Compute Sanitizer, GDB, or Valgrind.

With highly competitive salaries, a comprehensive benefits package, and an excellent engineering work culture, NVIDIA is widely considered to be one of the industry's most desirable employers.

The base salary range is 148,000 USD - 235,750 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Similar Jobs

ByteDance - Software Engineer Intern (CDN/Edge/Traffic Platform)

ByteDance

San Jose, California, United States (On-Site)
1 Month ago
NinjaVan - Senior Software Engineer

NinjaVan

Hyderabad, Telangana, India (On-Site)
6 Months ago
Canva - Staff Backend Engineer - Product Insights Enablement - Java

Canva

Auckland, Auckland, New Zealand (Remote)
1 Month ago
Wargaming - DevOps Engineer

Wargaming

Belgrade, Serbia (On-Site)
4 Months ago
EXUSIA - Lead Data Engineers – Azure/Databricks/Snowflake

EXUSIA

United States (Remote)
3 Months ago
Inworld AI - Staff Cloud DevOps/Site Reliability Engineer (SRE) - USA

Inworld AI

Mountain View, California, United States (On-Site)
8 Months ago
PwC - ETIC, Cloud DevOps Lead - M

PwC

Cairo, Cairo Governorate, Egypt (On-Site)
6 Months ago
Ajmera Infotech - Senior DevOps - Azure Infrastructure + DevOps

Ajmera Infotech

Bengaluru, Karnataka, India (Hybrid)
2 Months ago

Similar Skill Jobs

EvoPlay - Senior Java Developer

EvoPlay

Masovian Voivodeship, Poland (On-Site)
1 Month ago
GoTo Group - Lead Software Engineer - Engineering Platforms

GoTo Group

Bengaluru, Karnataka, India (On-Site)
5 Months ago
LSEG (London Stock Exchange Group) - DevOps Engineer

LSEG (London Stock Exchange Group)

Bengaluru, Karnataka, India (Hybrid)
6 Months ago
Dream Sports - SDE - 1 - DevOps

Dream Sports

Mumbai, Maharashtra, India (On-Site)
6 Months ago
Ajmera Infotech - Site Reliability Engineer - Kubernetes

Ajmera Infotech

San Jose, California, United States (On-Site)
2 Months ago
CloudHire - Senior Angular NestJS Developer

CloudHire

Karnataka, India (Remote)
1 Month ago
NVIDIA - Solutions Architect, Infrastructure - Research Computing

NVIDIA

Massachusetts, United States (Remote)
2 Months ago
Epic Games - Senior Backend Engineer

Epic Games

Cary, North Carolina, United States (On-Site)
1 Month ago
SmileGate - System Engineer (Private Cloud)

SmileGate

Seongnam-si, Gyeonggi-do, South Korea (On-Site)
3 Months ago
Blinkhealth - Senior Data Engineer

Blinkhealth

India (On-Site)
5 Months ago

Jobs in New Jersey, United States

Next Level Business Services - Salesforce Technical Lead

Next Level Business Services

San Jose, California, United States (On-Site)
6 Months ago
Next Level Business Services - SAP SM and MM (Full Time)

Next Level Business Services

Malvern, Pennsylvania, United States (On-Site)
6 Months ago
Zoox - Senior/Staff Software Engineer - 3D World Generation Pipelines

Zoox

Seattle, Washington, United States (Hybrid)
6 Months ago
Google - Software Engineer III, Infrastructure, Google Cloud Security and Privacy

Google

Sunnyvale, California, United States (On-Site)
5 Months ago
X Studios, Inc - Engineer, Django/Python (Contractor)

X Studios, Inc

Winter Park, Florida, United States (On-Site)
8 Months ago
PlayStation Global - Senior Program Manager, Ecommerce

PlayStation Global

California, United States (On-Site)
1 Month ago
Moonbug Entertainment - Client Success Manager

Moonbug Entertainment

California, United States (On-Site)
1 Month ago
Feld Entertainment - Monster Jam Truck Technician

Feld Entertainment

Ellenton, Florida, United States (On-Site)
6 Months ago
Sphere Entertainment Co - Senior Manager Data Science

Sphere Entertainment Co

Las Vegas, Nevada, United States (On-Site)
1 Month ago
Rivos - Silicon Logic Formal Verification - Full Time

Rivos

Portland, Oregon, United States (Hybrid)
6 Months ago

DevOps Jobs

Tencent - Tencent Cloud - Technical Account Manager (South Korea)

Tencent

Seoul, South Korea (On-Site)
4 Months ago
Visa - Chief Systems Architect

Visa

Auckland, Auckland, New Zealand (Hybrid)
4 Months ago
Futurum Technology - DevOps Engineer (Python Focus)

Futurum Technology

Kraków, Lesser Poland Voivodeship, Poland (Remote)
1 Month ago
NVIDIA - Senior Site Reliability Engineer - AI Research Clusters

NVIDIA

Santa Clara, California, United States (Hybrid)
3 Months ago
ION - Senior Technical Consultant – IT2

ION

Central Sulawesi, Indonesia (On-Site)
6 Months ago
Nagarro - Senior Engineer, DevOps

Nagarro

India (Remote)
6 Months ago
Avathon - DevOps Engineer

Avathon

Bengaluru, Karnataka, India (On-Site)
6 Months ago
Rackspace Technology - Security Engineer - Palo Alto

Rackspace Technology

India (Remote)
2 Months ago
Trend Micro - DevOps Engineer

Trend Micro

Manila, Metro Manila, Philippines (On-Site)
18 Years ago

About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.
