Senior Software Engineer – AI Infrastructure and Tooling

1 Month ago • 4 Years + • DevOps • $184,000 PA - $356,500 PA

Job Summary

Job Description

NVIDIA seeks a Senior Software Engineer specializing in AI infrastructure and tooling. This role involves designing and implementing cutting-edge infrastructure solutions for large-scale cloud and on-premise computing clusters. Responsibilities include crafting production-grade software using strong programming skills and distributed systems design expertise. The engineer will design and implement Continuous Deployment (CD) pipelines and ensure efficient software delivery. A deep understanding of systems, tools, and approaches to solve complex problems is essential. This position directly impacts the efficiency of NVIDIA's Autonomous Vehicles development team.
Must have:
  • 4+ years developing tooling/APIs for k8s-based computing platforms
  • 4+ years building cloud automation software (Terraform, Python, Go)
  • Strong AWS fundamentals (IAM, VPC, RDS, S3, CDN, EC2)
  • DevOps principles, tools, and methodologies expertise
  • Continuous Deployment (CD) pipeline experience
  • Understanding of Traffic Engineering solutions
  • Observability, Prometheus, large-scale log ingestion expertise
  • Linux proficiency
Good to have:
  • Experience with tooling and SRE automation on large GPU/CPU clusters
  • Experience with Agentic AI tools for infrastructure management
  • Artifactory Management at scale
  • Understanding of cloud and datacenter security concepts
Perks:
  • Equity
  • Benefits

Job Details

We are looking for a highly motivated AI infrastructure automation and tools development expert to join us. As a seasoned professional with a strong passion for designing and implementing cutting-edge infrastructure solutions, you will play a key role in architecting and driving advancements in our large-scale cloud and on-premise computing clusters. We are a small, fast-moving team, and we own the production excellence of everything we develop, at every layer from the OS up to the services. Please apply if you are passionate about operational reliability, building AWS infrastructure automation and deployment tools, and working on new technologies and Cloud Native applications. The solutions you propose and build will directly impact the efficiency of the NVIDIA Autonomous Vehicles development team!

What you'll be doing:

  • Apply strong programming skills and a deep understanding of distributed systems design to craft and build production-grade software.

  • Design and implement Continuous Deployment (CD) pipelines to ensure flawless and efficient software delivery.

  • Own the big picture of how our systems relate to each other, using a breadth of tools and approaches to tackle a broad spectrum of problems.

What we need to see:

  • BS or MS in CS/CE/EE, or equivalent experience

  • 4+ years developing tooling/APIs for k8s-based computing platforms

  • At least 4 years building automation software for the cloud with Terraform, Python, Go

  • Strong AWS fundamentals: IAM, VPC, RDS, S3, CDN, EC2

  • Expert knowledge of DevOps principles, tools, and methodologies

  • Working experience with Continuous Deployment (CD) pipelines

  • Good understanding of Traffic Engineering solutions: load balancing, Layer 7 proxies

  • In-depth understanding of all layers of the Internet protocol suite

  • Operational expertise with observability, the Prometheus ecosystem, and log ingestion at scale

  • Proficiency with the Linux environment

  • Excellent written and verbal communication and interpersonal skills

  • You'll be a fun and motivated teammate who enjoys a challenge and celebrates success

Ways to stand out from the crowd:

  • Previous experience building sophisticated tooling and SRE automation on large GPU/CPU clusters

  • Working experience with agentic AI tools for computing infrastructure management

  • Artifactory Management at scale

  • Good understanding of cloud and datacenter security concepts; AWS preferred

  • Solid understanding of large-scale k8s observability platforms

NVIDIA is the leader in AI, machine learning and datacenter acceleration! NVIDIA is a “learning machine” that constantly evolves by adapting to new opportunities that are hard to solve, that only we can tackle, and that matter to the world. This is our life’s work, to amplify human imagination and intelligence. Make the choice, join our diverse team today!

The base salary range is 184,000 USD - 356,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.
