Senior DevOps Engineer, Deep Learning Frameworks

3 Months ago • 5 Years + • DevOps

Job Summary

Job Description

NVIDIA's Deep Learning Optimized Frameworks Group seeks a Senior DevOps Engineer to enhance its high-performing deep learning software stacks. Responsibilities include automating build, test, integration, and release processes for frameworks like TensorFlow and PyTorch; configuring and maintaining industry-standard DevOps tools (GitLab, Jenkins, Docker, etc.); developing shared utilities; leading best practices; and identifying infrastructure needs. The ideal candidate will have strong experience in CI/CD, SCM, and build systems, along with programming skills in Python (or similar).
Must have:
  • 5+ years relevant experience
  • CI/CD system automation
  • SCM & build systems expertise (Git, CMake, etc.)
  • Python (or Perl/Shell scripting)
  • Problem-solving & collaboration
Good to have:
  • CUDA & Deep Learning Software Stack experience
  • Container & cluster tech (Kubernetes, Jenkins, etc.)
  • GPU computing systems knowledge
  • Experience incorporating new technologies
Perks:
  • Highly competitive salaries
  • Extensive benefits package
  • Diverse and inclusive work environment

Job Details

NVIDIA's Deep Learning Optimized Frameworks Group is looking for an excellent DevOps Engineer to enable the next wave of NVIDIA’s highest-performing deep learning software stacks. Your role spans multiple products such as TensorFlow and PyTorch and is instrumental in streamlining development, builds, and releases with modern DevOps tools. Join our hardworking, technically strong team of software engineers and infrastructure experts to design the systems that enable NVIDIA to stay ahead of the competition as we deliver the world's fastest deep learning frameworks.

What you'll be doing:

  • Automating and optimizing build, test, integration, and release processes for optimized NVIDIA Deep Learning Frameworks

  • Configuring, maintaining, and building upon deployments of industry-standard tools (e.g. GitLab, Jenkins, Docker, LXC, Hyper-V, CMake, Bazel)

  • Developing shared utilities for setting up systems, running tests, and recording results

  • Leading best practices for building, testing, and releasing software

  • Identifying infrastructure needs and translating them into action

What we need to see:

  • BS or higher degree in computer science (or equivalent experience)

  • 5+ years of relevant experience

  • Strong experience setting up, maintaining, and automating continuous integration systems

  • Fluency in SCM (e.g. GitHub, GitLab, Git) and build systems (e.g. Make, CMake, Bazel, Docker)

  • Strong programming skills in Python (or Perl or shell scripting, e.g. bash, tcsh, sh)

  • Pragmatic approach to problem solving and collaboration

  • Real passion for “it just works” automation and enabling team members

Ways to stand out from the crowd:

  • Experience with CUDA and Deep Learning Software Stack

  • Good knowledge of container and cluster technologies like Slurm, Kubernetes, Jenkins, GitLab CI, and Zabbix

  • Experience with GPU computing systems

  • Track record of identifying useful new technologies and incorporating them into SW development flows

  • Experience as an active contributor to a SW project involving many developers

NVIDIA is at the forefront of breakthroughs in Artificial Intelligence, High-Performance Computing, and Visualization. Our teams are composed of driven, innovative professionals dedicated to pushing the boundaries of technology. We offer highly competitive salaries, an extensive benefits package, and a work environment that promotes diversity, inclusion, and flexibility. As an equal opportunity employer, we are committed to fostering a supportive and empowering workplace for all.

About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.
