Senior Software Engineer, Observability and AIOps

1 Month ago • 10 Years + • Network Engineering • $168,000 PA - $322,000 PA

Job Summary

Job Description

NVIDIA seeks a Senior Software Engineer for Observability and AIOps to design, develop, and deploy an AIOps platform leveraging machine learning and other AI techniques to address network operations challenges (anomaly detection, root cause analysis, incident management, automation). The role involves implementing observability principles (monitoring, logging, tracing, alerting), building automation for monitoring and triaging, integrating with various service APIs, and breaking down manual processes into reusable software modules. The ideal candidate possesses 10+ years of network architecture and automation experience, expertise in large-scale enterprise networks, and proficiency in Python, Bash, and SQL. This role is crucial for supporting NVIDIA's software development workflows and tools across various domains.
Must have:
  • 10+ years network architecture and automation experience
  • Experience with Arista, Fortinet, Juniper, Mellanox
  • Proficiency in Python, Bash, SQL
  • AIOps platform design & development
  • Machine learning application for network operations
Perks:
  • Equity
  • Benefits

Job Details

Imagine a world where the network is self-managed and self-healing, requiring minimal manual intervention to sustain business operations. A world where the network learns from past events to recommend actions to users. Or better yet, a network that proactively prevents actions with a high probability of causing disruption. This network is advanced and intelligent: disruptions are minimized, and emerging technology is easily integrated to maintain a first-class service for our business. If that sounds exciting, NVIDIA is looking for a Network Software Engineer to develop a smart network infrastructure.

The goal is to craft a reliable, scalable, and efficient network to support NVIDIA software development workflows and tools, including CI/CD pipelines, compute resource management flow, and developer productivity tools. The network serves needs across NVIDIA's whole software stack, from graphics drivers to autonomous vehicles to deep learning frameworks. To achieve this goal, we are looking for an engineer who has a deep understanding of L3 underlay and overlay networks, outstanding design skills, and a track record of automating and delivering large-scale networks.

What you'll be doing:

  • Lead the design, development, testing, and deployment of an AIOps platform

  • Apply machine learning, deep learning, natural language processing, and other AI techniques to solve network operations challenges such as anomaly detection, root cause analysis, incident management, and automation

  • Improve network operations by defining and measuring AIOps metrics such as accuracy, reliability, scalability, performance, and efficiency

  • Implement observability principles and practices such as monitoring, logging, tracing, and alerting

  • Apply deep knowledge of data science engineering, including data collection, data cleaning, data analysis, data modeling, and data visualization

  • Build services to automate monitoring and triaging activities and provide critical information to facilitate response and resolution of performance issues and incidents

  • Build automation that recognizes, troubleshoots, and analyzes system disruptions, and develop solutions for improved reliability

  • Own and drive integrations with various service APIs, such as cloud service providers, to automate the creation of environments and, in turn, auto-populate data sources. Break down targeted manual processes into reusable software modules that can be integrated as code

What we need to see:

  • 10+ years of network architecture and automation experience

  • PhD or equivalent experience, plus a proven track record in architecting and automating large-scale, enterprise-grade networks for several types of organizations

  • Familiarity and hands-on experience with Arista, Fortinet, Juniper, and Mellanox

  • Strong track record of implementing network services in a variety of distributed computing environments

  • Hands-on experience with high-performance networking and network optimization in highly available, large-scale, multi-site, international environments

  • Hands-on experience contributing to tooling and automation for provisioning, monitoring, and managing network infrastructure

  • Must be able to read, write, and review automation code (Python, Bash, SQL, etc.)

  • Uses independent judgment and a high level of innovation to set company-level technology strategies and processes to accomplish objectives

  • Must have strong interpersonal and organizational skills, including the ability to meet deadlines, work in a team environment, follow written policies and procedures, and maintain superior customer service at all times

We have some of the most forward-thinking people in the world working for us and, due to unprecedented growth, our business development teams are rapidly growing. If you're creative and autonomous with a real passion for your work, we want to hear from you.

NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions from artificial intelligence to autonomous cars. NVIDIA is looking for great people like you to help us accelerate the next wave of artificial intelligence.

The base salary range is 168,000 USD - 322,000 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.

