Software Engineer L5, Model Observability & Lifecycle Management, Machine Learning Platform

4 Months ago • All levels • Artificial Intelligence • Research & Development • Backend Development • $100,000 PA - $619,000 PA

Job Summary

Job Description

As a Software Engineer L5 on the Model Observability & Lifecycle Management team, you'll build and enhance a centralized MLOps platform for managing ML models at Netflix. Responsibilities include developing observability dashboards, model registries, anomaly detection systems, and cost monitoring tools. You'll collaborate with various teams (engineers, product managers, ML engineers, data scientists) to improve ML/AI initiatives. Projects involve integrating with various MLP products, building API backends, SDK integrations, and enhancing user interfaces for improved usability. The role requires experience with backend distributed systems, object-oriented programming (Java preferred), web API frameworks (Spring Boot preferred), UI frameworks (React), and cloud platforms (AWS, Azure, or GCP).
Must have:
  • Backend distributed systems experience
  • Object-oriented programming (Java preferred)
  • Web API frameworks (Spring Boot preferred)
  • UI frameworks (React)
  • Cloud platforms (AWS, Azure, or GCP)
  • MLOps best practices
  • Cross-functional collaboration

Job Details

Netflix is one of the world's leading entertainment services, with 283 million paid memberships in over 190 countries enjoying TV series, films and games across a wide variety of genres and languages. Members can play, pause and resume watching as much as they want, anytime, anywhere, and can change their plans at any time.

The Model Observability & Lifecycle Management team’s centralized MLOps platform multiplies the productivity of both the Machine Learning Platform (MLP) organization and all ML practitioners across Netflix. We maintain the reliability of ML applications by building systems to catch and diagnose issues as soon as possible, sometimes before they even happen! 

We’re building a comprehensive and centralized system for managing ML models, featuring capabilities like visualization, observability, and performance benchmarking. Our paved path for MLOps will reduce redundancy, minimize operational overhead, and offer standardized workflows and UIs to researchers and infrastructure engineers throughout the company. 

We seek strong engineers to develop and expand our model observability and visualization workflows to support bandits, multi-task learning models, Large Language Models (LLMs), and other foundation models. Our tools and systems support and enable 100s of ML practitioners to develop some of Netflix’s most business-critical models across personalization, growth and commerce, ads, and studio algorithms. You will play a highly cross-functional role, partnering with other engineers, product managers, machine learning engineers, and data/research scientists to elevate our ML/AI initiatives and drive impactful innovation.

Snapshot of projects you may work on:

  • Observability dashboard and corresponding backend system to integrate with various MLP products to enable ML practitioners to explore and discover ML entities (models, features, embeddings, pipelines, etc.) and monitor and operate them effectively

  • Model registry to catalog ML models and their versions to enable discoverability, including core model store functionality with an API backend and an SDK integration layer

  • Collaborate with cross-functional teams to implement anomaly and drift detection on models, features, embeddings, etc., automatically detecting and alerting on staleness and quality issues and suggesting or implementing fixes

  • Cost monitoring and chargeback dashboards to provide visibility into resource utilization and identify opportunities for efficiency improvements

  • Enhance our user interfaces to provide intuitive and seamless experiences for ML practitioners, incorporating feedback and best practices to improve usability and adoption.

We would love to work with you if:

  • You have experience building backend distributed systems and full-stack systems using object-oriented programming (preferably Java), web API frameworks (preferably Spring Boot), and UI frameworks like React.

  • You are experienced working with public cloud platforms such as AWS, Azure, or GCP.

  • You have knowledge of ML model lifecycle management and MLOps best practices to support end-to-end development, deployment, and monitoring of ML models.

  • You proactively communicate with cross-functional teams to drive projects and promote best practices in observability and logging. 

  • You have a BS/MS in Computer Science, Applied Math, Engineering, or a related field.

Our compensation structure consists solely of an annual salary; we do not have bonuses. You choose each year how much of your compensation you want in salary versus stock options. To determine your personal top-of-market compensation, we rely on market indicators and consider your specific job family, background, skills, and experience to determine your compensation in the market range. The range for this role is $100,000 - $619,000.

Inclusion is a Netflix value, and we strive to host a meaningful interview experience for all candidates. If you want an accommodation/adjustment for a disability or any other reason during the hiring process, please send a request to your recruiting partner.

We are an equal-opportunity employer and celebrate diversity, recognizing that diversity builds stronger teams. We approach diversity and inclusion seriously and thoughtfully. We do not discriminate on the basis of race, religion, color, ancestry, national origin, caste, sex, sexual orientation, gender, gender identity or expression, age, disability, medical condition, pregnancy, genetic makeup, marital status, or military service.

Job is open for no less than 7 days and will be removed when the position is filled.


