Senior Engineering Manager - Critical Operations and Reliability Engineering

2 Weeks ago • 4-8 Years • Operations • $480,000 PA - $1,200,000 PA

Job Summary

Job Description

As Senior Engineering Manager for Critical Operations and Reliability Engineering (C.O.R.E.), you'll lead Netflix's central SRE team, defining and driving reliability practices for all consumer-facing applications. Responsibilities include setting the strategic vision for system reliability, observability, and scalability; managing high-severity incidents; driving down operational costs; and collaborating with various teams to integrate reliability practices into the SDLC. You will mentor the C.O.R.E. team, ensuring services are reliable, scalable, and efficient, impacting member experience and revenue across multiple platforms (SVOD, Live, Ads, Games). The role requires strong leadership, incident management expertise, and a deep understanding of distributed systems and cloud platforms.
Must have:
  • Senior leadership experience in SRE
  • High-pressure incident management
  • Experience with high-scale cloud platforms
  • Distributed systems, networking, software engineering expertise
  • Collaboration and stakeholder influence
Perks:
  • Comprehensive health plans
  • Mental health support
  • 401(k) retirement plan
  • Stock options
  • Disability programs
  • Family-forming benefits
  • Paid time off

Job Details

Netflix is one of the world's leading entertainment services, with 283 million paid memberships in over 190 countries enjoying TV series, films and games across a wide variety of genres and languages. Members can play, pause and resume watching as much as they want, anytime, anywhere, and can change their plans at any time.

About Netflix

Netflix is revolutionizing entertainment by connecting people with movies and television globally through outstanding content and technological innovation. Our Infrastructure Engineering team provides the backbone for Netflix products, including Streaming Video on Demand (SVOD), Live, Ads, Games, and more, by building and operating an efficient, scalable, secure, and easy-to-use development platform and content delivery network. Join us as we push the boundaries of scale, performance, and resilience, empowering developers to create groundbreaking applications on a reliable platform.

Reliability engineering operates in a federated model at Netflix: central teams build standard reliability practices and tooling that are leveraged across the Streaming, Live, Ads, and Games teams. The federated model allows for a centralized approach to reliability while empowering domain-specific SRE teams to address the unique challenges within their areas.

Role Overview

C.O.R.E. is the central SRE team within Infrastructure Engineering that defines and drives reliability practices for all consumer-facing app development teams. The C.O.R.E. team's mission is to improve the availability and reliability of Netflix's infrastructure while enhancing the operational readiness of its engineering culture, focusing on incident management and operational excellence.

As Senior Engineering Manager of the C.O.R.E. Site Reliability Engineering (SRE) team, you will lead the integration of Netflix's SRE model with industry-leading best practices. You will define and drive reliability practices for all consumer-facing product teams, ensuring that our services are reliable, scalable, and efficient. This role is pivotal to the reliability and performance of Netflix's services, driving innovation and optimizing system operations in support of the company's mission of revolutionizing entertainment.

Role Responsibilities

  • Strategic Leadership: You will lead and mentor the C.O.R.E. SRE team while setting the strategic vision and technical direction for world-class system reliability, observability, and scalability.

  • Reliability: Your leadership will enable consumer-facing product teams to adopt standardized strategies for meeting reliability targets (e.g., SLOs/SLIs and error budgets).

  • Incident Management: You will manage high-severity incidents impacting Member Experience and/or Revenue across SVOD, Live, Ads, and Games, conduct post-incident reviews, and provide ongoing incident trend analysis to prevent recurrence and improve system architecture.

  • Operational Excellence: You will drive down the operational cost of service ownership by optimizing system reliability and scalability via resilience experiments.

  • Automation and Tools: You will use automation and tooling to make deployment, monitoring, alerting, incident response, and resolution easier.

  • Collaboration and Integration: You'll work closely with SREs, development teams, and service owners to integrate reliability practices into the SDLC and manage shared accountability for service health.

Requirements

  • Proven experience in a senior leadership role within Site Reliability Engineering or a related domain.

  • Substantial experience commanding high-pressure and large-scale incidents.

  • Willingness to participate in an on-call rotation with shifts covering 24/7.

  • Extensive experience with high-scale cloud platforms and a strong understanding of distributed systems, networking, and software engineering.

  • Experience working in a collaborative environment, influencing stakeholders across various levels of the organization. Ability to build strong relationships with engineering, product, and business teams.

  • Strong problem-solving abilities and a proactive approach to challenges.

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field, or equivalent work experience.

Our compensation structure consists solely of an annual salary; we do not have bonuses. You choose each year how much of your compensation you want in salary versus stock options. To determine your personal top-of-market compensation, we rely on market indicators and consider your specific job family, background, skills, and experience to determine your compensation within the market range. The range for this role is $480,000 - $1,200,000.

Netflix provides comprehensive benefits including Health Plans, Mental Health support, a 401(k) Retirement Plan with employer match, Stock Option Program, Disability Programs, Health Savings and Flexible Spending Accounts, Family-forming benefits, and Life and Serious Injury Benefits. We also offer paid leave of absence programs. Full-time hourly employees accrue 35 days of paid time off annually, to be used for vacation, holidays, and sick time. Full-time salaried employees are immediately entitled to flexible time off. See more detail about our Benefits.

Inclusion is a Netflix value, and we strive to host a meaningful interview experience for all candidates. If you want an accommodation/adjustment for a disability or any other reason during the hiring process, please send a request to your recruiting partner.

We are an equal-opportunity employer and celebrate diversity, recognizing that diversity builds stronger teams. We approach diversity and inclusion seriously and thoughtfully. We do not discriminate on the basis of race, religion, color, ancestry, national origin, caste, sex, sexual orientation, gender, gender identity or expression, age, disability, medical condition, pregnancy, genetic makeup, marital status, or military service.

This job is open for no fewer than 7 days and will be removed when the position is filled.

Similar Jobs

ByteDance - Technical Project Management Lead - Edge Cloud Infrastructure - San Jose / Seattle / Boston

ByteDance

Seattle, Washington, United States (On-Site)
6 Months ago
ByteDance - Technical Account Manager (Edge Cloud)

ByteDance

San Jose, California, United States (On-Site)
2 Weeks ago
Netflix - Systems Software Engineer L4

Netflix

United States (Remote)
2 Weeks ago
ByteDance - Site Reliability Engineer, Edge Services

ByteDance

Boston, Massachusetts, United States (On-Site)
2 Months ago
ByteDance - Research Scientist Intern (Traffic Infrastructure Global Engineering)

ByteDance

San Jose, California, United States (On-Site)
2 Weeks ago
Google - Senior Partner Engineer, Device Platform Operations, YouTube

Google

Bengaluru, Karnataka, India (On-Site)
2 Weeks ago
The Walt Disney Company - Director, Business Operations - Disney Advertising Sales

The Walt Disney Company

New York, New York, United States (On-Site)
3 Days ago
Cyara - Sales Operations Analyst – Data

Cyara

Hyderabad, Telangana, India (Hybrid)
5 Months ago
Tesla - Delivery Advisor

Tesla

Vienna, Vienna, Austria (On-Site)
2 Months ago
The Walt Disney Company - Assistant Store Manager

The Walt Disney Company

Minato City, Tokyo, Japan (On-Site)
3 Months ago


Similar Skill Jobs

Google - Staff Product Manager, Subsea Cable Network

Google

Dublin, County Dublin, Ireland (On-Site)
1 Week ago
ByteDance - Site Reliability Engineer, Edge Services

ByteDance

Seattle, Washington, United States (On-Site)
2 Weeks ago
ByteDance - Site Reliability Engineer, Edge Services

ByteDance

Boston, Massachusetts, United States (On-Site)
2 Weeks ago
Netflix - Site Reliability Engineer L5 - Open Connect

Netflix

United States (Remote)
2 Months ago
ByteDance - Technical Account Manager (Edge Cloud)

ByteDance

San Jose, California, United States (On-Site)
2 Weeks ago
ByteDance - Site Reliability Engineer, Edge Services

ByteDance

San Jose, California, United States (On-Site)
6 Months ago
Brillio - Azure Kubernetes Architect - R01530963

Brillio

Bengaluru, Karnataka, India (Hybrid)
6 Months ago
Brillio - Enterprise Architect, Azure - R01535036

Brillio

Bengaluru, Karnataka, India (Hybrid)
6 Months ago


Jobs in United States

Oculus VR - Senior Level Designer - Sanzaru Game Studio

Oculus VR

San Mateo, California, United States (Remote)
1 Month ago
Google - Senior Software Engineer, Infrastructure, Google Cloud NetInfra

Google

Sunnyvale, California, United States (On-Site)
2 Weeks ago
Scale AI - Software Engineer, Frontend - Enterprise Gen AI

Scale AI

San Francisco, California, United States (On-Site)
1 Day ago
Global Step - Junior Recruiter (Game Testers)

Global Step

Richardson, Texas, United States (On-Site)
2 Weeks ago
Snloker AI - Software Engineer — Frontend

Snloker AI

San Francisco, California, United States (Hybrid)
1 Day ago
The Walt Disney Company - Broadcast Maintenance Engineer

The Walt Disney Company

Washington, District Of Columbia, United States (On-Site)
2 Months ago
ByteDance - Software Engineer, Multi Cloud CDN

ByteDance

San Jose, California, United States (On-Site)
3 Days ago
Supercell - Marketing Manager, LATAM

Supercell

San Francisco, California, United States (Hybrid)
6 Months ago
The E.W. Scripps Company - Studio Tech II (PT)

The E.W. Scripps Company

Denver, Colorado, United States (On-Site)
19 Hours ago
Inworld AI - Senior Software Development Engineer in Test (SDET)

Inworld AI

Mountain View, California, United States (On-Site)
8 Hours ago


Operations Jobs

Google - Global Vendor Operations Lead, Google Cloud

Google

Hyderabad, Telangana, India (On-Site)
2 Days ago
Google - Strategy and Operations Manager, Performance

Google

Tokyo, Japan (On-Site)
2 Weeks ago
Tesla - Service Team Lead, Order Preparation & Remote Diagnostics

Tesla

Hanover, Lower Saxony, Germany (On-Site)
2 Months ago
Google - Global Vendor Operations Lead, Google Cloud

Google

Taguig, Metro Manila, Philippines (On-Site)
2 Days ago
OKX - Senior Agent, Customer Service (German Speaker)

OKX

Budapest, Hungary (On-Site)
6 Months ago
Tesla - Delivery Supervisor - Hisings Backa, Gothenburg

Tesla

Västra Götaland County, Sweden (On-Site)
2 Months ago
Google - Risk Compliance Lead, Privacy and Security

Google

Austin, Texas, United States (On-Site)
2 Days ago
Hawk Eye Innovations - Match Operations Assistant - Newcastle

Hawk Eye Innovations

Newcastle Upon Tyne, England, United Kingdom (On-Site)
1 Week ago
Hawk Eye Innovations - Match Operations Assistant - Almaty

Hawk Eye Innovations

Almaty, Almaty Region, Kazakhstan (On-Site)
1 Week ago
Netflix - DSP Partner Operations Manager

Netflix

New York, New York, United States (On-Site)
2 Weeks ago


About The Company

Netflix is one of the world's leading entertainment services with over 247 million paid memberships in over 190 countries enjoying TV series, films and games across a wide variety of genres and languages. Members can play, pause and resume watching as much as they want, anytime, anywhere, and can change their plans at any time.

London, England, United Kingdom (On-Site)

Berlin, Berlin, Germany (On-Site)

Milan, Lombardy, Italy (On-Site)

Paris, Île-de-France, France (On-Site)

Seoul, South Korea (On-Site)

Los Angeles, California, United States (On-Site)

Los Gatos, California, United States (On-Site)

Pennsylvania, United States (On-Site)
