Senior Engineering Manager - Critical Operations and Reliability Engineering

1 Month ago • 4-8 Years • Operations • $480,000 PA - $1,200,000 PA

Job Summary

Job Description

As Senior Engineering Manager for Critical Operations and Reliability Engineering (C.O.R.E.), you'll lead Netflix's central SRE team, defining and driving reliability practices for all consumer-facing applications. Responsibilities include setting the strategic vision for system reliability, observability, and scalability; managing high-severity incidents; driving down operational costs; and collaborating with various teams to integrate reliability practices into the SDLC. You will mentor the C.O.R.E. team, ensuring services are reliable, scalable, and efficient, impacting member experience and revenue across multiple platforms (SVOD, Live, Ads, Games). The role requires strong leadership, incident management expertise, and a deep understanding of distributed systems and cloud platforms.
Must have:
  • Senior leadership experience in SRE
  • High-pressure incident management
  • Experience with high-scale cloud platforms
  • Distributed systems, networking, software engineering expertise
  • Collaboration and stakeholder influence
Perks:
  • Comprehensive health plans
  • Mental health support
  • 401(k) retirement plan
  • Stock options
  • Disability programs
  • Family-forming benefits
  • Paid time off

Job Details

Netflix is one of the world's leading entertainment services, with 283 million paid memberships in over 190 countries enjoying TV series, films and games across a wide variety of genres and languages. Members can play, pause and resume watching as much as they want, anytime, anywhere, and can change their plans at any time.

About Netflix

Netflix is revolutionizing entertainment by connecting people with movies and television globally through outstanding content and technological innovation. Our Infrastructure Engineering team provides the backbone for Netflix products, including Streaming Video on Demand (SVOD), Live, Ads, Games, and more, by building and operating an efficient, scalable, secure, and easy-to-use development platform and content delivery network. Join us as we push the boundaries of scale, performance, and resilience, empowering developers to create groundbreaking applications on a reliable platform.

Reliability engineering operates in a federated model at Netflix, with central teams building standard reliability practices and tooling that are leveraged across Streaming, Live, Ads, and Games teams. The federated model allows for a centralized approach to reliability while empowering domain-specific SRE teams to address unique challenges within their areas.

Role Overview

C.O.R.E. is the central SRE team within Infrastructure Engineering that defines and drives reliability practices for all consumer-facing app development teams. The C.O.R.E. team's mission is to improve the availability and reliability of Netflix's infrastructure while enhancing the operational readiness of its engineering culture, focusing on incident management and operational excellence.

As the Senior Manager of the C.O.R.E. Site Reliability Engineering (SRE) team, you will lead the integration of Netflix's SRE model with industry-leading best practices. You will define and drive reliability practices for all consumer-facing product teams, ensuring that our services are reliable, scalable, and efficient. This role is pivotal in ensuring the reliability and performance of Netflix's services, driving innovation, and optimizing system operations to support the company's mission of revolutionizing entertainment.

Role Responsibilities

  • Strategic Leadership: You will lead and mentor the C.O.R.E. SRE team while also setting the strategic vision and technical direction for world-class system reliability, observability, and scalability.

  • Reliability: Your leadership will enable consumer-facing product teams to adopt standardized strategies for meeting reliability targets (e.g., SLOs/SLIs and error budgets).

  • Incident Management: You will manage high-severity incidents impacting Member Experience and/or Revenue across SVOD, Live, Ads, and Games, conduct post-incident reviews, and provide ongoing incident trend analysis to prevent recurrence and improve system architecture.

  • Operational Excellence: You will drive down the operational cost of service ownership by optimizing system reliability and scalability via resilience experiments.

  • Automation and Tools: You will use automation and tools to achieve outcomes such as easier deployment, monitoring, incident response, alerting, and resolution.

  • Collaboration and Integration: You'll work closely with SREs, Dev teams, and Service owners to integrate reliability practices into the SDLC and manage shared accountability for service health.

Requirements

  • Proven experience in a Senior Leadership Role within Site Reliability Engineering or a related domain.

  • Substantial experience commanding high-pressure and large-scale incidents.

  • Openness to participating in an on-call rotation, with shifts providing 24/7 coverage.

  • Extensive experience with high-scale cloud platforms and a strong understanding of distributed systems, networking, and software engineering.

  • Experience working in a collaborative environment, influencing stakeholders across various levels of the organization. Ability to build strong relationships with engineering, product, and business teams.

  • Strong problem-solving abilities and a proactive approach to challenges.

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field, or equivalent work experience.

Our compensation structure consists solely of an annual salary; we do not have bonuses. You choose each year how much of your compensation you want in salary versus stock options. To determine your personal top-of-market compensation, we rely on market indicators and consider your specific job family, background, skills, and experience to determine your compensation in the market range. The range for this role is $480,000 - $1,200,000.

Netflix provides comprehensive benefits including Health Plans, Mental Health support, a 401(k) Retirement Plan with employer match, Stock Option Program, Disability Programs, Health Savings and Flexible Spending Accounts, Family-forming benefits, and Life and Serious Injury Benefits. We also offer paid leave of absence programs. Full-time hourly employees accrue 35 days annually for paid time off to be used for vacation, holidays, and sick paid time off. Full-time salaried employees are immediately entitled to flexible time off. See more detail about our Benefits.

Inclusion is a Netflix value and we strive to host a meaningful interview experience for all candidates. If you want an accommodation/adjustment for a disability or any other reason during the hiring process, please send a request to your recruiting partner.

We are an equal-opportunity employer and celebrate diversity, recognizing that diversity builds stronger teams. We approach diversity and inclusion seriously and thoughtfully. We do not discriminate on the basis of race, religion, color, ancestry, national origin, caste, sex, sexual orientation, gender, gender identity or expression, age, disability, medical condition, pregnancy, genetic makeup, marital status, or military service.

Job is open for no less than 7 days and will be removed when the position is filled.
