Senior Engineering Manager - Critical Operations and Reliability Engineering

58 Minutes ago • 4-8 Years • Operations • $480,000 PA - $1,200,000 PA

Job Summary

Job Description

As Senior Engineering Manager for Critical Operations and Reliability Engineering (C.O.R.E.), you'll lead Netflix's central SRE team, defining and driving reliability practices for all consumer-facing applications. Responsibilities include setting the strategic vision for system reliability, observability, and scalability; managing high-severity incidents; driving down operational costs; and collaborating with various teams to integrate reliability practices into the SDLC. You will mentor the C.O.R.E. team, ensuring services are reliable, scalable, and efficient, impacting member experience and revenue across multiple platforms (SVOD, Live, Ads, Games). The role requires strong leadership, incident management expertise, and a deep understanding of distributed systems and cloud platforms.
Must have:
  • Senior leadership experience in SRE
  • High-pressure incident management
  • Experience with high-scale cloud platforms
  • Distributed systems, networking, software engineering expertise
  • Collaboration and stakeholder influence
Perks:
  • Comprehensive health plans
  • Mental health support
  • 401(k) retirement plan
  • Stock options
  • Disability programs
  • Family-forming benefits
  • Paid time off

Job Details

Netflix is one of the world's leading entertainment services, with 283 million paid memberships in over 190 countries enjoying TV series, films and games across a wide variety of genres and languages. Members can play, pause and resume watching as much as they want, anytime, anywhere, and can change their plans at any time.

About Netflix

Netflix is revolutionizing entertainment by connecting people with movies and television globally through outstanding content and technological innovation. Our Infrastructure Engineering team provides the backbone for Netflix products, including Streaming Video on Demand (SVOD), Live, Ads, Games, and more, by building and operating an efficient, scalable, secure, and easy-to-use development platform and content delivery network. Join us as we push the boundaries of scale, performance, and resilience, empowering developers to create groundbreaking applications on a reliable platform.

Reliability engineering operates in a federated model at Netflix, with central teams building standard reliability practices and tooling that are used across the Streaming, Live, Ads, and Games teams. The federated model allows for a centralized approach to reliability while empowering domain-specific SRE teams to address unique challenges within their areas.

Role Overview

C.O.R.E. is the central SRE team within Infrastructure Engineering that defines and drives reliability practices for all consumer-facing app development teams. The C.O.R.E. team's mission is to improve the availability and reliability of Netflix's infrastructure while enhancing the operational readiness of its engineering culture, focusing on incident management and operational excellence.

As the Senior Manager of the C.O.R.E. Site Reliability Engineering (SRE) team, you will lead the integration of Netflix's SRE model with industry-leading best practices. You will define and drive reliability practices for all consumer-facing product teams, ensuring that our services are reliable, scalable, and efficient. This role is pivotal in ensuring the reliability and performance of Netflix's services, driving innovation, and optimizing system operations to support the company's mission of revolutionizing entertainment.

Role Responsibilities

  • Strategic Leadership: You will lead and mentor the C.O.R.E. SRE team while also setting the strategic vision and technical direction for world-class system reliability, observability, and scalability.

  • Reliability: Your leadership will enable consumer-facing product teams to adopt standardized strategies for meeting reliability targets (e.g., SLOs/SLIs and error budgets).

  • Incident Management: You will manage high-severity incidents impacting Member Experience and/or Revenue across SVOD, Live, Ads, and Games; conduct post-incident reviews; and provide ongoing incident trend analysis to prevent recurrence and improve system architecture.

  • Operational Excellence: You will drive down the operational cost of service ownership by optimizing system reliability and scalability via resilience experiments.

  • Automation and Tools: You will use automation and tooling to simplify deployment, monitoring, alerting, incident response, and resolution.

  • Collaboration and Integration: You'll work closely with SREs, development teams, and service owners to integrate reliability practices into the SDLC and manage shared accountability for service health.

Requirements

  • Proven experience in a senior leadership role within Site Reliability Engineering or a related domain.

  • Substantial experience commanding high-pressure and large-scale incidents.

  • Openness to participating in an on-call rotation with 24/7 shift coverage.

  • Extensive experience with high-scale cloud platforms and a strong understanding of distributed systems, networking, and software engineering.

  • Experience working in a collaborative environment, influencing stakeholders across various levels of the organization. Ability to build strong relationships with engineering, product, and business teams.

  • Strong problem-solving abilities and a proactive approach to challenges.

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field, or equivalent work experience.

Our compensation structure consists solely of an annual salary; we do not have bonuses. You choose each year how much of your compensation you want in salary versus stock options. To determine your personal top-of-market compensation, we rely on market indicators and consider your specific job family, background, skills, and experience. The range for this role is $480,000 - $1,200,000.
 

Netflix provides comprehensive benefits including Health Plans, Mental Health support, a 401(k) Retirement Plan with employer match, Stock Option Program, Disability Programs, Health Savings and Flexible Spending Accounts, Family-forming benefits, and Life and Serious Injury Benefits. We also offer paid leave of absence programs. Full-time hourly employees accrue 35 days annually for paid time off to be used for vacation, holidays, and sick paid time off. Full-time salaried employees are immediately entitled to flexible time off. See more detail about our Benefits.

Inclusion is a Netflix value and we strive to host a meaningful interview experience for all candidates. If you want an accommodation/adjustment for a disability or any other reason during the hiring process, please send a request to your recruiting partner.

We are an equal-opportunity employer and celebrate diversity, recognizing that diversity builds stronger teams. We approach diversity and inclusion seriously and thoughtfully. We do not discriminate on the basis of race, religion, color, ancestry, national origin, caste, sex, sexual orientation, gender, gender identity or expression, age, disability, medical condition, pregnancy, genetic makeup, marital status, or military service.

Job is open for no less than 7 days and will be removed when the position is filled.


About The Company

Netflix is one of the world's leading entertainment services with over 247 million paid memberships in over 190 countries enjoying TV series, films and games across a wide variety of genres and languages. Members can play, pause and resume watching as much as they want, anytime, anywhere, and can change their plans at any time.
