Senior Site Reliability Engineering Manager, Production Engineering

2 Months ago • All levels • Product Management

Job Summary

Job Description

As the Senior Engineering Manager for our Production Engineering SRE team, you will lead a group of skilled engineers responsible for the design and management of large-scale, highly available distributed systems in the cloud. You will collaborate directly with application development teams to enhance the reliability, performance, and security of our platform. The role involves leading and mentoring a high-performing team, developing and implementing strategies to improve platform reliability, security, and performance, and overseeing the design and implementation of scalable operations tooling. You will also ensure effective management of incident response, lead efforts to automate production operations, and partner with teams to enhance the security posture of systems, while working closely with software development teams to optimize architecture and services for availability and performance.
Must have:
  • Proven track record of leading and scaling SRE teams in a fast-paced environment.
  • Deep knowledge of site reliability principles, including incident response and SLOs.
  • Expert-level knowledge of Kubernetes and its ecosystem.
  • Strong understanding of cloud platforms, preferably AWS.
  • Experience with microservices architecture and distributed systems.
Good to have:
  • Strong communication and leadership skills.
  • Demonstrated ability in SRE/DevOps.
  • Background in security engineering or DevSecOps.
  • Familiarity with CNCF tools.

Job Details

Please note that we have a hybrid approach to work and would like to find someone who can come into the office in London at least one day a week.

Who We Are

Cisco ThousandEyes is a Digital Experience Assurance platform that empowers organizations to deliver flawless digital experiences across every network – even the ones they don’t own. Powered by AI and an unmatched set of cloud, Internet and enterprise network telemetry data, ThousandEyes enables IT teams to proactively detect, diagnose, and remediate issues – before they impact end-user experiences.

ThousandEyes is deeply integrated across the entire Cisco technology portfolio and beyond, helping customers deploy at scale while also delivering AI-powered assurance insights within Cisco’s leading Networking, Security, Collaboration, and Observability portfolios.

About The Role

As the Senior Engineering Manager for our Production Engineering SRE team, you will lead a group of skilled engineers responsible for the design and management of large-scale, highly available distributed systems in the cloud. You will collaborate directly with application development teams and other cross-functional partners to enhance the reliability, performance, and security of our platform and to drive operational excellence.

What You’ll Do

Team Leadership and Development:

  • Build and mentor a high-performing team of Site Reliability Engineers that embed with application development teams
  • Foster a culture of continuous learning, innovation, and best practices
  • Manage performance, set goals, and provide career development opportunities

Strategic Planning and Execution:

  • Develop and implement strategies to improve platform reliability, security, and performance
  • Collaborate with other engineering leaders to align SRE initiatives with overall business objectives
  • Establish and execute a roadmap to build common platform solutions to the reliability, security, and scale challenges that engineering teams at ThousandEyes face

Operational Excellence:

  • Oversee the design and implementation of scalable operations tooling for SREs and Developers
  • Ensure the effective management of our 24x7 incident response and on-call rotation
  • Lead efforts to automate production operations and adopt robust monitoring solutions

Security and Compliance:

  • Partner with application development teams and other platform engineering teams to enhance the security posture of our containerized and cloud-native systems
  • Ensure compliance with Cisco and industry standards for data protection, scanning, and system security

Cross-functional Collaboration:

  • Work closely with software development teams to optimize architecture and services for availability and performance
  • Collaborate with product management to align SRE initiatives with product roadmaps
  • Represent the Production Engineering SRE team in cross-functional meetings and initiatives

Minimum Qualifications

  • Proven track record of leading and scaling SRE teams in a fast-paced environment
  • Deep knowledge of site reliability principles, including incident response, change management, and SLOs
  • Expert-level knowledge of Kubernetes and its ecosystem
  • Strong understanding of cloud platforms, preferably AWS
  • Experience with microservices architecture and distributed systems

Preferred Qualifications

  • Strong communication and leadership skills, with the ability to influence cross-functional stakeholders
  • Demonstrated ability in SRE, DevOps, or related fields, with at least 3 years in a management role
  • Background in security engineering or DevSecOps, or a strong understanding of security best practices in cloud-native environments
  • Familiarity with CNCF tools such as Prometheus, OpenTelemetry, and ArgoCD

Cisco values the perspectives and skills that emerge from employees with diverse backgrounds. That's why Cisco is expanding the boundaries of discovering top talent by not only focusing on candidates with educational degrees and experience but also placing more emphasis on unlocking potential. We believe that everyone has something to offer and that diverse teams are better equipped to solve problems, innovate, and create a positive impact.

We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification. Research shows that people from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy. We urge you not to prematurely exclude yourself and to apply if you're interested in this work.

About The Company

The name ThousandEyes was born from two big ideas: the power to see things not ordinarily possible and the ability to collect insights from a multitude of vantage points. As organizations rely more on cloud services and the Internet, the network has become a black box they can't understand. ThousandEyes gives organizations visibility into the now borderless network, arming them with an accurate understanding of how the network impacts their applications, users and customers. ThousandEyes is used by some of the world's largest and fastest growing brands, including all of the top 5 global software companies, 5 of the top 6 US banks, and 45 of the Fortune 500.
