Software Engineering Manager, Site Reliability, Cloud Incident Response

3 Months ago • 8-11 Years • DevOps

Job Summary

Job Description

As a Software Engineering Manager for Site Reliability and Cloud Incident Response at Google, you'll play a vital role in ensuring the dependability of Google Cloud Platform (GCP) for our customers. You will lead a team dedicated to responding to and mitigating major incidents across GCP, working closely with product teams, customer-facing teams, and stakeholders. Your responsibilities will include participating in on-call rotations for critical incident response, collaborating to ensure high-quality customer outcomes, developing incident management training and processes, building systems and tools to support the team, and proactively identifying and mitigating potential risks within Cloud infrastructure.
Must have:
  • Bachelor's degree or equivalent practical experience
  • 8 years of experience with software development
  • 3 years of experience in a technical leadership role
  • 2 years of experience in people management
Good to have:
  • Master's degree or PhD in Computer Science
  • Experience working in a changing organization

Job Details

Minimum qualifications:

  • Bachelor's degree or equivalent practical experience.
  • 8 years of experience with software development in one or more programming languages (e.g., Python, C, C++, Java, JavaScript).
  • 3 years of experience in a technical leadership role overseeing projects, with 2 years of experience in a people management or supervision/team leadership role.

Preferred qualifications:

  • Master's degree or PhD in Computer Science, or a related technical field.
  • Experience working in a changing organization.

About the job

Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services, both our internally critical and our externally visible systems, have reliability and uptime appropriate to customers' needs and a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on our systems' capacity and performance.

Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. On the SRE team, you’ll have the opportunity to manage the complex challenges of scale which are unique to Google Cloud, while using your expertise in coding, algorithms, complexity analysis and large-scale system design. SRE's culture of diversity, intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.

The team's mission is to create a dependable experience for GCP customers. In this role, you will respond to major incidents across all of GCP and help coordinate, mitigate, or resolve them. The Cloud Incident Response Team supports the responders, tooling, and outcomes for GCP Major Incidents. The team collaborates across GCP products, customer-facing teams, and a wide range of stakeholders.

Google Cloud accelerates every organization’s ability to digitally transform its business and industry. We deliver enterprise-grade solutions that leverage Google’s cutting-edge technology, and tools that help developers build more sustainably. Customers in more than 200 countries and territories turn to Google Cloud as their trusted partner to enable growth and solve their most critical business problems.

Responsibilities

  • Participate in on-call rotation supporting Critical Incident Response for GCP.
  • Focus on high-quality customer outcomes and continuous collaboration across GCP teams.
  • Create IMAG training and processes for the incident management life cycle, partnering with Cloud SRE UTLs and the Cloud Support leadership team.
  • Build systems and tooling to support the team and improve visibility, issue detection, and communications to customers, stakeholders, and customer-facing teams.
  • Define and escalate risks in Cloud, and reduce incident probabilities with strategic and tactical/pragmatic approaches as needed.

About The Company

A problem isn't truly solved until it's solved for all. Googlers build products that help create opportunities for everyone, whether down the street or across the globe. Bring your insight, imagination and a healthy disregard for the impossible. Bring everything that makes you unique. Together, we can build for everyone.
