Software Engineering Manager, Site Reliability, Cloud Incident Response

Google

England, United Kingdom (On-site)

9 Months ago • 8-11 Years • Devops

Job Summary

Job Description

As a Software Engineering Manager for Site Reliability and Cloud Incident Response at Google, you'll play a vital role in ensuring the dependability of Google Cloud Platform (GCP) for our customers. You will lead a team dedicated to responding to and mitigating major incidents across GCP, working closely with product teams, customer-facing teams, and stakeholders. Your responsibilities will include participating in on-call rotations for critical incident response, collaborating to ensure high-quality customer outcomes, developing incident management training and processes, building systems and tools to support the team, and proactively identifying and mitigating potential risks within Cloud infrastructure.

Must have:

Bachelor's degree or equivalent practical experience
8 years of experience with software development
3 years of experience in a technical leadership role
2 years of experience in people management

Good to have:

Master's degree or PhD in Computer Science
Experience working in a changing organization

6 skills required

java

javascript

cpp

algorithms

python

incident-response

Job Details

Minimum qualifications:

Bachelor's degree or equivalent practical experience.
8 years of experience with software development in one or more programming languages (e.g., Python, C, C++, Java, JavaScript).
3 years of experience in a technical leadership role overseeing projects, with 2 years of experience in a people management or supervision/team leadership role.

Preferred qualifications:

Master's degree or PhD in Computer Science, or a related technical field.
Experience working in a changing organization.

About the job

Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services, both our internally critical and our externally visible systems, have reliability and uptime appropriate to customers' needs and a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on our systems' capacity and performance.

Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. On the SRE team, you’ll have the opportunity to manage the complex challenges of scale which are unique to Google Cloud, while using your expertise in coding, algorithms, complexity analysis and large-scale system design. SRE's culture of diversity, intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.

The team's mission is to create a dependable experience for GCP customers. In this role, you will respond to major incidents across all of GCP and help coordinate, mitigate, or resolve them. The Cloud Incident Response Team supports the responders, tooling, and outcomes for GCP Major Incidents. The team collaborates across GCP products, customer-facing teams, and a wide range of stakeholders.

Google Cloud accelerates every organization’s ability to digitally transform its business and industry. We deliver enterprise-grade solutions that leverage Google’s cutting-edge technology, and tools that help developers build more sustainably. Customers in more than 200 countries and territories turn to Google Cloud as their trusted partner to enable growth and solve their most critical business problems.

Responsibilities

Participate in on-call rotation supporting Critical Incident Response for GCP.
Focus on high-quality customer outcomes and continuous collaboration across GCP teams.
Create IMAG training and processes for the incident management life cycle, partnering with Cloud SRE UTLs and the Cloud Support leadership team.
Build systems and tooling to support the team and improve visibility, issue detection, and communications to customers, stakeholders, and customer-facing teams.
Define and escalate risks in Cloud, and reduce incident probabilities with strategic and tactical/pragmatic approaches as needed.
