Senior Systems Engineer, Site Reliability Engineering

1 Month ago • 5 Years + • DevOps

Job Summary

Job Description

The Senior Systems Engineer, Site Reliability Engineering (SRE) role at Google combines software and systems engineering to build and run large-scale, distributed, fault-tolerant systems. Responsibilities include improving the service lifecycle (design, deployment, operation, refinement), guiding team members on availability and performance, building automation, and leading incident response. The position requires managing availability, latency, and system health, scaling systems sustainably through automation, and consulting on system design, capacity planning, and launch reviews. The ideal candidate will have experience with programming, distributed systems, administration/networking, and project leadership. The SRE team ensures the reliability and uptime of Google Cloud services.
Must have:
  • Bachelor's degree in CS or related field
  • 5+ years programming experience
  • 3+ years experience with distributed systems
  • 2+ years project leadership experience
  • Incident response experience
Good to have:
  • Master's degree in CS or Engineering

Job Details

Minimum qualifications:

  • Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
  • 5 years of experience with programming in one or more programming languages.
  • 3 years of experience designing, analyzing, and troubleshooting distributed systems and working with administration (e.g., filesystems, inodes, system calls) or networking (e.g., TCP/IP, routing, network topologies and hardware, SDN).
  • 2 years of experience leading projects and providing technical leadership.
  • Experience working with incident response.

Preferred qualifications:

  • Master's degree in Computer Science or Engineering, or a related field.

About the job

Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services (both our internally critical and our externally visible systems) have reliability and uptime appropriate to customers' needs and a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on our systems' capacity and performance.

Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. On the SRE team, you’ll have the opportunity to manage the complex challenges of scale that are unique to Google Cloud, while using your expertise in coding, algorithms, complexity analysis and large-scale system design. SRE's culture of diversity, intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.

Behind everything our users see online is the architecture built by the Technical Infrastructure team to keep it running. From developing and maintaining our data centers to building the next generation of Google platforms, we make Google's product portfolio possible. We're proud to be our engineers' engineers and love voiding warranties by taking things apart so we can rebuild them. We keep our networks up and running, ensuring our users have the best and fastest experience possible.

Responsibilities

  • Improve the whole lifecycle of services from inception and design, through deployment, operation, and refinement.
  • Provide guidance to other team members on managing availability and performance of mission-critical services, on building automation to prevent problem recurrence, and on building automated responses for non-exceptional service conditions.
  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health. Lead sustainable incident response and blameless postmortems.
  • Scale systems sustainably through mechanisms like automation and evolve systems by driving changes that improve reliability and velocity.
  • Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews.

About The Company

A problem isn't truly solved until it's solved for all. Googlers build products that help create opportunities for everyone, whether down the street or across the globe. Bring your insight, imagination and a healthy disregard for the impossible. Bring everything that makes you unique. Together, we can build for everyone.
