Site Reliability Engineer | Core Platform

1 Week ago • All levels • DevOps

Job Summary

Job Description

As a Site Reliability Engineer (SRE) on the Observability team at King, you'll engineer and manage the monitoring and observability environments for a massive gaming platform processing over 100 billion events daily. You'll build and maintain the metrics, logs, and tracing stack, creating self-service solutions for developers to enhance performance and reliability. This role demands expertise in distributed systems monitoring, incident response, and automation using tools like Prometheus, Loki, OpenTelemetry, and Kubernetes. You'll collaborate closely with developers, improve on-call workflows, and drive best practices for service instrumentation.
Must have:
  • Strong software development background (Python, Java, Go)
  • Experience with observability tools (Prometheus, Loki, OpenTelemetry)
  • Understanding of distributed systems monitoring and incident response
  • Collaboration with developers and driving best practices for service instrumentation
  • Kubernetes and cloud environments (GCP/AWS/Azure)
  • Linux performance debugging and network troubleshooting

Job Details

Craft:

Technology & Development

Job Description:

At King, millions of players connect to our games every day and expect to continue playing from where they left off. All this user and game progression data is stored in our infrastructure. We are looking for someone eager to help us engineer and manage the monitoring and observability environments at the heart of this ecosystem.
We believe that you share our passion for learning new things, coding, quality, automation, continuous improvement, and actively building and upholding a great culture. Above all, we would like to see that you have a genuine interest in high-performance observability.
Your role within our Kingdom
We are looking for a Site Reliability Engineer (SRE) with a strong development background to join our Observability team and help shape the future of how we monitor, debug, and optimize our platform, services, and applications at scale. Our mission is to empower developers with the right tools and insights to keep our services running smoothly, efficiently, and reliably.

As part of the Observability team, you will build and maintain our monitoring, logging, and tracing platform, working closely with developers to create self-service solutions that enhance performance, reliability, and troubleshooting capabilities. Our platform processes over 100 billion events per day, requiring innovative approaches to scalability, automation, and efficiency.

We care deeply about our culture and believe in:
● An inclusive and diverse workplace
● Continuous improvement of everything we do
● Automation and coding as much as possible
● Collaboration and blame-free, respectful problem-solving
● Asking for help and sharing ideas openly
What you will work on:
Observability Platform – Engineer and operate our metrics, logs, and tracing stack, ensuring it scales reliably across the organization.

Developer-Focused Monitoring – Build APIs, tools, and dashboards that give teams insight into their services with minimal friction.

Automation & Self-Service – Drive automation efforts for alerting, event correlation, and proactive anomaly detection.

Scalable Infrastructure – Work on distributed monitoring systems that handle high-throughput data ingestion and querying.

Incident Response & Troubleshooting – Improve on-call workflows, alerting systems, and root cause analysis processes.

Our Observability Stack

We use a combination of open-source and cloud-native technologies, including:

Metrics & Tracing: OpenTelemetry, Prometheus, Mimir, InfluxDB

Log Management: Loki, Elasticsearch

Alerting & Incident Response: Grafana OnCall, PagerDuty, Alertmanager

Infrastructure as Code: Terraform, Ansible

Automation & Scripting: Python, Go, Bash

Skills to create thrills
Strong software development background – Comfortable writing production-quality Python, Java, Go, or similar languages.

Experience with observability tools (Prometheus, Loki, OpenTelemetry, etc.).

Deep understanding of distributed systems monitoring and incident response.

Ability to collaborate with developers and drive best practices for instrumenting services.

Familiarity with Kubernetes and cloud environments (GCP/AWS/Azure).

Solid knowledge of Linux performance debugging and network troubleshooting.

Strong problem-solving skills and a proactive mindset for improving reliability.

Excellent communication skills in English (both written and spoken).

Why Join Us?

We believe in:
Autonomy & Ownership – We enable developers to self-serve monitoring solutions and own their observability needs.
Collaboration – We work closely across teams to improve reliability and troubleshoot challenges without blame.
Continuous Learning – We experiment, iterate, and improve everything we do.
Impact – Your work will directly affect how all our games and platforms operate at scale.

We think that you are a curious, humble, driven, collaborative, and responsible person who loves to work with infrastructure as code.

About King

With a mission of Making the World Playful, King is a leading interactive entertainment company with more than 20 years of history delivering some of the world’s most iconic games in the mobile gaming industry, including the world-famous Candy Crush franchise, as well as other mobile game hits such as Farm Heroes Saga. King games are played by more than 200 million monthly active users. King, part of Microsoft (NASDAQ: MSFT), has Kingsters in Stockholm, Malmö, London, Barcelona, Berlin, Dublin, San Francisco, New York, Los Angeles and Malta. More information can be found at King.com or by following us on LinkedIn, @lifeatking on Instagram, or @king_games on X.

Similar Jobs

PwC - Guidewire DataHub Senior Associate (Sr. Analyst)

PwC

Bengaluru, Karnataka, India (Remote)
1 Month ago
NVIDIA - Senior CPU Implementation Methodology Engineer

NVIDIA

Santa Clara, California, United States (On-Site)
4 Weeks ago
NVIDIA - System Software Engineer - CUDA Driver

NVIDIA

Santa Clara, California, United States (On-Site)
1 Month ago
ByteDance - Machine Learning Engineer - AML Algorithm

ByteDance

San Jose, California, United States (On-Site)
3 Months ago
Salesforce - MuleSoft Senior Technical Consultant - Public Sector - Must be located in the DC Metro area

Salesforce

McLean, Virginia, United States (Remote)
3 Weeks ago
Fortis Games - DevOps Engineer II

Fortis Games

Portugal (On-Site)
3 Months ago
N-iX - Senior DevOps Engineer (Azure AD B2C)

N-iX

Ukraine (Remote)
3 Weeks ago
Next Level Business Services - Cloud Architect

Next Level Business Services

Jersey City, New Jersey, United States (On-Site)
4 Months ago
Google - Technical Solutions Engineer, Google Distributed Cloud (Airgapped)

Google

Frankfurt, Hessen, Germany (On-Site)
3 Months ago
PwC - Senior Associate _ Automation Tester_ Emerging  Technologies_ Advisory_ Bengaluru

PwC

Bengaluru, Karnataka, India (On-Site)
4 Months ago


Similar Skill Jobs

SKYDANCE - CFX Trainee

SKYDANCE

Madrid, Community Of Madrid, Spain (On-Site)
1 Week ago
Rovio Entertainment Corporation - Senior Producer, External Projects

Rovio Entertainment Corporation

Barcelona, Catalonia, Spain (Hybrid)
13 Hours ago
Activision - Senior Technical Artist

Activision

Malmö, Skåne County, Sweden (Hybrid)
1 Week ago
NVIDIA - Senior DevOps Engineer - Accelerated Computing

NVIDIA

Westford, Massachusetts, United States (Hybrid)
1 Month ago
Milk Visual Effects - CG Supervisor

Milk Visual Effects

(On-Site)
2 Months ago
CD PROJEKT RED - Principal Engine Programmer

CD PROJEKT RED

Boston, Massachusetts, United States (Hybrid)
6 Days ago
Gaming Innovation Group - Java Tech Lead

Gaming Innovation Group

Community Of Madrid, Spain (Remote)
4 Days ago
NVIDIA - Senior C++ Software Engineer

NVIDIA

Ra'anana, Center District, Israel (On-Site)
1 Month ago
NVIDIA - Senior Mixed-Signal Design Engineer

NVIDIA

Santa Clara, California, United States (On-Site)
1 Month ago
NVIDIA - NVIDIA 2025 Internships: MBA Product Management

NVIDIA

Santa Clara, California, United States (On-Site)
1 Month ago



DevOps Jobs

Onward Search - DevOps/Automation Engineer

Onward Search

New York, New York, United States (Remote)
2 Weeks ago
Offworld - DevOps Engineer

Offworld

New Westminster, British Columbia, Canada (On-Site)
6 Hours ago
Luxoft - .NET and Azure API Developer

Luxoft

Bengaluru, Karnataka, India (On-Site)
3 Months ago
The Walt Disney Company - Senior Manager, Storage Systems Engineering

The Walt Disney Company

New York, New York, United States (On-Site)
1 Month ago
Nagarro - Senior Staff Engineer (Python Azure Synapse)

Nagarro

India (On-Site)
4 Months ago
NVIDIA - Senior Software and Cloud Architect

NVIDIA

Ra'anana, Center District, Israel (On-Site)
1 Month ago
ZeniMax Media - Sr. Systems Engineer

ZeniMax Media

Rockville, Maryland, United States (On-Site)
5 Months ago
Teradata - Senior Cloud Engineer

Teradata

Pune, Maharashtra, India (On-Site)
3 Months ago
Egnyte - Senior Technical Program Manager

Egnyte

India (Remote)
3 Months ago
Tencent - Cloud Engineer

Tencent

(On-Site)
3 Months ago


About The Company

At King, we’re Making the World Playful. Heard of Candy Crush? We’re the creators behind it and loads of other sweet games like Farm Heroes.
