Site Reliability Engineer

3 Weeks ago • 3-6 Years • Devops

Job Summary

Job Description

As a Site Reliability Engineer at Xsolla, you will be responsible for ensuring high reliability and availability of systems. You will monitor the system for issues, respond to incidents, and drive incident resolution. This includes developing comprehensive monitoring solutions, supporting services before they go live, and engaging in service capacity planning. Collaboration with development teams to enhance operational stability and building automation systems are also key responsibilities. The role requires strong problem-solving skills and excellent communication.
Must have:
  • 3+ years of experience as a Site Reliability Engineer.
  • Proficiency in scripting languages like Python, Bash.
  • Deep knowledge of monitoring systems like Datadog, Prometheus.
  • Experience with Docker, Kubernetes, or container orchestration.
  • Experience with infrastructure automation tools like Terraform.
  • Experience with Linux-based infrastructures.
Good to have:
  • Strong understanding of Go and PHP.
  • Experience with Helm.
  • IT professional certifications.

Job Details

ABOUT US


At Xsolla, we believe that great games begin as ideas, driven by the curiosity, dedication, and grit of creators around the world. Our mission is to empower these visionaries by providing the support and resources they need to bring their games to life. We are committed to leveling the playing field, ensuring that every creator has the opportunity to share their passion with the world. 


Headquartered in Los Angeles, with offices in Berlin, Seoul, and beyond, we partner with industry leaders like Valve, Twitch, and Ubisoft to clear the paths for innovation in gaming. Our global reach spans over 200 geographies, offering more than 700 payment methods in 130+ currencies.


Longevity, Opportunity, Vision, Enjoy the game!


Requirements
  • Proven experience as a Site Reliability Engineer or in a similar software engineering role in a large-scale production environment (3+ years), with 6+ years overall in IT (as Ops or Developer).
  • Proficiency in scripting languages such as Python, Bash. Strong understanding of Go and PHP will be a plus.
  • Deep knowledge of monitoring systems such as Datadog, Prometheus, Grafana.
  • Good understanding of continuous integration/continuous delivery processes and platforms (GitLab preferred). Experience with Helm.
  • Experience with Docker, Kubernetes, or other container orchestration systems.
  • Familiarity with infrastructure automation tools like Terraform.
  • Experience with automation, system administration, and system hardening.
  • Experience with Linux-based infrastructures, Linux/Unix administration.
  • Demonstrated problem-solving skills, particularly debugging and troubleshooting complex software systems. Ability to work under pressure.
  • Excellent communication skills with a capacity to articulate and solve complex technical problems.
  • Xsolla Technology Stack: Ubuntu, Kubernetes, GitLab, Terraform, Terragrunt, Puppet, Nginx, Google Cloud Platform, Datadog, Prometheus, Grafana, ELK, Zabbix, and Harbor.


Responsibilities
  • Ensure high reliability and availability, meeting SLAs and SLOs as measured by SLIs.
  • Monitor the system for issues and respond to incidents, ensuring quick resolution to maintain high system availability.
  • Drive incident resolution and process improvements to minimize downtime and increase operational transparency.
  • Ensure all key services are measured and monitored, and that they raise alerts when needed.
  • Develop comprehensive monitoring solutions that provide full visibility into the different platform components, using tools and services such as Kubernetes, Datadog, Prometheus, Grafana, and others.
  • Support services before they go live through activities such as capacity planning, monitoring setup, logging, and production readiness reviews.
  • Engage in service capacity planning and demand forecasting, performance analysis, and system tuning.
  • Collaborate with the development teams to enhance the product's operational stability.
  • Build and drive the automation systems that maintain system health.


Education
  • IT professional certifications are not required, but the following would be a plus:
  • Certified Kubernetes Administrator or Developer
  • HashiCorp Certifications
  • GCP Certifications

