Site Reliability Engineer

1 Month ago • 3-6 Years • Devops

Job Summary

Job Description

As a Site Reliability Engineer at Xsolla, you will be responsible for ensuring high reliability and availability of systems. You will monitor the system for issues, respond to incidents, and drive incident resolution. This includes developing comprehensive monitoring solutions, supporting services before they go live, and engaging in service capacity planning. Collaboration with development teams to enhance operational stability and building automation systems are also key responsibilities. The role requires strong problem-solving skills and excellent communication.
Must have:
  • 3+ years of experience as a Site Reliability Engineer.
  • Proficiency in scripting languages like Python, Bash.
  • Deep knowledge of monitoring systems like Datadog, Prometheus.
  • Experience with Docker, Kubernetes, or container orchestration.
  • Experience with infrastructure automation tools like Terraform.
  • Experience with Linux-based infrastructures.
Good to have:
  • Strong understanding of Go and PHP.
  • Experience with Helm.
  • IT professional certifications.

Job Details

ABOUT US


At Xsolla, we believe that great games begin as ideas, driven by the curiosity, dedication, and grit of creators around the world. Our mission is to empower these visionaries by providing the support and resources they need to bring their games to life. We are committed to leveling the playing field, ensuring that every creator has the opportunity to share their passion with the world. 


Headquartered in Los Angeles, with offices in Berlin, Seoul, and beyond, we partner with industry leaders like Valve, Twitch, and Ubisoft to clear the paths for innovation in gaming. Our global reach spans over 200 geographies, offering more than 700 payment methods in 130+ currencies.


Longevity. Opportunity. Vision. Enjoy the game!


Requirements
  • Proven experience as a Site Reliability Engineer, or in a similar software engineering role, in a large-scale production environment (3+ years). 6+ years overall in IT (as Ops or Developer).
  • Proficiency in scripting languages such as Python, Bash. Strong understanding of Go and PHP will be a plus.
  • Deep knowledge of monitoring systems such as Datadog, Prometheus, Grafana.
  • Good understanding of continuous integration/continuous delivery processes and platforms (GitLab preferred). Experience with Helm.
  • Experience with Docker, Kubernetes, or other container orchestration systems.
  • Familiarity with infrastructure automation tools like Terraform.
  • Experience with automation, system administration, and system hardening.
  • Experience with Linux-based infrastructures, Linux/Unix administration.
  • Demonstrated problem-solving skills, particularly debugging and troubleshooting complex software systems. Ability to work under pressure.
  • Excellent communication skills with a capacity to articulate and solve complex technical problems.
  • Xsolla Technology Stack: Ubuntu, Kubernetes, GitLab, Terraform, Terragrunt, Puppet, Nginx, Google Cloud Platform, Datadog, Prometheus, Grafana, ELK, Zabbix, and Harbor.


Responsibilities
  • Ensure high reliability and availability and meet SLAs, SLOs, and SLIs.
  • Monitor the system for issues and respond to incidents, ensuring quick resolution to maintain high system availability.
  • Drive incident resolution and process improvements to minimize downtime and increase operational transparency.
  • Ensure all key services are measured and monitored, and that they raise alerts when needed.
  • Develop comprehensive monitoring solutions that provide full visibility into the different platform components, using tools and services like Kubernetes, Datadog, Prometheus, Grafana, and others.
  • Support services before they go live through activities such as capacity planning, monitoring setup, logging, and production readiness reviews.
  • Engage in service capacity planning and demand forecasting, performance analysis, and system tuning.
  • Collaborate with the development teams to enhance the product's operational stability.
  • Build and drive the automation systems that maintain system health.


Education
  • IT professional certifications are not required, but will be a plus:
  • Certified Kubernetes Administrator or Developer
  • HashiCorp Certifications
  • GCP Certifications

