Senior Site Reliability Engineer

2 Months ago • 4 Years + • DevOps

Job Summary

Job Description

Gearbox Entertainment seeks a Senior Site Reliability Engineer to design, engineer, and develop solutions for the observability and reliability of their online platform. This hands-on role involves deep immersion in Go and Python observability stacks, AWS, Terraform, and more. Responsibilities include driving implementation of cloud architectures, mentoring developers, defining project roadmaps, and participating in 24/7 on-call support. The ideal candidate is a self-starter with strong Go and AWS experience, proficient in observability stack management, and comfortable with automation and complex problem-solving.
Must have:
  • 4+ years experience instrumenting observability stacks in Go
  • Proficiency in AWS container management (ECS, Fargate, etc.)
  • AWS access & security services experience (IAM, KMS, etc.)
  • Experience with Terraform/CloudFormation
  • 2+ years experience with Docker containers
  • Adept understanding of observability stack management (OTel, tracing, etc.)
  • Excellent communication skills
Good to have:
  • Extensive OpenTelemetry experience
  • CI/CD pipeline experience (Git/GitLab)
  • Understanding of RESTful and WebSocket APIs
  • Bachelor's degree in CS or related field
  • Familiarity with Datadog
  • Atlassian product experience (OpsGenie, Jira, Confluence)
  • Agile environment experience
  • Games industry experience
Perks:
  • Company-paid mobile device

Job Details

The Gearbox Entertainment Company is an award-winning creator and distributor of entertainment for people around the world. Gearbox Entertainment develops and publishes products through its subsidiaries, Gearbox Software and Gearbox Publishing. Gearbox Entertainment has become widely known for successful game franchises including Brothers in Arms and Borderlands, as well as acquired properties Duke Nukem and Homeworld. Gearbox’s ambition is to entertain the world and its key driving objectives include the pursuit of happiness for our talent, partners and customers, the prioritization of entertainment and creativity and a measured respect for profitability. For more information, visit www.Gearbox.com.

To further drive our vision of premier stability and rapid feature delivery, we are looking for a Senior Site Reliability Engineer to join our team. As a Senior SRE, you should feel exceptionally comfortable bringing architectural design proposals to the table for consideration among your colleagues on our platform and infrastructure development teams. You will be one of the principal technical designers helping push our cloud-native platform toward the future. You will be responsible for driving the implementation of flexible cloud architectures with an automation-first emphasis; manual user intervention likely makes you uneasy and maybe even a little twitchy.

We would expect a successful candidate for this position to be a self-starter with the ability to complete tasks independently. Though you will have technical leadership and senior engineers at your disposal, you should feel well acquainted with tackling complex problems without significant oversight.

Observability is paramount. If we can't measure it, we can't prove it works; if we can't prove it works, it must be assumed it doesn't work. This is a philosophy you hopefully love (and preferably obsess over). If we can't observe how a new feature is behaving, our SRE team is excited to dive into the application code and make the necessary improvements.

Typical Day TL;DR:
You will be deeply immersed in Go and Python observability stacks, with plenty of AWS and Terraform sprinkled in as well. This is a very hands-on Senior Engineering role where your days will be filled with building solutions to technical challenges in the observability and availability of our SHiFT online services. You will evangelize for and be obsessed with user experience as it relates to the services you support. You will help manage and orchestrate each of these by leaning heavily on technologies like Go, Terraform, Docker, and Bash.

On any given day, you should expect to spend at least 80% of your time actively engineering and developing solutions; the rest will be a mixture of planning, reviewing code from your colleagues, participating in design meetings, documentation, and self-development. This position will eventually require you to carry a company-paid mobile device and participate in 24/7 on-call rotations alongside your engineering colleagues. Don't worry though, our on-call experience doesn't suck.

Core Responsibilities:
  • Design, engineer, and develop solutions for ensuring the observability and reliability of our online platform
  • Be a trusted voice in the evangelism of reliability engineering throughout the team, with an eagerness for mentoring other developers on the team
  • Help define and oversee short- and mid-term project roadmaps for the future of our SRE team
  • Participate in after-hours on-call support rotations

Must Have (the non-negotiable parts):
  • At least 4 years of professional experience instrumenting complex observability stacks in object-oriented programming languages, preferably Go
  • Proficiency in AWS container management, orchestration, and observability features (ECS, Fargate, Aurora, AppConfig, CloudWatch, etc.)
  • Professional experience managing AWS access and security services (IAM, KMS, Secrets Manager, WAFv2, etc.)
  • Professional experience in Terraform and/or CloudFormation
  • Minimum of 2 years of experience with containers in a professional setting, preferably Docker
  • Adept understanding of observability stack management (OTel, tracing, monitoring, alerting, structured logging, APM, etc.)
  • Comfortable communicator, able to clearly detail designs and implementations on an individual level and in large group settings

Should Have (some wiggle room):
  • Extensive hands-on experience with OpenTelemetry
  • Hands-on experience developing and maintaining CI/CD pipelines, preferably in Git/GitLab
  • Understanding of RESTful and WebSocket-based APIs
  • Bachelor's degree in computer science, related field, or equivalent training and professional experience

Now you're just showing off:
  • Familiarity with Datadog
  • Familiarity with Atlassian products (OpsGenie, Jira, Confluence)
  • Experience working with developers in an agile environment
  • Experience in the games industry, preferably launching multiple online-enabled AAAs
  • Knowledge about Gearbox-owned IPs

Gearbox Entertainment believes that all team members should be able to enjoy a work environment free from all forms of discrimination and harassment. We are committed to reflecting the diversity of the world we strive to entertain. As an Equal Opportunity Employer, we provide fair and equal treatment to all team members and applicants. We do not discriminate on the basis of race, color, religion, sex, sexual orientation, gender identity or expression, national origin, disability, genetic information, pregnancy or maternity, veteran status, or any other status protected by applicable national, federal, state or local law.
