Senior Site Reliability Engineer, Compute SRE

1 Month ago • 6 Years + • Devops • $192,890 PA - $238,520 PA

Job Summary

Job Description

Roblox is seeking skilled Site Reliability Engineers to join the Infra Compute SRE team. The role involves owning and managing the operation of the cell infrastructure system, including service discovery and secrets management. Responsibilities include designing and developing fault-tolerant systems, promoting reliability best practices, building automation for the Roblox ecosystem, creating tooling for production guardrails, and developing performance monitoring services. Candidates should have at least 6 years of experience as an SRE or Software Engineer, fluency in programming languages like Go, Java, and C#, and experience with Kubernetes or similar orchestration systems.
Must have:
  • Bachelor's degree in CS or related field, or equivalent experience
  • At least 6 years as an SRE or Software Engineer
  • Fluency in Go, Java, and C#
  • Experience with Kubernetes or similar orchestration systems
  • Experience in building and adopting software and tools
  • A systems focus on deeply reliable code
Good to have:
  • Experience in Nomad, Vault, and Consul
Perks:
  • Equity compensation
  • Benefits

Job Details

Every day, tens of millions of people come to Roblox to explore, create, play, learn, and connect with friends in 3D immersive digital experiences, all created by our global community of developers and creators.

At Roblox, we’re building the tools and platform that empower our community to bring any experience that they can imagine to life. Our vision is to reimagine the way people come together, from anywhere in the world, and on any device. We’re on a mission to connect a billion people with optimism and civility, and we’re looking for amazing talent to help us get there.

A career at Roblox means you’ll be working to shape the future of human interaction, solving unique technical challenges at scale, and helping to create safer, more civil shared experiences for everyone.

What You’ll Do:

The Infra Compute SRE mission is to own and manage the successful operation of our underlying cell infrastructure system, along with elements of service discovery, secrets management and related software layers. We’re looking for skilled Site Reliability Engineers with strong programming skills to help us build Roblox's private cloud, productionize our growing Kubernetes-based infrastructure, and institute reliability best practices across the Roblox Compute team.

You Will:

  • Design and Develop systems & libraries that promote fault-tolerance and resilience, automate much of the management and lifecycle of our clusters, and ensure systems are observable.
  • Promote and Institute reliability best practices across the Infra Compute group, drive common reliability initiatives, and provide collaborative technical reviews and operational guidance to strengthen system reliability.
  • Build, Automate and Standardize processes to create a "golden path" of tooling and platform support that powers the fundamental Roblox ecosystem.
  • Create Tooling that provides production guardrails by load testing release candidates to evaluate their capacity before they are deployed to production.
  • Create Performance Monitoring Services and observability to help understand capacity issues and platform degradations, and to monitor production services and their changes, for example through generalized canarying services with alerting.
  • Analyze systems and system designs for production readiness.

You Have:

  • A Bachelor's degree (or equivalent professional experience) in Computer Science or related engineering field with a proven track record including at least 6 years as an SRE or Software Engineer.
  • Fluency with high-level programming languages like Go, Java, and C#.
  • Experience with Kubernetes, or similar orchestration systems. Experience in Nomad, Vault, and Consul is strongly desired.
  • Experience and good habits around building software and tools and getting them adopted. Your systems focus informs your view that code needs to be deeply reliable.

You Are:

  • A Partner: You know that the best tools integrate broadly with the tooling ecosystem. You approach partners and processes with curiosity and seek to understand a problem deeply before you start coding.
  • A Developer: You love building durable and reliable complex systems.
  • Passionate about problem-solving, finding creative work solutions, and addressing unexpected challenges as part of a team.
  • A Problem Solver: You ask the right questions to tackle issues within your expertise and you use data to test your theories.
  • A Planner: You have experience in large project lifecycles. You have experience working in sprints, breaking down complex tasks into achievements, and reporting status to keep project scheduling accurate.

For roles that are based at our headquarters in San Mateo, CA: The starting base pay for this position is as shown below. The actual base pay is dependent upon a variety of job-related factors such as professional background, training, work experience, location, business needs and market demand. Therefore, in some circumstances, the actual salary could fall outside of this expected range. This pay range is subject to change and may be modified in the future. All full-time employees are also eligible for equity compensation and for benefits as described on this page.

Annual Salary Range
$192,890 - $238,520 USD

Roles that are based in our San Mateo, CA Headquarters are in-office Tuesday, Wednesday, and Thursday, with optional in-office on Monday and Friday (unless otherwise noted).

Roblox provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws. Roblox also provides reasonable accommodations for all candidates during the interview process.
