Senior Site Reliability Engineer

1 Month ago • 5 Years + • Devops

Job Summary

Job Description

Reddit is seeking a Senior Site Reliability Engineer to join their Infrastructure SRE team. The role involves improving the reliability and performance of Reddit's engineering platforms and services by leveraging knowledge of distributed systems and architecture. Responsibilities include advising engineering teams on system design, amplifying capabilities of infrastructure and platform services, automating repetitive tasks, diagnosing and fixing system issues, and optimizing performance and cost. The engineer will also own risk management, ensuring system resilience and implementing best practices. This position offers an opportunity to impact one of the internet's largest sources of information.
Must have:
  • 5+ years of experience in SRE or DevOps
  • Proficiency in Go or Python
  • Experience with Kubernetes and Cloud systems
  • Knowledge of distributed systems
  • Experience debugging and optimizing code
  • Troubleshooting skills (applications, networking, systems)
  • Strong Linux and container knowledge
  • Excellent communication and collaboration skills
Good to have:
  • Familiarity with Prometheus, Thanos, Grafana, Vector, Clickhouse, Otel, Loki
  • Experience with high-traffic backend systems
Perks:
  • Pension Savings plan
  • Medical Plan
  • Short term sickness benefits
  • WIA excess and WGA gap insurance
  • Workspace benefits for your home office
  • Personal & Professional development funds
  • Family Planning Support
  • Flexible Vacation & Reddit Global Days Off

Job Details

Reddit is a community of communities. It’s built on shared interests, passion, and trust and is home to the most open and authentic conversations on the internet. Every day, Reddit users submit, vote, and comment on the topics they care most about. With 100,000+ active communities and approximately 101M+ daily active unique visitors, Reddit is one of the internet’s largest sources of information. For more information, visit redditinc.com.

Reddit SRE is rapidly innovating and our teams are working to meet the needs of infrastructure and development teams as they evolve our product faster than ever before. This is a unique opportunity to leave your mark on one of the most influential and trafficked corners of the internet.

As a Senior Site Reliability Engineer on Reddit’s Infrastructure SRE team, you’ll use your knowledge of distributed systems and architecture to improve the reliability and performance of Reddit’s engineering platforms and services. We are looking for someone who thrives at the intersection of infrastructure and software development. The team works very closely with the Compute, Traffic, and Observability infrastructure teams. It owns a suite of tools, based primarily on open-source solutions at scale, that help engineers understand their creations. We’re active users of and contributors to Prometheus, Thanos, Grafana, Vector, and more.

In this role, you will also take ownership of risk management, ensuring the reliability and performance of our systems. You will collaborate with cross-functional teams to identify, assess, and mitigate risks, implementing best practices to enhance system resilience. Your expertise will drive proactive measures to maintain uptime and optimize service delivery, making a significant impact on our operational excellence.

Join us and help build the future of Reddit!

Responsibilities:

  • Advise
    • Work closely with engineering teams in designing and developing systems that are resilient and highly performant at a tremendous scale, and maintaining the foundational platform for running Reddit’s infrastructure.
  • Amplify
    • Identify and build capabilities into our foundational Infrastructure and Platform services, which are used by Reddit engineering teams to build, deploy, and operate Reddit. 
    • Deliver software to improve the availability, scalability, latency, and efficiency of observability components.
    • Identify and engineer away risk across Reddit’s systems.
  • Automate
    • Take repetitive, manual, or risky tasks and automate them out of existence. Build tools and integrate systems to support Reddit’s evolution.
    • Automate critical aspects of the event-driven development process.
  • Diagnose
    • Draw on your knowledge of distributed systems to identify and fix network, system, and service-level issues. Practice sustainable incident response, and drive structural improvement with blameless postmortems.
    • Share on-call responsibilities. 
  • Optimize
    • Observe and improve performance, reduce cost, and improve the experience for millions of users.
    • Contribute upstream changes to the open source projects we use.

Qualifications

  • 5+ years of experience in Software Engineering, Site Reliability Engineering, or a development-focused DevOps role.
  • Proficiency in one or more programming languages. We’re predominantly writing code in Go and Python.
  • Experience with Kubernetes and Cloud systems.
  • Familiarity with distributed systems development. Familiarity with any of the following tools is a bonus: Prometheus, Thanos, Grafana, Vector, Clickhouse, Otel, Loki.
  • Experience with the development and operation of high-traffic backend systems.
  • A demonstrated ability to debug, fix, and optimize code.
  • Troubleshooting skills that span applications, networking (TCP/IP), and systems.
  • Strong working knowledge of Linux and containers.
  • Excellent communication and collaborative skills.

Benefits:

  • Pension Savings plan 
  • Medical Plan
  • Short term sickness benefits 
  • WIA excess and WGA gap insurance 
  • Workspace benefits for your home office 
  • Personal & Professional development funds
  • Family Planning Support 
  • Flexible Vacation & Reddit Global Days Off

Reddit is proud to be an equal opportunity employer, and is committed to building a workforce representative of the diverse communities we serve.  Reddit is committed to providing reasonable accommodations for qualified individuals with disabilities and disabled veterans in our job application procedures. If, due to a disability, you need an accommodation during the interview process, please let your recruiter know.
