Senior Site Reliability Engineer

1 Month ago • 5+ Years • DevOps • $190,800 PA - $267,100 PA

Job Summary

Job Description

Reddit is seeking a Senior Site Reliability Engineer to join their Infrastructure SRE team. This role involves improving the reliability and performance of Reddit's engineering platforms and services by working with distributed systems and architecture. The SRE will be responsible for owning a suite of tools for engineers to understand their creations, primarily using open-source solutions like Prometheus, Thanos, Grafana, and Vector. Key responsibilities include advising on system design, amplifying capabilities of infrastructure services, automating manual tasks, diagnosing and fixing system issues, and optimizing performance and cost. The role requires experience in software engineering, site reliability engineering, or DevOps, proficiency in programming languages (Go, Python), Kubernetes, cloud systems, and distributed systems development.
Must have:
  • 5+ years of experience in SRE or DevOps
  • Proficiency in Go or Python
  • Experience with Kubernetes and Cloud
  • Experience with high-traffic backend systems
  • Linux and container knowledge
  • Troubleshooting skills (applications, networking, systems)
Good to have:
  • Familiarity with Prometheus, Thanos, Grafana, Vector, ClickHouse, OpenTelemetry (OTel), Loki
  • Experience with distributed systems development
Perks:
  • Comprehensive Healthcare Benefits
  • 401k Matching
  • Workspace benefits for home office
  • Personal & Professional development funds
  • Family Planning Support
  • Flexible Vacation
  • Reddit Global Wellness Days
  • 4+ months paid Parental Leave
  • Paid Volunteer time off

Job Details

Reddit is a community of communities. It’s built on shared interests, passion, and trust and is home to the most open and authentic conversations on the internet. Every day, Reddit users submit, vote, and comment on the topics they care most about. With 100,000+ active communities and 101M+ daily active unique visitors, Reddit is one of the internet’s largest sources of information. For more information, visit redditinc.com.

Reddit SRE is innovating rapidly, and our teams are working to meet the needs of infrastructure and development teams as they evolve our product faster than ever before. This is a unique opportunity to leave your mark on one of the most influential and trafficked corners of the internet.

As a Senior Site Reliability Engineer on Reddit’s Infrastructure SRE team, you’ll use your knowledge of distributed systems and architecture to improve the reliability and performance of Reddit’s engineering platforms and services. We are looking for someone who thrives at the intersection of infrastructure and software development. This team will work very closely with the Compute, Traffic, and Observability infrastructure teams. It will own a suite of tools, built primarily on open-source solutions running at scale, that allow engineers to understand their creations. We’re active users of and contributors to Prometheus, Thanos, Grafana, Vector, and more.

In this role, you will also take ownership of risk management, ensuring the reliability and performance of our systems. You will collaborate with cross-functional teams to identify, assess, and mitigate risks, implementing best practices to enhance system resilience. Your expertise will drive proactive measures to maintain uptime and optimize service delivery, making a significant impact on our operational excellence.

Join us and help build the future of Reddit!

Responsibilities:

  • Advise
    • Work closely with engineering teams to design and develop systems that are resilient and highly performant at tremendous scale, and maintain the foundational platform for running Reddit’s infrastructure.
  • Amplify
    • Identify and build capabilities into our foundational Infrastructure and Platform services, which are used by Reddit engineering teams to build, deploy, and operate Reddit. 
    • Deliver software to improve the availability, scalability, latency, and efficiency of observability components.
    • Identify and engineer away risk across Reddit’s systems.
  • Automate
    • Take repetitive, manual, or risky tasks and automate them out of existence. Build tools and integrate systems to support Reddit’s evolution.
    • Automate critical aspects of the event-driven development process.
  • Diagnose
    • Draw on your knowledge of distributed systems to identify and fix network, system, and service-level issues. Practice sustainable incident response and drive structural improvements through blameless postmortems.
    • Share on-call responsibilities.
  • Optimize
    • Observe and improve performance, reduce cost, and improve the experience for millions of users.
    • Contribute upstream changes to the open-source projects we use.

Qualifications

  • 5+ years of experience in Software Engineering, Site Reliability Engineering, or a development-focused DevOps role.
  • Proficiency in one or more programming languages. We’re predominantly writing code in Go and Python.
  • Experience with Kubernetes and Cloud systems.
  • Familiarity with distributed systems development; familiarity with any of the specific tools (Prometheus, Thanos, Grafana, Vector, ClickHouse, OpenTelemetry, Loki) is a bonus.
  • Experience with the development and operation of high-traffic backend systems.
  • A demonstrated ability to debug, fix, and optimize code.
  • Troubleshooting skills that span applications, networking (TCP/IP), and systems.
  • Strong working knowledge of Linux and containers.
  • Excellent communication and collaborative skills.

Benefits:

  • Comprehensive Healthcare Benefits
  • 401k Matching
  • Workspace benefits for your home office
  • Personal & Professional development funds
  • Family Planning Support
  • Flexible Vacation (please use them!) & Reddit Global Wellness Days
  • 4+ months paid Parental Leave
  • Paid Volunteer time off

Pay Transparency:

This job posting may span more than one career level.

In addition to base salary, this job is eligible to receive equity in the form of restricted stock units, and depending on the position offered, it may also be eligible to receive a commission. Additionally, Reddit offers a wide range of benefits to U.S.-based employees, including medical, dental, and vision insurance, 401(k) program with employer match, generous time off for vacation, and parental leave. To learn more, please visit https://www.redditinc.com/careers/.

To provide greater transparency to candidates, we share base pay ranges for all US-based job postings regardless of state. We set standard base pay ranges for all roles based on function, level, and country location, benchmarked against similar-stage growth companies. Final offer amounts are determined by multiple factors, including skills, depth of work experience, and relevant licenses/credentials, and may vary from the amounts listed below.

The base pay range for this position is:

$190,800 - $267,100 USD

Reddit is proud to be an equal opportunity employer and is committed to building a workforce representative of the diverse communities we serve. Reddit is committed to providing reasonable accommodations for qualified individuals with disabilities and disabled veterans in our job application procedures. If, due to a disability, you need an accommodation during the interview process, please let your recruiter know.

Similar Jobs

bytedance - AI Developer Community Operations Expert

bytedance

(On-Site)
4 Months ago
Single Store - Director of Customer Marketing and Analyst Relations

Single Store

Raleigh, North Carolina, United States (Remote)
1 Month ago
PayPal - Director, Large Enterprise Commercial

PayPal

San Jose, California, United States (Hybrid)
2 Months ago
Accenture - Software Development Engineer

Accenture

Chennai, Tamil Nadu, India (On-Site)
3 Months ago
Eneba Games - Technical Content Operations Specialist

Eneba Games

Lithuania (Remote)
1 Month ago
Canva - Staff Frontend Engineer - Apps API Platform

Canva

Auckland, Auckland, New Zealand (Remote)
4 Months ago
bytedance - Site Reliability Engineer, Edge Services

bytedance

Seattle, Washington, United States (On-Site)
4 Months ago
Sailpoint - Manager, DevOps (AWS Infrastructure)

Sailpoint

Austin, Texas, United States (On-Site)
2 Months ago
Brillio - Enterprise Architect, AWS - R01535258

Brillio

Bengaluru, Karnataka, India (Hybrid)
9 Months ago
Trellix - Principal Engineer – Developer Enablement & CI/CD Strategy

Trellix

Bengaluru, Karnataka, India (On-Site)
2 Months ago

Similar Skill Jobs

Autodesk - Senior Machine Learning Engineer

Autodesk

Bengaluru, Karnataka, India (On-Site)
1 Month ago
USE Insider - Senior Front-end Developer

USE Insider

Istanbul, İstanbul, Türkiye (Remote)
1 Month ago
Epic Games - Product Director

Epic Games

(On-Site)
7 Months ago
Ziff Davis - Marketing Events Director

Ziff Davis

United States (Remote)
2 Months ago
Nintendo - Senior Manager, Experiential Marketing

Nintendo

Redmond, Washington, United States (Hybrid)
1 Year ago
cirrus logic - Product Marketing Manager

cirrus logic

Austin, Texas, United States (Hybrid)
3 Months ago
Razer - Senior Category Specialist

Razer

Singapore (On-Site)
2 Months ago
Qualcomm - Senior ASIC Platform Design Engineer

Qualcomm

Colombes, Île-de-France, France (On-Site)
2 Months ago
moonee games - Product Manager - Live Games

moonee games

Tel Aviv-Yafo, Tel Aviv District, Israel (On-Site)
2 Months ago
Activision - Producer- Live Ops, Call of Duty

Activision

Santa Monica, California, United States (On-Site)
2 Months ago

Jobs in New York, New York, United States

Apple - Enterprise Regional Manager (U.S. Major Accounts)

Apple

New York, New York, United States (On-Site)
2 Months ago
Dynamis Inc - Assistant Test/Safety Observer

Dynamis Inc

Huntsville, Alabama, United States (On-Site)
1 Month ago
Bot VFX - Associate Vice President - Finance Operations

Bot VFX

Atlanta, Georgia, United States (On-Site)
2 Months ago
MiQ - Account Manager

MiQ

Denver, Colorado, United States (Hybrid)
2 Months ago
nissan - Warehouse Operator -Lebanon

nissan

Lebanon, Tennessee, United States (On-Site)
10 Months ago
Next Level Business Services - Hadoop AWS Developer

Next Level Business Services

Beaverton, Oregon, United States (On-Site)
9 Months ago
Apple - Software Engineer

Apple

Cupertino, California, United States (On-Site)
1 Month ago
Apple - HomeKit Software Engineer

Apple

Cupertino, California, United States (On-Site)
2 Months ago
Gigamon - Regional Sales Director, Federal Systems Integrators

Gigamon

Vienna, Virginia, United States (On-Site)
2 Months ago
Fox Factory - Sr Director, Procurement

Fox Factory

Gainesville, Georgia, United States (On-Site)
1 Month ago

Devops Jobs

HCL Tech - Technical Lead DevOps, Python, Kubernetes

HCL Tech

California, United States (On-Site)
1 Month ago
GoTo Group - Principal SRE Engineer (SE5)

GoTo Group

Gurugram, Haryana, India (On-Site)
9 Months ago
appier - Technical Solution Engineer

appier

Beijing, China (On-Site)
2 Months ago
PhonePe - Site Reliability Engineer - CDN

PhonePe

Bengaluru, Karnataka, India (On-Site)
8 Months ago
Google - Software Engineer III, Infrastructure, Google Cloud AI

Google

Kirkland, Washington, United States (On-Site)
9 Months ago
Blinkhealth - Senior Cloud Engineer

Blinkhealth

(Remote)
3 Months ago
bytedance - Software Engineer-Infrastructure Delivery Platform

bytedance

San Jose, California, United States (On-Site)
2 Months ago
Qualcomm - Senior Devops Engineer

Qualcomm

Hyderabad, Telangana, India (On-Site)
2 Months ago
HCL Tech - ET - Solution Architect

HCL Tech

California, United States (On-Site)
2 Months ago
bytedance - Senior Security Software Architect - Security Engineering - San Jose

bytedance

San Jose, California, United States (On-Site)
7 Months ago