Staff Software Engineer, Storage

Reddit

Job Summary

Reddit is a community-driven platform with over 100,000 active communities and 116 million daily active users. This Staff Software Engineer, Storage role focuses on building and evolving control and data planes, improving underlying systems, and automating large-scale storage infrastructure. The position involves deep dives into storage system codebases, implementing complex modifications, and collaborating with product teams to optimize data models and access patterns for Reddit's main workloads, ensuring scalability and efficiency.

Must Have

  • Design, write, and deliver software to improve availability, scalability, latency, and efficiency in Go, C++, and Python.
  • Dive deep into the codebase of supported storage systems to understand system internals.
  • Make system-level improvements and enhancements, and implement complex code modifications.
  • Engage actively with the open-source community to implement and upstream changes.
  • Contribute to design and implementation of high-performance, large-scale distributed storage systems.
  • Collaborate closely with engineering teams and stakeholders.
  • 7+ years experience building internet-scale software, preferably with machine learning storage infrastructure.
  • Software development experience in one or more of Golang, Python, C++, or Java.
  • Hands-on experience implementing features, optimizations, and bug fixes to distributed storage systems.
  • Experience contributing code improvements, features, and bug fixes to open-source projects.
  • Excellent communication skills to collaborate with a service-oriented team and company.

Good to Have

  • Prior experience operating a large-scale, critical infrastructure system with a focus on automation and workflow development.
  • Experience being on call.

Perks & Benefits

  • Comprehensive Healthcare Benefits and Income Replacement Programs
  • 401k Match
  • Family Planning Support
  • Gender-Affirming Care
  • Mental Health & Coaching Benefits
  • Flexible Vacation & Reddit Global Days off
  • Generous paid Parental Leave
  • Paid Volunteer time off
  • Equity in the form of restricted stock units

Job Description

The position is a blend of software engineering and systems engineering, with a strong focus on building and evolving control and data planes, improving underlying systems, and writing software that implements critical workflows to automate and enhance the operation of our large-scale storage infrastructure.

We also work with our product teams to make the underlying storage technologies work better for Reddit’s main workloads, including defining and improving the application’s data models and data access patterns. This often involves data-driven analysis and tuning of the storage stack to make step-function improvements to the end-to-end data path. There is also a substantial amount of troubleshooting and analysis to understand the real problems we face as we scale and as our customers’ workloads change.

A successful candidate will:

  • Design, write, and deliver software in Go, C++, and sometimes Python to improve the availability, scalability, latency, and efficiency of Reddit’s products.
  • Dive deep into the codebase of supported storage systems to understand system internals.
  • Make system-level improvements and enhancements, and implement complex code modifications.
  • Engage actively with the open-source community to implement and upstream changes to the OSS codebase.
  • Contribute to the design and implementation of high-performance, large-scale distributed storage systems to power various use cases at Reddit.
  • Collaborate closely with engineering teams and stakeholders to integrate storage capabilities into broader storage infrastructure and use cases across Reddit.
  • Mentor and guide other engineers on how to design, build, and evangelize vector storage services across Reddit.

Who You Might Be:

  • 7+ years of experience building internet-scale software, preferably with a focus on machine learning storage infrastructure.
  • Software development experience in one or more general-purpose programming languages: Golang, Python, C++, Java.
  • Hands-on experience implementing features, optimizations, and bug fixes to distributed storage systems.
  • Experience contributing code improvements, features and bug fixes to open-source (OSS) projects.
  • Prior experience operating a large-scale, critical infrastructure system with a focus on automation and workflow development is a plus, especially in a role where you were required to be on call.
  • Excellent communication skills to collaborate with a service-oriented team and company.
