Senior Site Reliability Engineer

1 Month ago • All levels • DevOps • $150,000 PA - $195,000 PA

Job Summary

Job Description

AuthZed is seeking a Site Reliability Engineer to ensure the reliability, availability, and performance of its authorization solutions. The role involves designing, implementing, and maintaining scalable infrastructure, monitoring system performance, and automating deployment processes. Responsibilities include troubleshooting complex issues, collaborating with engineering teams, participating in on-call rotations, and documenting systems. This is a fully remote position at a growing tech startup focused on open-source authorization solutions for businesses.
Must have:
  • Site Reliability Engineer experience
  • Strong understanding of networking
  • Strong understanding of operating systems
  • Strong understanding of cloud infrastructure
  • Experience with Site Reliability Engineering
  • Experience with System Design
  • Experience with Distributed Computing
  • Experience with Docker
  • Experience with Kubernetes
  • Experience with infrastructure-as-code tools
  • Experience with monitoring and logging tools
  • Experience with Git and GitHub
  • Strong problem-solving skills
  • Excellent communication skills
Good to have:
  • Experience with Terraform
  • Experience with Pulumi
  • Experience with Prometheus
  • Experience with Grafana
  • Experience with ELK stack
  • Experience with distributed SQL databases
  • Experience with continuous integration and deployment

Job Details

About AuthZed

We’re pioneering open-source authorization solutions for scaling businesses tackling complex end-user permissions in zero-trust architectures. Our focus is on providing SpiceDB—the most mature open-source permissions database inspired by Google’s Zanzibar system—and building managed services that enable planet-scale production authorization services.

Our strategic approach to capital-raising has empowered us to efficiently utilize our $3.9M seed fund and recently secure a $12M Series A. This funding has allowed us to further develop SpiceDB, now the open-source standard in authorization database technology, fortify our reputation as authorization experts, accelerate our open-source community growth, and scale revenue with robust enterprise products.

AuthZed is a fully remote company with employees across the US and Europe. We’re a hardworking group with a software-driven culture; even our sales team understands and loves our technology! We bring integrity to all our interactions, fostering confidence in decision-making by trusting and respecting each voice on our team, every day.

Company Values

  • Agency
    • Everyone should have the capability, freedom, and confidence to bring about changes to our business and product. Organizational processes exist to clearly define our goals, but not restrict how progress is made.
  • Collaboration
    • Success is defined in various dimensions and no single person can be an expert in all of them. Without valuing the opinions of others, finding compromises, and sharing mutual trust and respect, you cannot arrive at the best possible solution.
  • Open-mindedness
    • Without asking questions, testing assumptions, and questioning our pre-existing biases we risk operating within an echo-chamber. We celebrate the representation of diverse perspectives and backgrounds as a catalyst for creating an inclusive work environment that everyone can appreciate.
About the role
Skills: Git, Kubernetes, SQL, Distributed Systems

Job Summary:

We are seeking a Site Reliability Engineer to join our tech startup in the infrastructure and authorization space. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, availability, and performance of our systems. You will be responsible for designing, implementing, and maintaining scalable infrastructure solutions to support our growing customer base. This is an exciting opportunity to work in a fast-paced environment and contribute to the success of a company bringing a Google-inspired authorization system to companies around the globe.

Responsibilities:

  • Design, implement, and maintain highly available and scalable infrastructure solutions for our projects, products, and customers.
  • Monitor and analyze system performance, identifying and resolving bottlenecks and issues to ensure optimal performance and reliability.
  • Automate infrastructure deployment and configuration management processes.
  • Continuously improve system reliability, security, and efficiency through proactive monitoring, capacity planning, and performance tuning.
  • Troubleshoot and resolve complex infrastructure and application issues in production and test environments.
  • Collaborate with software engineering teams to design and implement systems that are resilient, scalable, and secure.
  • Participate in on-call rotation and respond to production incidents in a timely manner.
  • Document system configurations, troubleshooting procedures, and operational guidelines.

Requirements:

  • Proven experience as a Site Reliability Engineer or in a similar role.
  • Strong understanding of networking, operating systems, and cloud infrastructure.
  • Experience with Site Reliability Engineering, System Design, and Distributed Computing.
  • Experience in various programming languages — we currently have SDKs for NodeJS, Java, Python, Ruby, and Go.
  • Experience with containerization technologies such as Docker and Kubernetes.
  • Knowledge of infrastructure-as-code tools like Terraform and Pulumi.
  • Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
  • Experience with lower-level implementation details of relational databases (bonus if you have experience with distributed SQL databases like Google Cloud Spanner or CockroachDB).
  • Experience working with Git and GitHub.
  • Experience with continuous integration and deployment systems.
  • Strong problem-solving and troubleshooting skills.
  • Excellent communication and collaboration abilities.
Technology

Given our background, we build upon a foundation of using open source, cloud-native solutions to deliver our products.

We've given some webinars discussing parts of our stack.

Here are some keywords:

  • Go
  • TypeScript
  • Kubernetes
  • Kubernetes Operators
  • NextJS
  • Pulumi
  • CockroachDB
  • Cloud Spanner
  • PostgreSQL
  • Prometheus
  • Thanos
  • ArgoCD
