Senior Site Reliability Engineer

Octopus

Remote | Full Time | 1 month ago

Job Summary

Octopus Deploy is seeking a Senior Site Reliability Engineer for their Builds team to enhance the reliability and availability of their build systems. The role involves sharing SRE expertise, improving existing reliability practices, leading new capability implementations, and reducing toil through automation. Candidates should be strong systems engineers, comfortable with postmortems, and committed to a 'you build it, you run it' culture within a high-trust, remote-first environment.

Must Have

Share SRE expertise with teams across the company.
Keep build systems running with high reliability and availability.
Improve and iterate on existing reliability practices.
Bring fresh ideas and practices to increase reliability and reduce toil.
Lead the implementation of new capabilities.
Collaborate effectively across wide organisational distances to solve problems.
Thrive in an environment focused on availability, reliability, and observability.
Be a strong systems engineer.
Comfortable leading postmortems and designing deployment and monitoring pipelines.
Care deeply about automation across builds, tests, deployments, infrastructure, and operational tasks.
Embrace a “you build it, you run it” culture.
Self-motivated, work independently with high-quality output.
Results-oriented, adapt quickly when business direction changes.
Welcome candid feedback, enjoy solving complex problems, and like helping other engineers succeed.

Good to Have

Experience with the C# application SDLC (e.g. building, testing).
Familiarity with TeamCity as a primary build system.
Experience with GitHub Actions for internal tools.
Knowledge of Octopus Deploy for continuous delivery.
Experience with Azure (App Services, AKS clusters, Azure Functions).
Familiarity with Docker Hub and Artifactory as container registries.
Experience with Terraform as a primary IaC tool.
Knowledge of OpenTelemetry processing systems.
Experience with Sumo Logic and Honeycomb for analysis and troubleshooting.

Perks & Benefits

Minimum of 25 days annual leave
Up to 10 days of paid sick and carer's leave
12 weeks of fully paid parental leave with flexible return options
Stock options

Job Description

The Builds team at Octopus Deploy is looking for a Senior Site Reliability Engineer (SRE) to:

Share SRE expertise with teams across the company.
Keep our build systems running with high reliability and availability.
Improve and iterate on our existing reliability practices.
Bring fresh ideas and practices to increase reliability and reduce toil.
Lead the implementation of new capabilities.

You’ll be a great fit if you:

Naturally work in line with our Senior SRE expectations.
Collaborate effectively, even across wide organisational distances, to solve problems, combining passion, pragmatism, and empathy.
Thrive in an environment focused on availability, reliability, and observability.
Are a strong systems engineer and may have deeper expertise in particular domains.
See value in applying safety culture lessons from other industries to software and operations.
Are comfortable leading postmortems and designing deployment and monitoring pipelines.
Care deeply about automation across builds, tests, deployments, infrastructure, and operational tasks.
Embrace a “you build it, you run it” culture, with a strong commitment to quality and system availability, and are happy to participate in a humane on-call program.
Are self-motivated, work independently with high-quality output, and proactively seek help or new work when needed.
Are results-oriented, adapt quickly when business direction changes, and encourage the same in others.
Welcome candid feedback, enjoy solving complex problems, and like helping other engineers succeed while working on genuinely valuable projects.

Our tech stack

You don’t need to know all of this – it’s here to give you a feel for our environment.

Octopus Server

Our primary focus and flagship product.
Written in .NET and backed by a SQL database.
Experience with the C# application SDLC (e.g. building, testing) is highly regarded.

CI/CD

TeamCity is our primary build system for Octopus Server.
GitHub Actions is used for some internal tools.
Continuous delivery is powered by Octopus Deploy.

Workloads

A mix of internally developed applications and third-party software (e.g. TeamCity).
Run in Azure using App Services, AKS clusters, and Azure Functions.
Container workloads run on AKS, with Docker Hub and Artifactory as container registries.

Infrastructure as Code (IaC)

Terraform is our primary IaC tool.
IaC workloads run mostly in Octopus Deploy, with some running via GitHub Actions.

Observability

Our team operates a multi-region OpenTelemetry processing system for the rest of R&D.
We’ve adopted OpenTelemetry across many of our Builds systems.
We help other teams adopt OpenTelemetry for more use cases company-wide.
We use Sumo Logic and Honeycomb for analysis and troubleshooting.

A typical day might include:

Building new capabilities to increase reliability (we don’t want you staring at dashboards all day).
Working where you do your best work – from your home office, with your preferred setup, tools, and soundtrack.
Consulting with another team on how to operate their services at the right level of reliability, or how best to use our build and observability platforms.
Pairing with another engineer over Zoom to solve a complex technical problem or explore the problem space for future improvements.
Responding to an actionable alert and working to maintain the reliability of the platform used across the company.
Improving documentation so engineers can discover solutions themselves and reduce lead time.
Writing a blog post or preparing a talk to share something interesting you’ve learned with other engineers.
Facilitating an incident review and turning the learnings into practical changes.
Proactively reducing toil by building thoughtful automation.

Compensation:

Octopus has an internally open and transparent system for compensation.

Any Octonaut can view the compensation for any role at any level. This ensures people doing the same work with the same skill get paid the same.

The compensation for this role is:

Level 3 - Senior Site Reliability Engineer

Maturing: $145k AUD / $155k NZD, Performing: $165k AUD / $175k NZD

Salaries exclude Super and KiwiSaver.

Benefits include a minimum of 25 days annual leave, up to 10 days of paid sick and carer's leave, 12 weeks of fully paid parental leave with flexible return options, and stock options.

Below is the interview process you can expect for this role. We know interviewing can seem daunting, but rest assured we designed our interview process to move quickly while still getting you all the information you need.

👋🏼 Initial Chat

[30 min] Meet with a Talent Acquisition team member, and get a feel for what it would be like to be an Octonaut!

💻 Engineering Problem Presentation

[60 min] You'll be given instructions to prepare a presentation which you'll present to two members of the team (15-20 minutes) before being asked some questions.

🧑‍💻 Hiring Manager Chat

[30 min] A final call to answer any last questions of yours and ours.

We are looking for people who live and work in Australia and New Zealand to join our remote-first team. We are unable to provide visa sponsorship.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.
