Senior Site Reliability Engineer

2 Months ago • All levels • $115,000 PA - $145,000 PA

DevOps

Job Description

Octopus Deploy is seeking a Senior Site Reliability Engineer (SRE) to maintain high system reliability. The role involves improving existing reliability practices, introducing new ideas to reduce toil, spearheading new capabilities, and sharing SRE expertise. The company is remote-first, founded in Australia, and focuses on transparency, continuous delivery, and a balanced work environment with consistent growth. The SRE will work with .NET, SQL databases, Azure, Kubernetes, Terraform, and OpenTelemetry. Responsibilities include building new reliability features, assisting internal teams with technical challenges, responding to alerts, improving documentation, and reducing toil through automation. The role emphasizes a "you built it, you run it" culture with a humane on-call program.

Good To Have:

Specialized expertise in specific areas
Apply safety culture lessons
Design deployment and monitoring pipelines
Working knowledge of .NET and SQL
Experience with Azure, AKS, AppServices
Familiarity with Terraform
Experience with OpenTelemetry, SumoLogic, Honeycomb
Experience with CI/CD tools (TeamCity, GitHub Actions, Octopus Deploy)

Must Have:

Maintain high system reliability
Improve reliability practices
Reduce toil through new ideas
Spearhead new capabilities
Share SRE expertise
Excel in availability, reliability, observability
Skilled in systems engineering
Lead postmortems
Automate builds, tests, deployments
Embrace "you built it, you run it"
Self-motivated, work independently
Collaborate effectively
Results-oriented, adaptive
Thrive on feedback

Perks:

Minimum 25 days annual leave
Up to 10 days paid sick and carer's leave
12 weeks fully paid parental leave
Stock options
Remote-first work environment
Transparent compensation system
Public handbook for transparency
Supportive, collaborative, high-trust environment

Octopus Deploy sets the standard for Continuous Delivery, empowering software teams to deliver value in an agile way. Over 4,000 organizations globally – including Ubisoft, ASOS, Xero, Stack Overflow, NASA, and Disney – rely on our Continuous Delivery, GitOps, and release orchestration solutions.

Founded in Australia in 2012, our team of over 300 Octonauts now spans the globe. We combine high growth and big ambitions with a sustainable, balanced working environment. Our revenue has grown consistently between 30–50% every year for the past 8 years, and we’ve been profitable for 10 out of the past 11 years.

We’ve been remote-first since 2015 and work with an uncommon level of transparency. You can read our public handbook to learn how we work. We have a transparent approach to compensation that ensures people doing the same work with the same skill get paid the same, with well-defined career pathways. We foster a supportive, collaborative, and high-trust environment. We leave our job titles at the door and focus on doing what’s best for our customers and team. Our leaders never shy away from answering the tough questions at our all-hands calls or in 1:1s. We conduct interviews and onboarding virtually as part of being a remote-first company.

This remote-first role requires full working rights and residency in Australia or New Zealand.

Octopus Deploy is looking for a Senior Site Reliability Engineer (SRE) who can:

Use their SRE skills to keep systems running with high reliability.
Help improve and iterate our existing reliability practices.
Bring new ideas/practices to increase reliability and reduce toil.
Spearhead implementation of new capabilities.
Share SRE expertise with other teams in the company.

You will be a great fit for this role if:

The way of working outlined here (https://github.com/OctopusDeploy/People/tree/main/Engineering/Site-Reliability-Engineering) is your natural way of getting things done.
You excel in an environment focused on availability, reliability, and observability.
You are skilled in systems engineering and may have specialized expertise in specific areas.
You find value in applying safety culture lessons from other industries to your work.
You are adept at leading postmortems and designing deployment and monitoring pipelines.
You have a passion for automating builds, tests, deployments, infrastructure, and operational tasks.
You embrace a "you built it, you run it" culture, with a commitment to quality and system availability, participating in a humane on-call program.
You are self-motivated, work independently with high-quality output, and seek help or new tasks when needed.
You collaborate effectively to solve problems, combining passion, pragmatism, and empathy.
You are results-oriented, adaptive to business direction changes, and encourage the same approach in others.
You thrive on candid feedback, solving complex problems, and helping fellow engineers succeed while working on valuable projects.

Our Tech Stack:

Please note: this is to give you an idea of our tools; we don't expect expertise in everything.

Octopus Server:

Our primary focus and flagship product.
Written in .NET and uses a SQL database.

CI/CD:

TeamCity is our build system for Octopus Server.
GitHub Actions is used for some internal tools.
Octopus Deploy handles continuous deployment (CD).

Workloads:

A mix of internally developed applications and third-party software (e.g. TeamCity).
Run in Azure with a mix of AppServices, AKS clusters, and Azure Functions.
We mostly use Linux containers, with a few Windows containers.
Container workloads run on AKS.
Docker Hub and Artifactory container registries.

Infrastructure as Code (IaC):

We use Terraform as our primary IaC tool.
IaC workloads run in Octopus Deploy, with a few running as GitHub Actions.

Observability:

We have adopted OpenTelemetry for many of our build systems.
We are adopting OpenTelemetry for more use cases company-wide, delivering a full telemetry pipeline.
SumoLogic and Honeycomb for analysis.

A typical day might include:

Working on building new capabilities to increase reliability (we don’t want you staring at monitoring dashboards all day).
Working where you work best, in a home office designed by you, using a device of your choosing, with or without music, in an atmosphere you create for yourself.
Handling a request from an internal team, helping solve a challenging build, test or packaging issue, or offering advice to an engineer to help them fall into the pit of success.
Pairing with another engineer on a Zoom call to solve a complex technical problem or explore and define the problem space for future innovation.
Responding to an actionable alert and working to maintain the reliability of the platform used across the company.
Improving our documentation to help engineers discover solutions for themselves and reduce lead time.
Writing a blog post about something interesting for other engineers or preparing a presentation on what was learned from a recent incident.
Facilitating an incident review.
Proactively reducing future toil by building automation.

Compensation:

Octopus has an internally open and transparent system for compensation. Any Octonaut can view the compensation for any role at any level. This ensures people doing the same work with the same skill get paid the same.

The compensation for this role is:

Level 2 - Site Reliability Engineer

Maturing: $115k AUD / $125k NZD, Performing: $135k AUD / $145k NZD

Salaries exclude superannuation and KiwiSaver.

Benefits include a minimum of 25 days annual leave, up to 10 days of paid sick and carer's leave, 12 weeks of fully paid parental leave with flexible return options, and stock options.

Below is the interview process you can expect for this role. We know interviewing can seem daunting, but rest assured we designed our interview process to move quickly while still getting you all the information you need.

👋🏼 Initial Chat

[30 min] Meet with a Talent Acquisition team member, and get a feel for what it would be like to be an Octonaut!

💻 Engineering Problem Presentation

[60 min] You'll be given instructions to prepare a presentation which you'll present to two members of the team (15-20 minutes) before being asked some questions.

🧑‍💻 Hiring Manager Chat

[30 min] A final call to answer any last questions of yours and ours.

Our public employee handbook is the best place to learn more about life at Octopus. It includes our values, how we structure teams, career progression, leave and benefits, and much more.

If you're enthusiastic about this position, even if you don’t meet all the criteria above, we wholeheartedly encourage you to submit your application. Our talent team is in-house, and we recognize that every individual brings something unique. We take the time to review every application and consider how you might add to the team.

We know your time is precious. If you apply, we promise to update you at least once per week about the status of your application and to give you clear expectations for each step in the journey.
