Senior Software Engineer - Reliability Foundations (open to remote across ANZ)

19 Hours ago • All levels
Software Development & Engineering

Job Description

Join the Reliability Foundations team to redefine design experiences. This role involves writing production-grade software in Python, Go, or Java to solve reliability challenges, ensuring best practices are rolled out across the organization. You will lead deep-dive investigations into high-severity production incidents, foster a reliability-first culture, and design scalable backend systems. The ideal candidate is a software engineer with deep experience in large-scale, distributed systems, proficient in observability tooling, and passionate about code quality and system design.
Good To Have:
  • Experience in Java 13.
  • Experience working with microservice architectures in large containerised, distributed cloud environments (ideally AWS).
  • Experience working with data warehouse, analytics, and reporting tools such as Snowflake, Mode Analytics, and Looker.
Must Have:
  • Write production-grade software in Python, Go, or Java to solve reliability challenges.
  • Work with product engineering teams to ensure reliability best practices and tools are rolled out.
  • Lead deep-dive investigations into high-severity production incidents and prevent recurrence.
  • Foster a culture within Engineering that prioritizes reliability and establishes the processes and policies that drive it.
  • Design and build scalable backend systems, libraries, and frameworks to improve reliability.
  • Shape Canva’s reliability roadmap by identifying gaps and leading implementation.
  • Deep experience writing clean, maintainable, production-grade code in Python, Java, or Go.
  • Built and maintained large-scale, distributed systems, ideally user-facing apps with millions of users.
  • Understand and enjoy tackling performance, scalability, and resilience challenges across the stack.
  • Experience with guiding others in the principles of incident review, investigation, and remedial activity.
  • Proficiency with observability tooling (logs, metrics, traces) and strong instincts for diagnosing issues in live systems.
  • Enjoy collaborating and coordinating changes across multiple service teams.
  • Care deeply about code quality, system design, and engineering excellence.
Perks:
  • Equity packages
  • Inclusive parental leave policy
  • Annual Vibe & Thrive allowance to support wellbeing, social connection, office setup
  • Flexible leave options

Join the team redefining how the world experiences design.

Hey, g'day, mabuhay, kia ora, 你好, hallo, vítejte!

Thanks for stopping by. We know job hunting can be a little time consuming and you're probably keen to find out what's on offer, so we'll get straight to the point.

Where and how you can work

Our flagship campus is in Sydney. We also have a campus in Melbourne and co-working spaces in Brisbane, Perth and Adelaide. But you have a choice in where and how you work: we trust our Canvanauts to choose the balance that empowers them and their team to achieve their goals.

What you’d be doing in this role

As Canva scales, change continues to be part of our DNA. But we like to think that's all part of the fun. So this will give you a flavour of the type of things you'll be working on when you start, but this will likely evolve.

At the moment, this role is focused on:

  • Writing production-grade software in Python, Go, or Java - your primary focus will be solving reliability challenges through code.
  • Working with product engineering teams to ensure reliability best practices and tools are rolled out in every service across the whole organisation. It’s not enough to create a new throttling library; we want to make sure it’s successfully used in every service.
  • Leading deep-dive investigations into high-severity production incidents and writing code to prevent recurrence at scale.
  • Fostering a culture within Engineering that puts reliability first and establishes processes and policies that drive reliability within product engineering teams. This includes things like SLAs, error budgets, on-call response, incident resolution, and observability best practices.
  • Designing and building scalable backend systems, libraries and frameworks to improve the reliability of the product architecture.
  • Shaping the reliability roadmap by identifying gaps, proposing new approaches, and leading implementation end-to-end.

You're probably a match if

  • You’re a software engineer first - with deep experience writing clean, maintainable, production-grade code in Python, Java, or Go.
  • You’ve built and maintained large-scale, distributed systems - ideally user-facing apps with millions of users.
  • You understand and enjoy tackling performance, scalability, and resilience challenges across the stack (infra, backend, data, and even client code).
  • You have experience with guiding others in the principles of incident review, investigation and remedial activity.
  • You know your way around observability tooling (logs, metrics, traces) and have strong instincts around diagnosing and debugging issues in live systems.
  • You enjoy collaborating - as a Senior Reliability Engineer, you will need to share knowledge, and communicate and coordinate changes across multiple service teams.
  • You care deeply about code quality, system design, and engineering excellence - you write tests, own your changes, and value readability and review.

Nice-to-haves:

  • Our services and libraries are primarily written in Java 13, so experience in Java is a nice-to-have.
  • Experience working with microservice architectures in large containerised, distributed cloud environments (ideally AWS). We’re hosted on AWS and leverage the tools they provide as much as possible.
  • Experience working with data warehouse, analytics and reporting tools such as Snowflake, Mode Analytics and Looker.

About the Group

The Reliability Platform Group is responsible for providing the tools and processes to scale reliability across all services. Our teams work together, and with other groups, to deliver preventive and detective tooling, processes and best practices that uplift reliability. We do this by driving operational excellence, reducing the impact of incidents, and providing visibility and accountability across the broader Engineering community.

This role sits within the Reliability Foundations team, whose focus is on providing tools and guidance for engineering teams to measure and maintain their systems’ reliability. Their key areas of practice include on-call management, service-level management, production readiness and operational review.

What's in it for you?

Achieving our crazy big goals motivates us to work hard - and we do - but you'll experience lots of moments of magic, connectivity and fun woven throughout life at Canva, too. We also offer a range of benefits to set you up for every success in and outside of work.

Here's a taste of what's on offer:

  • Equity packages - we want our success to be yours too
  • Inclusive parental leave policy that supports all parents & carers
  • An annual Vibe & Thrive allowance to support your wellbeing, social connection, office setup & more
  • Flexible leave options that empower you to be a force for good and take time to recharge, and that support you personally

Check out lifeatcanva.com for more info.

Other stuff to know

We see AI as a powerful amplifier of creativity and technology at Canva. We’re evolving how we assess AI skills in our Technology hiring experience - you’ll tackle interactive, real-time challenges that reflect the kind of work we do. In some interviews, you may also be asked to solve a problem using an AI tool to show how you approach challenges with tech by your side. Your recruitment partner will walk you through what to expect. We make hiring decisions based on your experience, skills and passion, as well as how you can enhance Canva and our culture.

When you apply, please tell us the pronouns you use and any reasonable adjustments you may need during the interview process. We celebrate all types of skills and backgrounds at Canva, so even if you don’t feel like your skills quite match what’s listed above - we still want to hear from you!

Please note that interviews are conducted virtually.
