Staff Machine Learning Engineer - Dataset & Training Platform

1 month ago • 5+ years • DevOps

Job Summary

Job Description

Join Canva as a Staff Machine Learning Engineer to redefine how the world experiences design. This role focuses on architecting foundational AI Platform capabilities, making key technical decisions for model training and deployment. You will lead cross-team initiatives to consolidate and improve dataset and training capabilities, working with various engineering and research teams. Responsibilities include building and maintaining high-performance distributed data processing and training systems, optimizing for cost efficiency and developer experience, and driving technical strategy discussions. You will also mentor other engineers and contribute to growing Canva's AI platform engineering capabilities, solving complex technical challenges across the ML stack.
Must have:
  • 5+ years experience in ML training systems
  • Hands-on distributed training experience
  • Model lifecycle management experience
  • Large-scale data processing experience
  • Designing foundational AI/ML infrastructure
  • Setting technical direction
  • Influencing engineering practices
  • Strong understanding of distributed computing
  • Container orchestration (Kubernetes)
  • Cloud infrastructure (AWS)
  • Fluent in Python
  • Deep knowledge of ML frameworks (PyTorch, TensorFlow)
  • Modern ML tools (W&B, Ray, Anyscale)
  • Understanding of generative AI systems (LLMs, multimodal models)
  • Balancing platform investments with business needs
  • Experience with product teams and research
  • Growing other engineers
  • Contributing to technical culture
  • Infrastructure-as-code experience
  • Performance optimization understanding
Good to have:
  • GitOps principles for automation and deployment
Perks:
  • Equity packages
  • Inclusive parental leave policy
  • Annual Vibe & Thrive allowance
  • Flexible leave options

Job Details

Job Description

Join the team redefining how the world experiences design.

Hey, g'day, mabuhay, kia ora, 你好, hallo, vítejte!
Thanks for stopping by. We know job hunting can be a little time-consuming and you're probably keen to find out what's on offer, so we'll get straight to the point.

Where and how you can work

Our flagship campus is in Sydney. We also have a campus in Melbourne and co-working spaces in Brisbane, Perth and Adelaide. But you have choice in where and how you work — we trust our Canvanauts to choose the balance that empowers them and their team to achieve their goals.

What you'd be doing in this role

As Canva scales, change continues to be part of our DNA. But we like to think that's all part of the fun. So this will give you the flavour of the type of things you'll be working on when you start, but this will likely evolve.

At the moment, this role is focused on:

  • Architecting foundational AI Platform capabilities, making key technical decisions that impact how models are trained and deployed across Canva.
  • Leading cross-team initiatives to consolidate and improve dataset and training capabilities, working with stakeholders across multiple engineering and research teams.
  • Building and maintaining paved roads for high-performance distributed data processing and training systems, optimising for cost efficiency and developer experience.
  • Driving technical strategy discussions, weighing platform stability needs against product velocity requirements.
  • Mentoring other engineers and contributing to the growth of Canva's AI platform engineering capabilities.
  • Solving complex technical challenges spanning multiple parts of the ML stack.

You're probably a match if:

  • You have over 5 years of experience building and scaling ML training systems, with hands-on experience in distributed training, model lifecycle management, and large-scale data processing.
  • You have a proven track record of designing and implementing foundational AI/ML infrastructure that supports multiple teams and use cases.
  • You have experience setting technical direction, making architectural decisions, and influencing engineering practices across organisations.
  • You possess a strong understanding of distributed computing, container orchestration (Kubernetes), and cloud infrastructure (preferably AWS).
  • You are fluent in Python with deep knowledge of ML frameworks (PyTorch, TensorFlow) and modern tools (W&B, Ray, Anyscale).
  • You have a strong understanding of generative AI systems, including LLMs, multimodal models, and foundation model fine-tuning.
  • You have the ability to balance long-term platform investments with immediate business needs, making pragmatic technical decisions.
  • You have experience working with product teams, research, and other engineering specialties to understand and address diverse technical requirements.
  • You have a track record of growing other engineers and contributing to technical culture and standards.
  • You have experience with infrastructure-as-code and an understanding of performance optimisation.
  • Experience with GitOps principles for automation and deployment is a plus.

About the team

Canva's GenAI Platform Group is responsible for the delivery of Capabilities and Solutions which support ML and AI initiatives, from early ideation and prototyping, through to scaling to meet the needs of millions of Canva users in production. We empower thousands of engineers and product managers to deliver amazing product features which harness the power of cutting-edge technologies. 

The Dataset & Training Platform team specifically focuses on the foundational Capabilities that power model training, dataset management, and AI/ML workflows. We're building the infrastructure that enables Canva's AI-first future, supporting everything from generative design models to intelligent automation systems that serve millions of users worldwide.

What's in it for you?

Achieving our crazy big goals motivates us to work hard — and we do — but you'll experience lots of moments of magic, connectivity and fun woven throughout life at Canva, too. We also offer a range of benefits to set you up for every success in and outside of work.

Here’s a taste of what’s on offer:
• Equity packages — we want our success to be yours too
• Inclusive parental leave policy that supports all parents & carers
• An annual Vibe & Thrive allowance to support your wellbeing, social connection, office setup & more
• Flexible leave options that empower you to be a force for good, take time to recharge and support you personally

Check out lifeatcanva.com for more info.

Other stuff to know

We make hiring decisions based on your experience, skills and passion, as well as how you can enhance Canva and our culture. When you apply, please tell us the pronouns you use and any reasonable adjustments you may need during the interview process.

We celebrate all types of skills and backgrounds at Canva — so even if you don’t feel like your skills quite match what’s listed above — we still want to hear from you!

We see AI as a powerful amplifier of creativity and technology at Canva. We’re evolving how we assess AI skills in our Technology hiring experience - you’ll tackle interactive, real-time challenges that reflect the kind of work we do. In some interviews, you may also be asked to solve a problem using an AI tool to show how you approach challenges with tech by your side. Your recruitment partner will walk you through what to expect.

Please note that interviews are conducted virtually.

 
