Staff Platform Engineer, MLOps

2 Months ago • 7+ Years • DevOps • $130,000 PA - $160,000 PA

Job Summary

Job Description

As a Staff Platform Engineer (MLOps), you'll design, deploy, and maintain cloud infrastructure for Inworld's AI Engine and Studio. Responsibilities include optimizing the ML model lifecycle using the Inworld AI platform and NVIDIA CUDA, implementing CI/CD systems for ML workflows, monitoring models, designing MLOps tools, and facilitating a 'you build it, you run it' culture. You will manage CI/CD pipelines, identify opportunities to enhance engineering speed, conduct root cause analysis, and develop best practices for automation. The role requires expertise in Kubernetes, Terraform/Terragrunt, and at least one major cloud provider.
Must have:
  • 7+ years software engineering experience
  • 5+ years Infrastructure-as-code experience
  • Kubernetes & Helm/Kustomize proficiency
  • CI/CD pipeline creation & maintenance
  • Cloud provider expertise (GCP, Azure, Oracle)
  • Golang, Python, or Bash proficiency
Good to have:
  • Open-source LLM & serving solution familiarity
  • SLURM experience
  • Data pipeline & workflow management tools experience
  • Bare metal GPU experience
Perks:
  • Equity
  • Benefits

Job Details

Why Join Inworld

Inworld is the leading provider of AI technology for real-time interactive experiences, with a $500 million valuation and backing from top-tier investors including Intel Capital, Microsoft’s M12 fund, Lightspeed Venture Partners, Section 32, BITKRAFT Ventures, Kleiner Perkins, Founders Fund, and First Spark Ventures.

Inworld provides the market’s best framework for building production-ready interactive experiences, coupled with dedicated services to optimize specific stages of development, from design and development to ML pipeline optimization and custom compute infrastructure. We help developers bring their AI engines in-house with a framework optimized for real-time data ingestion, low latency, and massive scale. Inworld powers experiences built by Ubisoft, NVIDIA, Niantic, NetEase Games, and LG, among others, and has partnerships with key industry players such as Microsoft Xbox, Epic Games, and Unity.

Inworld was recognized by CB Insights as one of the 100 most promising AI companies in the world in 2024 and was named among LinkedIn's Top Startups of 2024 in the USA.

About the Role:

As a Staff Platform Engineer (MLOps), you'll work closely with backend and ML Engineering teams to design, deploy, and maintain reliable, high-performance, and secure cloud infrastructure for our AI Engine and Studio. 

What you'll do:

  • Develop, manage, and optimize the ML model lifecycle in production using the Inworld AI platform and NVIDIA CUDA: implement CI/CD systems for ML workflows, monitor models to identify issues and inefficiencies, and design MLOps tools and frameworks that improve automation and efficiency.
  • Facilitate a "you build it, you run it" culture by providing the necessary tools and processes for monitoring the reliability, availability, and performance of services.
  • Manage CI/CD pipelines to ensure smooth and efficient code integration and deployment.
  • Identify and implement opportunities to enhance engineering speed and efficiency.
  • Conduct root cause analysis to identify critical issues and develop automated solutions to prevent recurrence.
  • Develop and share best practices to improve automation and efficiency across our engineering teams.

Expected experience:

  • 7+ years of experience in software engineering.
  • 5+ years of experience with infrastructure-as-code.
  • Proficiency in managing Kubernetes clusters and applications, including creating Helm charts/Kustomize manifests for new applications.
  • Experience in creating and maintaining CI/CD pipelines for both applications and infrastructure deployments (using tools like Terraform/Terragrunt, ArgoCD, GitHub Actions, Ansible, etc.).
  • Deep knowledge of at least one major cloud provider (Google Cloud Platform, Microsoft Azure, Oracle Cloud).
  • Proficiency in at least one backend programming or scripting language, such as Golang, Python, or Bash.
  • Familiarity with open-source LLMs and open-source serving solutions (e.g., vLLM, llama.cpp, KServe) is a plus.
  • Experience with SLURM is a plus.
  • Experience with data pipeline and workflow management tools is a plus.
  • Experience with bare metal GPUs (optional).

The base salary range for this full-time position is CAD $170,000 - $220,000. In addition to base pay, total compensation includes equity and benefits. Within the range, individual pay is determined by work location, level, and additional factors, including competencies, experience, and business needs. The base pay range is subject to change and may be modified in the future.
