Staff MLOps Engineer - Canada
Inworld AI
Job Description
About Inworld
At Inworld, we believe that the benefits of AI should extend beyond business workflows to the applications and experiences that we enjoy every day. We began by pushing the frontier of lifelike, interactive characters for games and entertainment, pioneering real-time conversational AI at scale. Today, we apply that expertise to provide the multimodal models, pipelines, and tools needed to build and evolve consumer-scale, real-time conversational AI applications across learning, health, social, assistants, games, and media.
We’ve raised more than $125M from Lightspeed, Section 32, Kleiner Perkins, Microsoft’s M12 venture fund, Founders Fund, Meta and Stanford, among others. Our technology has powered experiences from companies such as NVIDIA, Microsoft Xbox, Niantic, Logitech Streamlabs, Wishroll, Little Umbrella and Bible Chat. We’ve also been recognized by CB Insights as one of the 100 most promising AI companies globally and have been named one of LinkedIn's Top 10 Startups in the USA.
About the role
At Inworld, we’re building the AI framework behind the next generation of real-time, immersive applications. As a Staff MLOps Engineer, you’ll design, build, and scale the infrastructure that powers intelligent AI agents across massive consumer experiences while ensuring performance, reliability, and speed at every level.
What you’ll do
- Build and scale MLOps systems to streamline the end-to-end ML model lifecycle on the Inworld AI platform, from training to deployment.
- Design and implement robust model training, evaluation, and release pipelines.
- Collaborate cross-functionally with ML and backend teams to design, deploy, and maintain scalable, secure infrastructure for Inworld’s AI Engine and Studio.
- Facilitate a "you build it, you run it" culture by providing the necessary tools and processes for monitoring the reliability, availability, and performance of services.
- Manage CI/CD pipelines to ensure smooth and efficient code integration and deployment.
- Identify and implement opportunities to enhance engineering speed and efficiency.
- Provide technical leadership in ML engineering best practices, raise the technical bar, and mentor junior engineers in MLOps principles.
Expected experience
- 7+ years of software engineering experience, including 5+ years working with infrastructure as code.
- Proficiency in managing Kubernetes clusters and applications, including creating Helm charts/Kustomize manifests for new applications.
- Experience in creating and maintaining CI/CD pipelines for both application and infrastructure deployments (e.g., Terraform/Terragrunt, ArgoCD, GitHub Actions, Ansible).
- Deep knowledge of at least one major cloud provider (Google Cloud Platform, Microsoft Azure, Oracle Cloud).
- Proficiency in at least one backend programming/scripting language, such as Golang, Python, or Bash.
- Knowledge of SLURM or similar job schedulers for distributed training.
- Experience with data pipeline and workflow management tools.
- Familiarity with open-source LLMs and open-source serving solutions (e.g., vLLM, llama.cpp, KServe) is a plus.
- Experience with bare-metal GPUs (optional).
- Desire to work at a fast-growing Series A startup, comfortable with uncertainty, owning and scaling new products, and embracing an experimental and iterative development process.
The base salary range for this full-time position is CAD $150,000 - $240,000. In addition to base pay, total compensation includes equity and benefits. Within the range, individual pay is determined by work location, level, and additional factors, including competencies, experience, and business needs. The base pay range is subject to change and may be modified in the future.