Staff MLOps Engineer

7+ years • $150,000 - $240,000 per year

Research Development

Job Description

Inworld is building an AI framework for next-generation real-time, immersive applications. As a Staff MLOps Engineer, you will design, build, and scale infrastructure for intelligent AI agents, ensuring performance, reliability, and speed. This role involves streamlining the ML model lifecycle, implementing robust pipelines, collaborating with ML and backend teams, and managing CI/CD. You will also provide technical leadership and mentorship.

Good To Have:

Familiarity with open-source LLMs and open-source serving solutions (vLLM, llama.cpp, KServe).
Experience with bare metal GPUs.
Desire to work at a fast-growing Series A startup: being comfortable with uncertainty, owning and scaling new products, and embracing an experimental and iterative development process.

Must Have:

Build and scale MLOps systems for ML model lifecycle.
Design and implement robust model training, evaluation, and release pipelines.
Collaborate with ML and backend teams on scalable, secure infrastructure.
Facilitate a "you build it, you run it" culture with monitoring tools.
Manage CI/CD pipelines for efficient code integration and deployment.
Identify and implement opportunities to enhance engineering speed.
Provide technical leadership and mentor junior engineers in MLOps principles.
7+ years of software engineering experience, with 5+ years of infrastructure-as-code.
Proficiency in managing Kubernetes clusters and applications.
Experience creating and maintaining CI/CD pipelines.
Deep knowledge of at least one major cloud provider (GCP, Azure, Oracle Cloud).
Proficiency in at least one backend programming or scripting language (Golang, Python, Bash).
Knowledge of SLURM or similar job schedulers for distributed training.
Experience with data pipeline and workflow management tools.

Perks:

Equity
Benefits

About Inworld

At Inworld, we believe the processes of building, scaling, and evolving applications are monsters that consume value before it can reach users. Our mission is to solve evolution and transform static software into AI systems that autonomously evolve to better serve their users. We are building an intelligent runtime to conquer these monsters and make this vision a reality.

We are backed by investors such as Lightspeed, Section 32, Kleiner Perkins, Microsoft’s M12 venture fund, BITKRAFT, Founders Fund, and First Spark Ventures. Our technology is used by category leaders, including NVIDIA, Microsoft Xbox, Niantic, Wishroll, Little Umbrella and Streamlabs, among many others. Inworld has been recognized by CB Insights as one of the 100 most promising AI companies globally and has been named one of LinkedIn's Top 10 Startups in the USA.

About the role

At Inworld, we’re building the AI framework behind the next generation of real-time, immersive applications. As a Staff MLOps Engineer, you’ll design, build and scale the infrastructure that powers intelligent AI agents across massive consumer experiences while ensuring performance, reliability, and speed at every level.

What you’ll do

Build and scale MLOps systems to streamline the end-to-end ML model lifecycle on the Inworld AI platform, from training to deployment.
Design and implement robust model training, evaluation, and release pipelines.
Collaborate cross-functionally with ML and backend teams to design, deploy, and maintain scalable, secure infrastructure for Inworld’s AI Engine and Studio.
Facilitate a "you build it, you run it" culture by providing the necessary tools and processes for monitoring the reliability, availability, and performance of services.
Manage CI/CD pipelines to ensure smooth and efficient code integration and deployment.
Identify and implement opportunities to enhance engineering speed and efficiency.
Provide technical leadership in ML engineering best practices, raise the technical bar, and mentor junior engineers in MLOps principles.

Expected experience

7+ years of software engineering experience, with 5+ years of infrastructure-as-code.
Proficiency in managing Kubernetes clusters and applications, including creating Helm charts/Kustomize manifests for new applications.
Experience in creating and maintaining CI/CD pipelines for both application and infrastructure deployments (using tools like Terraform/Terragrunt, ArgoCD, GitHub Actions, Ansible, etc.).
Deep knowledge of at least one major cloud provider (Google Cloud Platform, Microsoft Azure, Oracle Cloud).
Proficiency in at least one backend programming or scripting language, such as Golang, Python, or Bash.
Knowledge of SLURM or similar job schedulers for distributed training.
Experience with data pipeline and workflow management tools.
Familiarity with open-source LLMs and open-source serving solutions (e.g., vLLM, llama.cpp, KServe) is a plus.
Experience with bare metal GPUs (optional).
Desire to work at a fast-growing Series A startup: being comfortable with uncertainty, owning and scaling new products, and embracing an experimental and iterative development process.
