Staff Full Stack Engineer, Speech Infrastructure - USA

Inworld AI

5+ Years | Mountain View, California, United States (On Site) | Full Time | 2 months ago

Job Summary

At Inworld, we are building an intelligent runtime to transform static software into AI systems that autonomously evolve. This role focuses on building critical infrastructure for real-time, interactive experiences, specifically for demanding tasks like Text-to-Speech (TTS) and Speech-to-Text (STT). The Full Stack Engineer will design, build, and scale full-stack systems for voice production models, focusing on robust, low-latency platforms for next-generation AI-driven software.

Must Have

Design, develop, and scale infrastructure for TTS, STT, and other real-time voice AI capabilities.
Engineer robust deployment systems for speech models on Kubernetes with PyTorch.
Write clean, high-performance backend services and APIs in Python, Java/Kotlin, and Go.
Create and maintain internal web applications and dashboards using TypeScript.
Collaborate closely with ML engineers to bridge cutting-edge research models and production-ready solutions.
BA/BS degree in Computer Science or a related technical field, or equivalent practical experience.
5+ years of professional experience in full-stack or backend software development.
Demonstrated experience in building production-grade application APIs that span both backend and frontend stacks.
Strong proficiency in Python and demonstrated experience with Java/Kotlin, Go, or TypeScript.
Hands-on experience building and maintaining production systems using containerization (Docker) and orchestration (Kubernetes).
Experience with or a strong interest in the infrastructure challenges of deploying ML models, particularly with frameworks like PyTorch.
Solid foundation in data structures, algorithms, and system design.

Good to Have

A passion for learning and staying up-to-date with the latest advancements in AI infrastructure and ML systems.
Direct experience building infrastructure for speech processing (TTS/STT) or other real-time ML applications.
Ability to work collaboratively in a fast-paced environment with shifting priorities.
Familiarity with MLOps best practices and tools.
Experience with cloud platforms like GCP or AWS.

Perks & Benefits

Bonus
Equity
Benefits
Relocation assistance

Job Description

About Inworld

At Inworld, we believe the processes of building, scaling, and evolving applications are monsters that consume value before it can reach users. Our mission is to solve evolution and transform static software into AI systems that autonomously evolve to better serve their users. We are building an intelligent runtime to conquer these monsters and make this vision a reality.

We are backed by investors such as Lightspeed, Section 32, Kleiner Perkins, Microsoft’s M12 venture fund, BITKRAFT, Founders Fund, and First Spark Ventures. Our technology is used by category leaders, including NVIDIA, Microsoft Xbox, Niantic, Wishroll, Little Umbrella, and Streamlabs, among many others. Inworld has been recognized by CB Insights as one of the 100 most promising AI companies globally and has been named one of LinkedIn's Top 10 Startups in the USA.

About the role

Our intelligent runtime must seamlessly connect to foundational models to power real-time, interactive experiences. For this to be possible at scale, the infrastructure that serves these models, especially for demanding tasks like Text-to-Speech (TTS) and Speech-to-Text (STT), must be exceptionally fast, reliable, and cost-effective.

We are seeking a Full Stack Engineer to build this critical infrastructure. You will be responsible for designing, building, and scaling the full-stack systems that serve our voice production models. Your work will focus on the difficult engineering problems of building a robust, low-latency platform that forms the backbone of the next generation of AI-driven software.

Responsibilities

Design, develop, and scale the complete infrastructure that powers cutting-edge TTS, STT, and other real-time voice AI capabilities.
Engineer robust deployment systems for speech models on Kubernetes with PyTorch, ensuring high availability and low latency for our intelligent runtime.
Write clean, high-performance backend services and APIs in Python, Java/Kotlin, and Go to handle audio processing, model inference, and complex data pipelines.
Create and maintain internal web applications and dashboards using TypeScript to enable teams to monitor, debug, and manage speech systems effectively.
Collaborate closely with ML engineers to bridge the gap between cutting-edge research models and production-ready solutions that can serve millions of users.

Qualifications

A BA/BS degree in Computer Science or a related technical field, or equivalent practical experience.
5+ years of professional experience in full-stack or backend software development.
Demonstrated experience in building production-grade application APIs that span both backend and frontend stacks.
Strong proficiency in Python and demonstrated experience with one or more of the following: Java/Kotlin, Go, or TypeScript.
Hands-on experience building and maintaining production systems using containerization (Docker) and orchestration (Kubernetes).
Experience with or a strong interest in the infrastructure challenges of deploying ML models, particularly with frameworks like PyTorch.
Solid foundation in data structures, algorithms, and system design.

A good fit for this role may have

A passion for learning and staying up-to-date with the latest advancements in AI infrastructure and ML systems.
Direct experience building infrastructure for speech processing (TTS/STT) or other real-time ML applications.
Ability to work collaboratively in a fast-paced environment with shifting priorities.
Familiarity with MLOps best practices and tools.
Experience with cloud platforms like GCP or AWS.

We believe in the power of in-person collaboration to solve the hardest problems and foster a strong team culture. We offer relocation assistance and look forward to you joining us in our Mountain View office.

The base salary range for this full-time position is $200,000 - $300,000, plus bonus, equity, and benefits.
