AI Infrastructure Engineer, Model Serving Platform

Scale AI

Job Description

As a Software Engineer on the ML Infrastructure team, you will design and build platforms for scalable, reliable, and efficient serving of LLMs and AI agents. Our platform powers cutting-edge research and production systems, supporting both internal and external use cases across various environments.

The ideal candidate combines strong ML fundamentals with deep expertise in backend system design. You’ll work in a highly collaborative environment, bridging research and engineering to deliver seamless experiences to our customers and accelerate innovation across the company.

You will:

  • Build and maintain fault-tolerant, high-performance systems for serving LLMs and agent-based workloads at scale.
  • Collaborate with researchers and engineers to integrate and optimize models for production and research use cases.
  • Conduct architecture and design reviews to uphold best practices in system design and scalability.
  • Develop monitoring and observability solutions to ensure system health and performance.
  • Lead projects end-to-end, from requirements gathering to implementation, in a cross-functional environment. 

Ideally you'd have:

  • 4+ years of experience building large-scale, high-performance backend systems.
  • Strong programming skills in one or more languages (e.g., Python, Go, Rust, C++).
  • Deep understanding of concurrency, memory management, networking, and distributed systems.
  • Experience with containers, virtualization, and orchestration tools (e.g., Docker, Kubernetes).
  • Familiarity with cloud infrastructure (AWS, GCP) and infrastructure as code (e.g., Terraform).
  • Proven ability to solve complex problems and work independently in fast-moving environments.

Nice to haves:

  • Experience with modern LLM serving frameworks such as vLLM, SGLang, TensorRT-LLM, or text-generation-inference.
  • Knowledge of ML frameworks (e.g., PyTorch or TensorFlow) and how to optimize them for production serving.
  • Experience with model inference optimizations such as quantization, distillation, speculative decoding, etc.
  • Familiarity with emerging agent frameworks such as OpenHands, Agent2Agent, MCP.
