Member of Technical Staff - ML Infra

Fundamental Labs


Job Description

About the Role

As our Member of Technical Staff focused on ML infrastructure, you’ll design and scale the platforms that power cutting-edge AI: from high-performance inference engines to the underlying agent technologies and large-scale compute clusters that keep everything running.

You’ll collaborate closely with researchers and product engineers to push the limits of inference performance, build reliable foundations for AI agents, and advance the next generation of training and post-training pipelines.

Responsibilities

  • Accelerate research development, helping researchers explore SOTA methods and new techniques from day one
  • Build and optimize model training pipelines, including data collection, data loading, supervised fine-tuning (SFT), and reinforcement learning (RL)
  • Optimize a high-performance inference platform on top of both open-source and proprietary inference engines
  • Develop and scale technologies for large-scale cluster scheduling, high-performance distributed training, and AI networking
  • Build strong engineering discipline around observability and reliability at scale
  • Collaborate with research and product teams to translate breakthroughs into robust, production-ready infrastructure

Qualifications

  • Expertise in one or more of: inference engines, GPU optimization, cluster scheduling, or cloud-native infra
  • Familiarity with modern ML frameworks (PyTorch, vLLM, Verl, etc.)
  • Startup-ready mindset (adaptable, fast-moving, high-ownership)

What makes us interesting

  • Small, elite team of ex-founders, researchers from top AI labs, top CS grads, and engineers from leading companies
  • True ownership: you will not be blocked by bureaucracy, and you'll ship meaningful work within weeks rather than months
  • Serious momentum: we're well-funded by top investors, moving fast, and focused on execution

What we do

  • Ship consumer products powered by cutting-edge AI research
  • Build infrastructure that supports both research and product
  • Pursue cutting-edge research that opens up new forms of consumer products

The Details

  • Startup hours apply
  • Generous salary, with additional benefits to be discussed during the hiring process
