Member of Technical Staff - ML Infra

Fundamental Labs


Job Description

About the Role

As our Member of Technical Staff focused on ML infrastructure, you’ll design and scale the platforms that power cutting-edge AI: from high-performance inference engines to the underlying agent technologies and large-scale compute clusters that keep everything running.

You’ll collaborate closely with researchers and product engineers to push the limits of inference performance, build reliable foundations for AI agents, and advance the next generation of training and post-training pipelines.

Responsibilities

  • Accelerate research development, helping researchers explore SOTA methods and new techniques from day one
  • Build and optimize model training pipelines, including data collection, data loading, supervised fine-tuning (SFT), and reinforcement learning (RL)
  • Optimize a high-performance inference platform on top of both open-source and proprietary inference engines
  • Develop and scale technologies for large-scale cluster scheduling, high-performance distributed training, and AI networking
  • Build strong engineering discipline around observability and reliability at scale
  • Collaborate with research and product teams to translate breakthroughs into robust, production-ready infrastructure

Qualifications

  • Expertise in one or more of: inference engines, GPU optimization, cluster scheduling, or cloud-native infra
  • Familiarity with modern ML frameworks (PyTorch, vLLM, Verl, etc.)
  • Startup-ready mindset (adaptable, fast-moving, high-ownership)

What makes us interesting

  • Small, elite team of ex-founders, researchers from top AI labs, top CS grads, and engineers from leading companies
  • True ownership: you will not be blocked by bureaucracy, and you'll ship meaningful work within weeks rather than months
  • Serious momentum: we're well-funded by top investors, moving fast, and focused on execution

What we do

  • Ship consumer products powered by cutting-edge AI research
  • Build infrastructure that supports both research and product
  • Pursue cutting-edge research that opens up new forms of consumer products

The Details

  • Startup hours apply
  • Generous salary, with additional benefits to be discussed during the hiring process
