Member of Technical Staff - ML


About the Role

As a Member of Technical Staff focused on ML, you'll push the limits of serving frameworks, refine our agent architecture, and build the benchmarks that define performance at scale. You'll help take our frontier models from the lab into lightning-fast, production-ready services.

If you relish experimenting with the latest serving research, building optimizations, and shipping infrastructure for researchers, then we invite you to apply!

Responsibilities

  • Architect and optimize high-performance inference infrastructure for large foundation models
  • Benchmark and improve latency, throughput, and agent responsiveness
  • Work with researchers to deploy new model architectures and multi-step agent behaviors
  • Implement caching, batching, and prioritization to handle high-volume requests
  • Build monitoring and observability into inference pipelines
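To make the batching and prioritization responsibility concrete, here is a minimal sketch of a priority-aware request scheduler. All names (`BatchScheduler`, `submit`, `next_batch`) are hypothetical, invented for illustration; production serving stacks such as vLLM implement far more sophisticated continuous batching.

```python
import heapq
import itertools
from dataclasses import dataclass, field

@dataclass(order=True)
class Request:
    priority: int              # lower value = served first
    seq: int                   # arrival counter, breaks priority ties fairly
    prompt: str = field(compare=False)  # payload is excluded from ordering

class BatchScheduler:
    """Toy scheduler: pops up to max_batch_size requests per step,
    highest-priority (lowest number) first. Illustration only."""

    def __init__(self, max_batch_size: int = 4):
        self.max_batch_size = max_batch_size
        self._heap: list[Request] = []
        self._counter = itertools.count()

    def submit(self, prompt: str, priority: int = 10) -> None:
        heapq.heappush(self._heap, Request(priority, next(self._counter), prompt))

    def next_batch(self) -> list[str]:
        batch: list[str] = []
        while self._heap and len(batch) < self.max_batch_size:
            batch.append(heapq.heappop(self._heap).prompt)
        return batch

sched = BatchScheduler(max_batch_size=2)
sched.submit("summarize doc A")           # default priority
sched.submit("health check", priority=0)  # urgent, jumps the queue
sched.submit("summarize doc B")
print(sched.next_batch())  # → ['health check', 'summarize doc A']
```

Each batch would then be handed to the model server as a single forward pass, trading a little per-request latency for much higher aggregate throughput.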

Qualifications

  • Strong experience in distributed systems and low-latency ML serving
  • Skilled with performance profiling and optimization tools, with a track record of delivering critical performance gains
  • Hands-on with vLLM, SGLang, or equivalent frameworks
  • Familiarity with GPU optimization, CUDA, and model parallelism
  • Comfort working in a high-velocity, ambiguity-heavy startup environment

What makes us interesting

  • Small, elite team of ex-founders, researchers from top AI labs, top CS grads, and engineers from top companies
  • True ownership: you won't be blocked by bureaucracy, and you'll ship meaningful work within weeks rather than months
  • Serious momentum: we're well-funded by top investors, moving fast, and focused on execution

What we do

  • Ship consumer products powered by cutting-edge AI research
  • Build infrastructure that supports both research and product development
  • Pursue cutting-edge research that opens up new forms of consumer products

The Details

  • Full-time, onsite role
  • Startup hours apply
  • Generous salary, with additional benefits to be discussed during the hiring process
