Staff Technical Lead for Inference & ML Performance

Job Description

fal is seeking a Staff Technical Lead for Inference & ML Performance to lead and optimize state-of-the-art inference systems for generative-media infrastructure. The role combines setting technical direction, contributing hands-on to critical performance work, collaborating with research and applied ML teams, driving advanced optimizations, and mentoring a team of performance-focused engineers. The ideal candidate pairs deep expertise in ML performance optimization with the strategic vision to push the boundaries of model inference performance.

fal is pioneering the next generation of generative-media infrastructure. We're pushing the boundaries of model inference performance to power seamless creative experiences at unprecedented scale. We're looking for a Staff Technical Lead for Inference & ML Performance, someone who blends deep technical expertise with strategic vision, guiding a team to build and optimize state-of-the-art inference systems. This role is intense yet deeply impactful. Apply if you're ready to lead the future of inference performance at a fast-paced, high-growth frontier.

Why this role matters

You’ll shape the future of fal’s inference engine and ensure our generative models achieve best-in-class performance. Your work directly impacts our ability to rapidly deliver cutting-edge creative solutions to users, from individual creators to global brands.

What you'll do

| Day-to-day | What success looks like |
| --- | --- |
| Set technical direction. Guide your team (kernels, applied performance, ML compilers, distributed inference) to build high-performance inference solutions. | fal’s inference engine consistently outperforms industry benchmarks in throughput, latency, and efficiency. |
| Hands-on IC leadership. Personally contribute to critical inference performance enhancements and optimizations. | You regularly ship code that significantly improves model-serving performance. |
| Collaborate closely with research & applied ML teams. Influence model inference strategies and deployment techniques. | Inference innovations move rapidly and seamlessly from research to production deployment. |
| Drive advanced performance optimizations. Implement model parallelism, kernel optimization, and compiler strategies (see the illustrative sketch after this table). | Performance bottlenecks are quickly identified and eliminated, dramatically improving inference speed and scalability. |
| Mentor and scale your team. Coach and expand your team of performance-focused engineers. | Your team independently innovates, proactively solves complex performance challenges, and consistently levels up its skills. |
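To make the "compiler strategies" row concrete, here is a minimal, illustrative sketch (not fal's actual engine) comparing eager PyTorch against `torch.compile` on a toy block; the layer sizes, batch shape, and `reduce-overhead` mode are assumptions chosen purely for demonstration:

```python
import time
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Toy stand-in for one block of a generative model (hypothetical sizes).
block = nn.Sequential(
    nn.Linear(4096, 4096),
    nn.GELU(),
    nn.Linear(4096, 4096),
).to(device).eval()

x = torch.randn(8, 4096, device=device)

# "reduce-overhead" favors CUDA-graph capture, useful for small-batch serving.
compiled = torch.compile(block, mode="reduce-overhead")

def bench(fn, iters=50):
    with torch.no_grad():
        for _ in range(5):  # warm-up; the first calls trigger compilation
            fn(x)
        if device == "cuda":
            torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(iters):
            fn(x)
        if device == "cuda":
            torch.cuda.synchronize()
    return (time.perf_counter() - t0) / iters

print(f"eager:    {bench(block) * 1e3:.2f} ms/iter")
print(f"compiled: {bench(compiled) * 1e3:.2f} ms/iter")
```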

You might be a fit if you

  • Are deeply experienced in ML performance optimization. You've optimized inference for large-scale generative models in production environments.
  • Understand the full ML performance stack. From PyTorch, TensorRT, TransformerEngine, and Triton down to CUTLASS kernels, you’ve navigated and optimized them all.
  • Know inference inside-out. Expert-level familiarity with advanced inference techniques: quantization, kernel authoring, compilation, model parallelism (tensor, context/sequence, and expert parallelism), distributed serving, and profiling (a toy example follows this list).
  • Lead from the front. You're a respected IC who enjoys getting hands-on with the toughest problems, demonstrating excellence to inspire your team.
  • Thrive in cross-functional collaboration. Comfortable interfacing closely with applied ML teams, researchers, and stakeholders.
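
As a toy illustration of two of the techniques named above, the sketch below applies post-training dynamic quantization to a stand-in model and profiles one inference pass with `torch.profiler`; the model and shapes are hypothetical, and real serving work would of course profile production traffic on GPU:

```python
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

# Toy FP32 model; layer names and sizes are hypothetical.
model = nn.Sequential(
    nn.Linear(1024, 1024),
    nn.ReLU(),
    nn.Linear(1024, 1024),
).eval()

# Post-training dynamic quantization: Linear weights stored as int8,
# activations quantized on the fly at inference time.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(32, 1024)
with torch.no_grad(), profile(activities=[ProfilerActivity.CPU]) as prof:
    qmodel(x)

# Top operators by CPU time; the table points at where the cycles go.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```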

Nice-to-haves

  • Experience building inference engines specifically for diffusion and generative media models
  • Track record of industry-leading performance improvements (papers, open-source contributions, benchmarks)
  • Leadership experience in scaling technical teams

What you'll get

One of the highest-impact roles at one of the fastest-growing companies (revenue growing 40% month over month, 60x+ the revenue run rate of a year ago, and Series A, B, and C all raised within the last 12 months), with a world-changing vision: hyperscaling human creativity.

Sound like your calling? Share your proudest optimization breakthrough, open-source contribution, or performance milestone with us. Let's set new standards for inference performance, together.
