The company is building the foundation for the next generation of AI applications. As generative AI workloads rapidly scale, inference efficiency is becoming the critical bottleneck. The company is redefining AI inference from the ground up, combining cutting-edge research with an integrated hardware-software stack that delivers breakthrough performance, efficiency, and model quality.
The company pairs its inference stack with a seamless developer experience, allowing users to deploy, manage, and monitor AI workloads from frameworks like PyTorch and LangChain at production scale in seconds.
The company spun out of a Stanford research project led by Professors Zain Asgar and Sachin Katti. The founding team has deep experience across AI, distributed systems, and hardware, with previous successful exits.
The company is seeking a professional to help build and scale its platform for deploying efficient AI inference. You will have the opportunity to work across the stack: from the orchestration layer that distributes data and workloads at production scale, to the compilation framework that optimizes AI across diverse environments. Whether you’re diving deep into breakthrough techniques to drive performance for the latest AI models, designing systems that process millions of tokens a second, or refining the developer experience for AI deployment, you’ll help shape the future of modern AI infrastructure.