About the Team
The Kernels team at OpenAI builds the low-level software that accelerates our most ambitious AI research.
We work at the boundary of hardware and software, developing high-performance kernels, distributed system optimizations, and runtime improvements to make large-scale training and inference more efficient.
Our work enables OpenAI to push the limits by ensuring that models, from LLMs to recommender systems, run reliably on advanced supercomputing platforms. That includes adapting our software stack to new types of accelerators, tuning system performance end-to-end, and removing bottlenecks across every layer of the stack.
About the Role
On the Accelerators team, you will help OpenAI evaluate and bring up new compute platforms that can support large-scale AI training and inference.
Your work will range from prototyping system software on new accelerators to enabling performance optimizations across our AI workloads.
You'll work across the stack, on both the hardware and software sides: kernels, sharding strategies, scaling across distributed systems, and performance modeling.
You'll help adapt OpenAI's software stack to non-traditional hardware and drive efficiency improvements in core AI workloads. This is not a compiler-focused role; rather, it bridges ML algorithms and system performance, especially at scale.
In this role, you will:
- Prototype and enable OpenAI's AI software stack on new, exploratory accelerator platforms.
- Optimize large-scale model performance (LLMs, recommender systems, distributed AI workloads) for diverse hardware environments.
- Develop kernels, sharding mechanisms, and system scaling strategies tailored to emerging accelerators.
- Collaborate on optimizations at the model code level (e.g., PyTorch) and below to enhance performance on non-traditional hardware.
- Perform system-level performance modeling, debug bottlenecks, and drive end-to-end optimization.
- Work with hardware teams and vendors to evaluate alternatives to existing platforms and adapt the software stack to their architectures.
- Contribute to runtime improvements, compute/communication overlapping, and scaling efforts for frontier AI workloads.
You might thrive in this role if you have:
- 3+ years of experience working on AI infrastructure, including kernels, systems, or hardware-software co-design.
- Hands-on experience with accelerator platforms for AI at data center scale (e.g., TPUs, custom silicon, exploratory architectures).
- Strong understanding of kernels, sharding, runtime systems, or distributed scaling techniques.
- Familiarity with optimizing LLMs, CNNs, or recommender models for hardware efficiency.
- Experience with performance modeling, system debugging, and software stack adaptation for novel architectures.
- Exposure to mobile accelerators is welcome, but experience enabling data center-scale AI hardware is preferred.
- Ability to operate across multiple levels of the stack, rapidly prototype solutions, and navigate ambiguity in early hardware bring-up phases.
- Interest in shaping the future of AI compute through exploration of alternatives to mainstream accelerators.