AI Inference Engineer
Perplexity
Job Summary
As an AI Inference Engineer, you will develop the AI inference APIs used by both internal and external customers. You will benchmark and address bottlenecks throughout our inference stack, improve the reliability and observability of our systems, and respond to system outages. In addition, you will explore novel research and implement LLM inference optimizations. This role involves large-scale deployment of machine learning models for real-time inference and requires expertise in ML systems and deep learning frameworks.
Must Have
- Experience with ML systems and deep learning frameworks.
- Familiarity with common LLM architectures and optimization techniques.
- Experience deploying reliable, distributed, real-time model serving.
Good to Have
- Understanding of GPU architectures or experience with GPU kernel programming using CUDA.
Perks & Benefits
- Comprehensive health, dental, and vision insurance.
- 401(k) plan.
- Equity may be part of the total compensation package.
Job Description
We are looking for an AI Inference Engineer to join our growing team. Our current stack is Python, Rust, C++, PyTorch, Triton, CUDA, and Kubernetes. You will have the opportunity to work on large-scale deployment of machine learning models for real-time inference.
Responsibilities
- Develop APIs for AI inference that will be used by both internal and external customers
- Benchmark and address bottlenecks throughout our inference stack
- Improve the reliability and observability of our systems and respond to system outages
- Explore novel research and implement LLM inference optimizations
Qualifications
- Experience with ML systems and deep learning frameworks (e.g., PyTorch, TensorFlow, ONNX)
- Familiarity with common LLM architectures and inference optimization techniques (e.g., continuous batching, quantization; a brief sketch follows this list)
- Experience deploying reliable, distributed, real-time model serving at scale
- (Optional) Understanding of GPU architectures or experience with GPU kernel programming using CUDA
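For context on one of the techniques named above: continuous batching keeps the GPU busy by admitting queued requests into the in-flight decode loop as soon as earlier sequences finish, rather than waiting for an entire static batch to drain. Below is a minimal Python sketch of the scheduling idea only; the decode_step function is a hypothetical stand-in for the actual model call and is not drawn from any specific serving stack.

```python
import random
from collections import deque

# Hypothetical stand-in for a model forward pass: runs one decode step
# for every sequence currently in the batch and reports which sequences
# emitted an end-of-sequence token on this step.
def decode_step(active):
    return {seq_id: random.random() < 0.1 for seq_id in active}

def continuous_batching(requests, max_batch_size=8):
    """Serve requests with continuous (in-flight) batching: finished
    sequences leave the batch immediately and queued requests are
    admitted on the very next step, keeping the batch full."""
    waiting = deque(requests)
    active = set()
    step = 0
    while waiting or active:
        # Admit new requests whenever a slot frees up -- the key difference
        # from static batching, which waits for the whole batch to finish.
        while waiting and len(active) < max_batch_size:
            active.add(waiting.popleft())
        finished = decode_step(active)
        done = {seq_id for seq_id, eos in finished.items() if eos}
        active -= done
        step += 1
        if done:
            print(f"step {step}: completed {sorted(done)}, batch size now {len(active)}")

if __name__ == "__main__":
    continuous_batching([f"req-{i}" for i in range(20)])
```

A production loop would also handle KV-cache allocation, preemption, and token streaming, which this sketch omits.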
To support our rapid expansion, we've raised significant funding from some of the most respected investors in technology. In January 2024, we raised $73.6 million in a Series B round led by IVP, with participation from NVIDIA, Jeff Bezos' investment fund, NEA, Databricks, and other prominent firms. We followed that up with a $62.7 million Series B1 round in April 2024 led by Daniel Gross, valuing Perplexity at over $1 billion.
Our prominent investor base includes IVP, NEA, Jeff Bezos, NVIDIA, Databricks, Bessemer Venture Partners, Elad Gil, Nat Friedman, Naval Ravikant, Tobi Lutke, and many other visionary individuals.