AI Inference Engineer

2 Months ago • All levels • Research Development

Job Summary

Job Description

As an AI Inference Engineer, you will be responsible for developing APIs for AI inference that will be used by both internal and external customers. You will benchmark and address bottlenecks throughout our inference stack, improve the reliability and observability of our systems, and respond to system outages. In addition, you will explore novel research and implement LLM inference optimizations. This role involves working on large-scale deployment of machine learning models for real-time inference and requires expertise in ML systems and deep learning frameworks.
Must have:
  • Experience with ML systems and deep learning frameworks.
  • Familiarity with common LLM architectures and optimization techniques.
  • Experience with deploying reliable, distributed, real-time model serving.
Good to have:
  • Understanding of GPU architectures or experience with GPU kernel programming using CUDA.
Perks:
  • Comprehensive health, dental, and vision insurance.
  • 401(k) plan.
  • Equity may be part of the total compensation package.

Job Details

We are looking for an AI Inference Engineer to join our growing team. Our current stack is Python, Rust, C++, PyTorch, Triton, CUDA, and Kubernetes. You will have the opportunity to work on large-scale deployment of machine learning models for real-time inference.

Responsibilities

  • Develop APIs for AI inference that will be used by both internal and external customers
  • Benchmark and address bottlenecks throughout our inference stack
  • Improve the reliability and observability of our systems and respond to system outages
  • Explore novel research and implement LLM inference optimizations

Qualifications

  • Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
  • Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization)
  • Experience with deploying reliable, distributed, real-time model serving at scale
  • (Optional) Understanding of GPU architectures or experience with GPU kernel programming using CUDA

At Perplexity, we've experienced tremendous growth and adoption since publicly launching the world's first fully functional conversational answer engine just over a year ago. Our AI-powered search assistant has amassed 10 million monthly active users as of early 2024, with our mobile apps installed over 1 million times across iOS and Android devices. In 2023 alone, we served over 500 million queries from users around the globe.

To support our rapid expansion, we've raised significant funding from some of the most respected investors in technology. In January 2024, we raised $73.6 million in a Series B round led by IVP, with participation from NVIDIA, Jeff Bezos' investment fund, NEA, Databricks, and other prominent firms. We followed that up with a $62.7 million Series B1 round in April 2024 led by Daniel Gross, valuing Perplexity at over $1 billion.
Our prominent investor base includes IVP, NEA, Jeff Bezos, NVIDIA, Databricks, Bessemer Venture Partners, Elad Gil, Nat Friedman, Naval Ravikant, Tobi Lutke, and many other visionary individuals.
 
Final offer amounts are determined by multiple factors, including experience and expertise, and may vary from the amounts listed above.
 
Equity: In addition to the base salary, equity may be part of the total compensation package.
Benefits: Comprehensive health, dental, and vision insurance for you and your dependents. Includes a 401(k) plan.
 
 
