AI Inference Engineer

3 Months ago • All levels • Research Development

Job Summary

Job Description

As an AI Inference Engineer, you will be responsible for developing APIs for AI inference that will be used by both internal and external customers. You will benchmark and address bottlenecks throughout our inference stack, improve the reliability and observability of our systems, and respond to system outages. In addition, you will explore novel research and implement LLM inference optimizations. This role involves working on large-scale deployment of machine learning models for real-time inference and requires expertise in ML systems and deep learning frameworks.
Must have:
  • Experience with ML systems and deep learning frameworks.
  • Familiarity with common LLM architectures and optimization techniques.
  • Experience with deploying reliable, distributed, real-time model serving.
Good to have:
  • Understanding of GPU architectures or experience with GPU kernel programming using CUDA.
Perks:
  • Comprehensive health, dental, and vision insurance.
  • 401(k) plan.
  • Equity may be part of the total compensation package.

Job Details

We are looking for an AI Inference Engineer to join our growing team. Our current stack is Python, Rust, C++, PyTorch, Triton, CUDA, and Kubernetes. You will have the opportunity to work on large-scale deployment of machine learning models for real-time inference.

Responsibilities

  • Develop APIs for AI inference that will be used by both internal and external customers
  • Benchmark and address bottlenecks throughout our inference stack
  • Improve the reliability and observability of our systems and respond to system outages
  • Explore novel research and implement LLM inference optimizations

Qualifications

  • Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
  • Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization)
  • Experience with deploying reliable, distributed, real-time model serving at scale
  • (Optional) Understanding of GPU architectures or experience with GPU kernel programming using CUDA
At Perplexity, we've experienced tremendous growth and adoption since publicly launching the world's first fully functional conversational answer engine just over a year ago. Our AI-powered search assistant has amassed 10 million monthly active users as of early 2024, with our mobile apps installed over 1 million times across iOS and Android devices. In 2023 alone, we served over 500 million queries from users around the globe.

To support our rapid expansion, we've raised significant funding from some of the most respected investors in technology. In January 2024, we raised $73.6 million in a Series B round led by IVP, with participation from NVIDIA, Jeff Bezos' investment fund, NEA, Databricks, and other prominent firms. We followed that up with a $62.7 million Series B1 round in April 2024 led by Daniel Gross, valuing Perplexity at over $1 billion.
Our prominent investor base includes IVP, NEA, Jeff Bezos, NVIDIA, Databricks, Bessemer Venture Partners, Elad Gil, Nat Friedman, Naval Ravikant, Tobi Lutke, and many other visionary individuals.
 
Final offer amounts are determined by multiple factors, including experience and expertise, and may vary from the amounts listed above.
 
Equity: In addition to the base salary, equity may be part of the total compensation package.
Benefits: Comprehensive health, dental, and vision insurance for you and your dependents. Includes a 401(k) plan.
 
 

