AI Inference Engineer

1 Month ago • All levels • Research Development

Job Summary

Job Description

As an AI Inference Engineer, you will be responsible for developing APIs for AI inference that will be used by both internal and external customers. You will benchmark and address bottlenecks throughout our inference stack, improve the reliability and observability of our systems, and respond to system outages. In addition, you will explore novel research and implement LLM inference optimizations. This role involves working on large-scale deployment of machine learning models for real-time inference and requires expertise in ML systems and deep learning frameworks.
Must have:
  • Experience with ML systems and deep learning frameworks.
  • Familiarity with common LLM architectures and optimization techniques.
  • Experience with deploying reliable, distributed, real-time model serving.
Good to have:
  • Understanding of GPU architectures or experience with GPU kernel programming using CUDA.
Perks:
  • Comprehensive health, dental, and vision insurance.
  • 401(k) plan.
  • Equity may be part of the total compensation package.

Job Details

We are looking for an AI Inference Engineer to join our growing team. Our current stack is Python, Rust, C++, PyTorch, Triton, CUDA, and Kubernetes. You will have the opportunity to work on large-scale deployment of machine learning models for real-time inference.

Responsibilities

  • Develop APIs for AI inference that will be used by both internal and external customers
  • Benchmark and address bottlenecks throughout our inference stack
  • Improve the reliability and observability of our systems and respond to system outages
  • Explore novel research and implement LLM inference optimizations

Qualifications

  • Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
  • Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization)
  • Experience with deploying reliable, distributed, real-time model serving at scale
  • (Optional) Understanding of GPU architectures or experience with GPU kernel programming using CUDA
At Perplexity, we've experienced tremendous growth and adoption since publicly launching the world's first fully functional conversational answer engine just over a year ago. Our AI-powered search assistant has amassed 10 million monthly active users as of early 2024, with our mobile apps installed over 1 million times across iOS and Android devices. In 2023 alone, we served over 500 million queries from users around the globe.

To support our rapid expansion, we've raised significant funding from some of the most respected investors in technology. In January 2024, we raised $73.6 million in a Series B round led by IVP, with participation from NVIDIA, Jeff Bezos' investment fund, NEA, Databricks, and other prominent firms. We followed that up with a $62.7 million Series B1 round in April 2024 led by Daniel Gross, valuing Perplexity at over $1 billion.
Our prominent investor base includes IVP, NEA, Jeff Bezos, NVIDIA, Databricks, Bessemer Venture Partners, Elad Gil, Nat Friedman, Naval Ravikant, Tobi Lutke, and many other visionary individuals.
 
Final offer amounts are determined by multiple factors, including experience and expertise, and may vary from the amounts listed above.
 
Equity: In addition to the base salary, equity may be part of the total compensation package.
Benefits: Comprehensive health, dental, and vision insurance for you and your dependents. Includes a 401(k) plan.
 
 


