Research Engineer - Performance Optimization

6 Months ago • All levels • Research Development • $200,000 PA - $300,000 PA

Job Summary

Job Description

We are seeking engineers with strong problem-solving abilities in PyTorch, CUDA, and distributed systems. You will collaborate with Research Scientists to develop and train advanced foundation models on thousands of GPUs. The role involves ensuring efficient implementation of models and systems for data processing, training, inference, and deployment. Key responsibilities include identifying and applying optimization techniques for large-scale parallel and distributed systems, and remedying speed and memory bottlenecks through profiling and high-performance coding in CUDA, Triton, C++, and PyTorch. You will also work with researchers to optimize system efficiency from conception to completion, build tools for data visualization and filtering, and implement cutting-edge prototypes for multimodal generative AI.
Must have:
  • Experience training large models with Python & PyTorch
  • Experience optimizing inference workloads
  • Experience profiling CPU & GPU code in PyTorch
  • Experience with parallel & distributed PyTorch code (DDP, FSDP)
  • Experience writing high-performance parallel C++
  • Experience with high-performance Triton / CUDA
Good to have:
  • Experience with Transformers & Multimodal Generative models
  • Experience building inference/demo prototype code (Gradio, Docker)
Perks:
  • Competitive equity packages in the form of stock options
  • Comprehensive benefits plan

Job Details

We are looking for engineers with significant problem-solving experience in PyTorch, CUDA and distributed systems. You will work with Research Scientists to build & train cutting-edge foundation models on thousands of GPUs.

Responsibilities

  • Ensure efficient implementation of models & systems for data processing, training, inference and deployment

  • Identify and implement optimization techniques for massively parallel and distributed systems

  • Identify and remedy efficiency bottlenecks (memory, speed, utilization) by profiling and implementing high-performance CUDA, Triton, C++ and PyTorch code

  • Work closely with the research team to ensure systems are designed to be as efficient as possible from start to finish

  • Build tools to visualize, evaluate and filter datasets

  • Implement cutting-edge product prototypes based on multimodal generative AI

Experience

  • Experience training large models using Python & PyTorch, including practical experience with the entire development pipeline, from data processing, preparation & data loading to training and inference.

  • Experience optimizing and deploying inference workloads for throughput and latency across the stack (inputs, model inference, outputs, parallel processing etc.)

  • Experience profiling CPU & GPU code in PyTorch, including with NVIDIA Nsight or similar tools.

  • Experience writing & improving highly parallel & distributed PyTorch code, with familiarity with DDP, FSDP, Tensor Parallel, etc.

  • Experience writing high-performance parallel C++. A bonus if done within an ML context with PyTorch, such as for data loading, data processing, or inference code.

  • Experience with high-performance Triton / CUDA and writing custom PyTorch kernels. Top candidates will be able to use tensor cores, optimize CUDA memory usage, and apply similar techniques.

  • Good to have: experience with deep learning concepts such as Transformers & multimodal generative models (e.g. diffusion models and GANs).

  • Good to have: experience building inference / demo prototype code (incl. Gradio, Docker, etc.)


Compensation

  • The pay range for this position in California is $180,000 - $250,000/yr; however, base pay offered may vary depending on job-related knowledge, skills, candidate location, and experience. We also offer competitive equity packages in the form of stock options and a comprehensive benefits plan.

Your applications are reviewed by real people.

About The Company

Palo Alto, California, United States (Hybrid)

Get notified when new jobs are added by Luma
