Senior Machine Learning Engineer - Hardware Abstractions & Performance Optimization

5 Months ago • All levels • Research Development • $220,000 PA - $300,000 PA

Job Summary

Job Description

Luma is building multimodal AI to expand human imagination and capabilities, focusing on vision to create more aware, capable, and useful systems. They are seeking engineers experienced in maintaining and designing highly efficient systems and code optimized for multiple hardware platforms. The role involves ensuring efficient implementation of models and systems with a focus on abstractions that scale beyond NVIDIA/CUDA hardware, identifying and remedying efficiency bottlenecks, and benchmarking products across various hardware and software to understand tradeoffs. The engineer will collaborate with partners and the research team on hardware integration and system efficiency.
Must have:
  • Experience optimizing PyTorch for memory, latency, and throughput.
  • Experience using torch.compile / PyTorch/XLA (torch_xla).
  • Experience benchmarking and profiling GPU & CPU code in PyTorch for optimal device utilization.
  • Experience building tools & abstractions for optimal model performance on different hardware and software stacks.
  • Experience working with transformer models and attention implementations.
  • Experience with parallel inference, particularly tensor parallelism and pipeline parallelism.
Good to have:
  • Experience with high-performance Triton/CUDA and writing custom PyTorch kernels and ops.
  • Experience writing high-performance parallel C++.
  • Experience building inference/demo prototype code (incl. Gradio, Docker etc.).
Perks:
  • Offers Equity

Job Details

Luma’s mission is to build multimodal AI to expand human imagination and capabilities. We believe that multimodality is critical for intelligence. To go beyond language models and build more aware, capable and useful systems, the next step function change will come from vision. So, we are working on training and scaling up multimodal foundation models for systems that can see and understand, show and explain, and eventually interact with our world to effect change.

We are looking for engineers with significant experience maintaining & designing highly efficient systems and code that can be optimized to run on multiple hardware platforms, bringing our state-of-the-art models to as many people as possible at the best performance per dollar.

Responsibilities

  • Ensure efficient implementation of models & systems with a focus on designing, maintaining, and writing abstractions that scale beyond NVIDIA/CUDA hardware.

  • Identify and remedy efficiency bottlenecks (memory, speed, utilization, communication) by profiling and implementing high-performance PyTorch code, deferring to Triton or similar kernel-level languages as necessary.

  • Benchmark our products across a variety of hardware & software to help the product team understand the optimal tradeoffs between latency, throughput and cost at various degrees of parallelism.

  • Work together with our partners to help them identify bottlenecks and push forward new iterations of hardware and software.

  • Work closely together with the rest of the research team to ensure systems are planned to be as efficient as possible from start to finish and raise potential issues for hardware integration.

Must have experience

  • Experience optimizing for memory, latency and throughput in PyTorch.

    • Bonus: experience with non-NVIDIA systems.

  • Experience using torch.compile / PyTorch/XLA (torch_xla).

  • Experience benchmarking and profiling GPU & CPU code in PyTorch for optimal device utilization (examples: torch profiler, memory profilers, trace viewers, custom tooling).

  • Experience building tools & abstractions to ensure models run optimally on different hardware and software stacks.

  • Experience working with transformer models and attention implementations.

  • Experience with parallel inference, particularly tensor parallelism and pipeline parallelism.

Good to have experience

  • Experience with high-performance Triton/CUDA and writing custom PyTorch kernels and ops. Top candidates will be able to write fused kernels for common hot paths, understand when to make use of lower-level features like tensor cores or warp intrinsics, and understand where these tools can be most impactful.

  • Experience writing high-performance parallel C++. Bonus if done within an ML context with PyTorch, such as for data loading, data processing, or inference code.

  • Experience building inference / demo prototype code (incl. Gradio, Docker etc.).

About The Company

Palo Alto, California, United States (Hybrid)
