Research Scientist / Engineer – Multimodal Capabilities

All levels • Research Development • $200,000 - $300,000 PA

Job Description

The Multimodal Capabilities team at Luma focuses on unlocking advanced capabilities in our foundation models through strategic research into multimodal understanding and generation. This team tackles fundamental research questions around how different modalities can be combined to enable new behaviors and capabilities, working on the open-ended challenges of what makes multimodal AI systems truly powerful and versatile.
Perks:
  • Offers Equity

Job Details

Responsibilities

  • Identify capability gaps in our foundation models and research solutions to close them
  • Design datasets, experiments, and methodologies to systematically improve model capabilities across vision, audio, and language
  • Develop evaluation frameworks and benchmarking approaches for multimodal AI capabilities
  • Create prototypes and demonstrations that showcase new multimodal capabilities

Experience

  • Strong programming skills in Python and PyTorch
  • Experience with multimodal data processing pipelines and large-scale dataset curation
  • Understanding of computer vision, audio processing, and/or natural language processing techniques
  • (Preferred) Expertise working with interleaved multimodal data
  • (Preferred) Hands-on experience with Vision Language Models, Audio Language Models, or generative video models


About The Company

Palo Alto, California, United States (Hybrid)