Research Scientist / Engineer – Data

Luma

Palo Alto, California, United States (Remote) | Full Time | 1 day ago

Job Summary

This role focuses on leveraging data to enhance Luma's foundation models, enabling advanced multimodal AI capabilities. Responsibilities include identifying capability gaps, designing datasets for model improvement across vision, audio, and language, developing evaluation frameworks, and creating prototypes to demonstrate new multimodal functionalities.

Must Have

Identify capability gaps and research solutions
Design datasets and data-mixture ablations to systematically improve model capabilities across vision, audio, and language
Develop evaluation frameworks and benchmarking approaches for multimodal AI capabilities
Create prototypes and demonstrations that showcase new multimodal capabilities
Strong programming skills in Python and PyTorch
Experience with large-scale datasets
Experience with multimodal data processing pipelines
Understanding of computer vision, audio processing, and/or natural language processing techniques

Good to Have

Expertise working with interleaved multimodal data
Hands-on experience with Vision Language Models, Audio Language Models, or generative video models

Job Description

About the Role

Data is a fundamental layer at Luma that unlocks advanced capabilities in our foundation models. We tackle the core data questions of how different modalities can be combined to enable new behaviors and capabilities, working on the open-ended challenge of what makes multimodal AI systems truly powerful and versatile.

Responsibilities

Identify capability gaps and research solutions
Design datasets and data-mixture ablations to systematically improve model capabilities across vision, audio, and language
Develop evaluation frameworks and benchmarking approaches for multimodal AI capabilities
Create prototypes and demonstrations that showcase new multimodal capabilities

Experience

Strong programming skills in Python and PyTorch
Experience with large-scale datasets
Experience with multimodal data processing pipelines
Understanding of computer vision, audio processing, and/or natural language processing techniques
(Preferred) Expertise working with interleaved multimodal data
(Preferred) Hands-on experience with Vision Language Models, Audio Language Models, or generative video models
