Research Scientist, GenAI - Multimodal Audio (Speech, Sound and Music)

10 Months ago • 3-5 Years • Sound Design • $177,000 PA - $251,000 PA

Job Summary

Job Description

The GenAI org at Meta builds industry-leading LLM and multimodal generative foundation models, which set the industry benchmark for open source foundation models and power many Meta products. The team conducts industry-leading research on multimodal generative foundation models with a focus on the audio modality (including speech, sound and music). It works closely with the language and vision research teams, and collaborates with product teams to bring the results to billions of Meta users around the world.

Responsibilities:

Full life-cycle research on multimodal generative foundation models with a focus on the audio modality, including bringing up ideas
Designing and implementing models and algorithms
Collecting and selecting training data, training / tuning / scaling the models, evaluating the performance, open sourcing and publication
Working with collaborating teams (e.g. language and vision) to build on each other's work and deliver the high-level goals

Must have:

Bachelor's degree in Computer Science, Computer Engineering, or relevant technical field
Solid track record of research in audio (speech, sound, or music) or vision (image or video) domains
PhD degree in related field with 3+ years of experience, or BS degree with 5+ years of industrial research experience
Proven knowledge in neural networks
Experienced in one of the following popular ML frameworks: PyTorch, TensorFlow, JAX
Experienced in Python programming language
Solid communication skills

Good to have:

Solid publication track record in related fields
Solid experience in any of the following: audio dataset curation, model scaling, audio generation model evaluation
Experienced in large-scale data processing
Experienced in solving complex problems involving trade-offs, alternative solutions, cross-functional collaboration, and diverse points of view

Perks:

Bonus
Equity
Benefits

9 skills required

neural-networks

tensorflow

unity

algorithms

python

pytorch

foundation

communication

game-texts

Job Details

Research Scientist, GenAI - Multimodal Audio (Speech, Sound and Music) Responsibilities

Full life-cycle research on multimodal generative foundation models with a focus on the audio modality, including bringing up ideas

Designing and implementing models and algorithms

Collecting and selecting training data, training / tuning / scaling the models, evaluating the performance, open sourcing and publication

Working with collaborating teams (e.g. language and vision) to build on each other's work and deliver the high-level goals.

Minimum Qualifications

Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience.

Solid track record of research in the audio (speech, sound, or music) or vision (image or video) domains. This may be a publication record or unpublished industrial experience.

PhD degree in the related field with 3+ years of experience, or BS degree with 5+ years of industrial research experience in the related field.

Related research fields: audio (speech, sound, or music) generation, text-to-speech (TTS) synthesis, text-to-music generation, text-to-sound generation, speech recognition, speech / audio representation learning, vision perception, image / video generation, video-to-audio generation, audio-visual learning, audio language models, lip sync, lip movement generation / correction, lip reading, etc.

Proven knowledge in neural networks.

Experienced in one of the following popular ML frameworks: PyTorch, TensorFlow, JAX.

Experienced in Python programming language.

Solid communication skills.

Preferred Qualifications

Solid publication track record in related fields.

Solid experience in any of the following: audio dataset curation, model scaling, audio generation model evaluation.

Experienced in large-scale data processing.

Experienced in solving complex problems involving trade-offs, alternative solutions, cross-functional collaboration, and diverse points of view.

For those who live in or expect to work from California if hired for this position, please click for additional information.

About Meta

Meta builds technologies that help people connect, find communities, and grow businesses. When Facebook launched in 2004, it changed the way people connect. Apps like Messenger, Instagram and WhatsApp further empowered billions around the world. Now, Meta is moving beyond 2D screens toward immersive experiences like augmented and virtual reality to help build the next evolution in social technology. People who choose to build their careers by building with us at Meta help shape a future that will take us beyond what digital connection makes possible today—beyond the constraints of screens, the limits of distance, and even the rules of physics.

$177,000/year to $251,000/year + bonus + equity + benefits

Individual compensation is determined by skills, qualifications, experience, and location. Compensation details listed in this posting reflect the base hourly rate, monthly rate, or annual salary only, and do not include bonus, equity or sales incentives, if applicable. In addition to base compensation, Meta offers benefits. Learn more about benefits at Meta.

Equal Employment Opportunity and Affirmative Action

Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, reproductive health decisions, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, genetic information, political views or activity, or other applicable legally protected characteristics. You may view our Equal Employment Opportunity notice.

Meta is committed to providing reasonable support (called accommodations) in our recruiting processes for candidates with disabilities, long term conditions, mental health conditions or sincerely held religious beliefs, or who are neurodivergent or require pregnancy-related support. If you need support, please reach out to .
