Research Internship: Multimodal LLM (Speech/Music/Audio/Vision/Language)

All levels • $56,160 PA - $120,016 PA
Research Development

Job Description

Tencent AI Lab in the Seattle area is seeking Research Interns for 2026 to work on Multimodal LLMs, focusing on speech, music, audio, vision, and language processing. Interns will develop cutting-edge techniques for large multimodal models, including pretraining and post-training strategies, efficient architectures, and enhanced memory and reasoning. The role involves solving challenging real-world problems, publishing research results, and contributing to the advancement of Artificial General Intelligence (AGI) and Artificial Superintelligence (ASI).
Good To Have:
  • A strong publication track record and a history of creativity and intellectual flexibility.
Must Have:
  • Ph.D. students in computer science, electrical engineering, mathematics or a related field.
  • Self-motivated and excited about developing novel techniques.
  • Research experience in natural language processing; speech, audio, and music processing; computer vision; dialog systems; or machine learning.
  • Skilled programming in Python and/or C++.
  • Experience using one of the leading deep learning toolkits.
Perks:
  • 1 hour of paid sick leave for every 30 hours worked.
  • Up to 13 paid holidays throughout the calendar year.
  • Eligibility to enroll in the Company-sponsored medical plan for full-time interns.

What the Role Entails

About Tencent AI Lab at Seattle Area

Tencent is a leading internet company in China. Tencent AI Lab in the Seattle area was established in May 2017. The lab strives to continuously improve AI's capabilities in perception, cognition, and creativity. Its researchers aim to solve challenging real-world problems with advanced technologies and publish extensively in top conferences and journals.

Research Internship: Multimodal LLM (Speech/Music/Audio/Vision/Language)

Tencent AI Lab is dedicated to advancing cutting-edge AI technologies, with a particular focus on innovative breakthroughs in large foundation models. The lab's long-term ambition is to drive the development of Artificial General Intelligence (AGI) and, ultimately, Artificial Superintelligence (ASI). For 2026, we are seeking research interns interested in developing novel speech/music/audio/vision/language processing techniques and large multimodal models at our Seattle-area office in Bellevue, WA.

Every research intern will work with researchers on a research project aimed at attacking one of the core problems by inventing cutting-edge techniques. We encourage discussion and collaboration between researchers and interns. Interns are also encouraged to publish the results of their internship. Our projects span a wide range of areas, including developing more effective multimodal pretraining and post-training strategies for audio, speech, music, image, and video understanding and generation. We aim to enable full-duplex conversations, design more efficient large-model architectures, enhance multimodal memory and reasoning capabilities, and advance novel audio, speech, music, image, and video processing techniques (such as encoding, tokenization, and representation learning), with a focus on multimodal applications and end-to-end large models.

Who We Look For

Requirements & Qualifications

The ideal intern candidates are those who

  • are Ph.D. students in computer science, electrical engineering, mathematics or a related field,
  • are self-motivated and excited about developing novel techniques,
  • have research experience in natural language processing; speech, audio, and music processing; computer vision; dialog systems; or machine learning,
  • have a strong publication track record and a history of creativity and intellectual flexibility,
  • can program skillfully in Python and/or C++ and have experience using one of the leading deep learning toolkits.

Internship duration: 3 months, with the possibility of extension. Interns can start at any time in 2026.

Location State(s)

US-Washington-Bellevue

The expected base pay range for this position in the location(s) listed above is $27.00 to $57.70 per hour. Actual pay may vary depending on job-related knowledge, skills, and experience. This position will be eligible for 1 hour of paid sick leave for every 30 hours worked and up to 13 paid holidays throughout the calendar year. Subject to the terms and conditions of the applicable plans then in effect, full-time interns are also eligible to enroll in the Company-sponsored medical plan.

Equal Employment Opportunity at Tencent

As an equal opportunity employer, we firmly believe that diverse voices fuel our innovation and allow us to better serve our users and the community. We foster an environment where every employee of Tencent feels supported and inspired to achieve individual and common goals.

Who we are

Tencent is a world-leading internet and technology company that develops innovative products and services to improve the quality of life for people around the world.

