Research Scientist - Speech & Audio Understanding (Speech Generation)

All levels • $149,000 - $279,800 per year
Audio

Job Description

Tencent is seeking a Research Scientist for Speech & Audio Understanding, focusing on Speech Generation. The role involves tracking cutting-edge research in speech generation algorithms, exploring next-generation paradigms, and investigating multimodal voice foundation models to enhance voice interaction experiences. Responsibilities include leading technical R&D for voice foundation models and driving both performance improvements and innovative applications. Candidates should have a Master’s or Ph.D. in a relevant field, experience in voice foundation models or related areas, familiarity with mainstream voice-enabled large models, proficiency in deep learning frameworks, and a solid understanding of large model architectures.
Good To Have:
  • Prior project experience with mainstream voice-enabled large models (e.g., GPT-4o, GLM-4-Voice, Qwen2.5-Omni, Voila).
  • Experience with large-scale model training frameworks (Megatron/DeepSpeed).
  • Experience in large-scale pretraining or post-training.
Must Have:
  • Track the latest research in speech generation algorithms, explore next-generation paradigms for speech/audio generation, and push the boundaries of speech generation capabilities.
  • Investigate cutting-edge multimodal voice foundation model technologies to enhance voice interaction experiences by integrating text, speech, and vision.
  • Lead the technical R&D of voice foundation models, driving model performance improvements and innovative applications.
  • Master’s or Ph.D. in Computer Science, Artificial Intelligence, Electronic Engineering, Signal Processing, or related fields.
  • Research or development experience in one or more areas: voice foundation models, speech synthesis, speech recognition, audio generation, voice conversion, or speech codec.
Perks:
  • Sign-on payment
  • Relocation package
  • Restricted stock units
  • Medical benefits
  • Dental benefits
  • Vision benefits
  • Life and disability benefits
  • Participation in the Company’s 401(k) plan
  • 15 to 25 days of vacation per year, depending on tenure
  • Up to 13 days of holidays throughout the calendar year
  • Up to 10 days of paid sick leave per year

What the Role Entails

Job Responsibilities:

1. Track the latest research in speech generation algorithms, explore next-generation paradigms for speech/audio generation, and push the boundaries of speech generation capabilities.

2. Investigate cutting-edge multimodal voice foundation model technologies to enhance voice interaction experiences by integrating text, speech, and vision.

3. Lead the technical R&D of voice foundation models, driving model performance improvements and innovative applications.

Who We Look For

Job Requirements:

1. Master’s or Ph.D. in Computer Science, Artificial Intelligence, Electronic Engineering, Signal Processing, or related fields.

2. Research or development experience in one or more areas: voice foundation models, speech synthesis, speech recognition, audio generation, voice conversion, or speech codec.

3. Familiarity with mainstream voice-enabled large models (e.g., GPT-4o, GLM-4-Voice, Qwen2.5-Omni, Voila). Prior project experience is preferred.

4. Proficiency in deep learning frameworks (e.g., PyTorch). Experience with large-scale model training frameworks (Megatron/DeepSpeed) is a plus.

5. Solid understanding of large model architectures and principles. Experience in large-scale pretraining or post-training is preferred.

The expected base pay range for this position in the location(s) listed above is $149,000.00 to $279,800.00 per year. Actual pay may vary depending on job-related knowledge, skills, and experience. Employees hired for this position may be eligible for a sign-on payment, relocation package, and restricted stock units, which will be evaluated on a case-by-case basis. Subject to the terms and conditions of the plans in effect, hired applicants are also eligible for medical, dental, vision, life and disability benefits, and participation in the Company’s 401(k) plan. The Employee is also eligible for 15 to 25 days of vacation per year (depending on the employee’s tenure), up to 13 days of holidays throughout the calendar year, and up to 10 days of paid sick leave per year. Your benefits may be adjusted to reflect your location, employment status, duration of employment with the company, and position level. Benefits may also be pro-rated for those who start working during the calendar year.
