Multimodal Large Model Algorithm Intern
Tencent
Job Summary
This role involves conducting research and development in multimodal large model technologies, focusing on cross-modal alignment and understanding tasks to build industry-leading models. The intern will track state-of-the-art algorithms, participate in the design, training, optimization, and evaluation of these models, and promote their application in business scenarios within Tencent's Technology Engineering Group (TEG). TEG supports the company's technology and operational platforms, R&D management, and data centers, providing comprehensive customer services and leading infrastructure R&D.
Must Have
- Master’s degree or higher in Computer Science, Machine Learning, Artificial Intelligence, Applied Mathematics, or related fields.
- Solid research background in multimodal understanding (e.g., natural language processing, computer vision, speech understanding/generation).
- Familiarity with mainstream models and algorithms such as CLIP, LLaVA, and VALL-E.
- Proficiency in deep learning frameworks like TensorFlow or PyTorch.
- Knowledge of distributed training frameworks (e.g., DeepSpeed, Megatron-LM) and practical experience in multi-node/multi-GPU distributed training.
- Strong engineering skills with proficiency in at least one programming language: C/C++, Java, or Python.
- Excellent learning ability, technical curiosity, and strong teamwork and communication skills.
Good to Have
- Publication record in top-tier conferences (e.g., ICLR, NeurIPS, CVPR, ICCV, ECCV, ACL, EMNLP) is preferred.
Job Description
Business Unit
Technology Engineering Group (TEG) is responsible for supporting the company and its business groups on technology and operational platforms, as well as for the construction and operation of R&D management and data centers, providing users with a full range of customer services. As the operator of the largest network, device fleet, and data centers in Asia, TEG also leads the Tencent Technology Committee in strengthening infrastructure R&D through internal and distributed open-source collaboration, constructing new platforms and supporting business innovation.
What the Role Entails
- Conduct research and development of multimodal large model technologies, including cross-modal alignment and multimodal understanding tasks, to build industry-leading multimodal large models.
- Continuously track state-of-the-art algorithms in multimodal large models, participate in the design, training, optimization, and evaluation of these models, and promote their application in business scenarios.
Who We Look For
- Master’s degree or higher in Computer Science, Machine Learning, Artificial Intelligence, Applied Mathematics, or related fields.
- Solid research background in multimodal understanding (e.g., natural language processing, computer vision, speech understanding/generation), and familiarity with mainstream models and algorithms such as CLIP, LLaVA, and VALL-E.
- Proficiency in deep learning frameworks like TensorFlow or PyTorch; knowledge of distributed training frameworks (e.g., DeepSpeed, Megatron-LM) and practical experience in multi-node/multi-GPU distributed training.
- Strong engineering skills with proficiency in at least one programming language: C/C++, Java, or Python.
- Publication record in top-tier conferences (e.g., ICLR, NeurIPS, CVPR, ICCV, ECCV, ACL, EMNLP) is preferred.
- Excellent learning ability, technical curiosity, and strong teamwork and communication skills.