Multimodal Large Model Algorithm Engineer

Tencent

Job Summary

The Technology Engineering Group (TEG) at Tencent supports the company's technology and operational platforms, R&D management, and data centers, and provides comprehensive customer services. As the operator of Asia's largest networking and data center infrastructure, TEG also leads infrastructure R&D. This role involves researching and developing industry-leading multimodal large model technologies, including cross-modal alignment and understanding tasks. The engineer will track state-of-the-art algorithms, participate in model design, training, optimization, and evaluation, and promote the application of these models in business scenarios.

Must Have

  • Master’s degree or higher in Computer Science, Machine Learning, Artificial Intelligence, Applied Mathematics, or related fields
  • Solid research background in multimodal understanding (e.g., natural language processing, computer vision, speech understanding/generation)
  • Familiarity with mainstream models and algorithms such as CLIP, LLaVA, and VALL-E
  • Proficiency in deep learning frameworks like TensorFlow or PyTorch
  • Knowledge of distributed training frameworks (e.g., DeepSpeed, Megatron-LM) and practical experience in multi-node/multi-GPU distributed training
  • Strong engineering skills with proficiency in at least one programming language: C/C++, Java, or Python

Good to Have

  • Publication record in top-tier conferences (e.g., ICLR, NeurIPS, CVPR, ICCV, ECCV, ACL, EMNLP) is preferred

Job Description

Business Unit

Technology Engineering Group (TEG) is responsible for supporting the company and its business groups on technology and operational platforms, as well as for the construction and operation of R&D management and data centers. TEG provides users with a full range of customer services. As the operator of the largest networking, device, and data center infrastructure in Asia, TEG also leads the Tencent Technology Committee in strengthening infrastructure R&D through internal and distributed open source collaboration, constructing new platforms and supporting business innovation.

What the Role Entails

  • Conduct research and development of multimodal large model technologies, including cross-modal alignment and multimodal understanding tasks, to build industry-leading multimodal large models.
  • Continuously track state-of-the-art algorithms in multimodal large models, participate in the design, training, optimization, and evaluation of these models, and promote their application in business scenarios.

Who We Look For

  • Master’s degree or higher in Computer Science, Machine Learning, Artificial Intelligence, Applied Mathematics, or related fields.
  • Solid research background in multimodal understanding (e.g., natural language processing, computer vision, speech understanding/generation), and familiarity with mainstream models and algorithms such as CLIP, LLaVA, and VALL-E.
  • Proficiency in deep learning frameworks like TensorFlow or PyTorch; knowledge of distributed training frameworks (e.g., DeepSpeed, Megatron-LM) and practical experience in multi-node/multi-GPU distributed training.
  • Strong engineering skills with proficiency in at least one programming language: C/C++, Java, or Python.
  • Publication record in top-tier conferences (e.g., ICLR, NeurIPS, CVPR, ICCV, ECCV, ACL, EMNLP) is preferred.
  • Excellent learning ability, technical curiosity, and strong teamwork and communication skills.

13 Skills Required For This Role

Team Management, Communication, Cpp, Game Texts, Networking, Pytorch, Deep Learning, Computer Vision, Python, Algorithms, Tensorflow, Java, Machine Learning
