Multimodal Reinforcement Learning Post-Training Algorithm Expert

Tencent

Job Summary

This role focuses on multimodal reinforcement learning post-training algorithms. The expert will bridge algorithm and framework teams, translating advanced post-training principles (RLHF, DPO) for large models into framework requirements. Responsibilities include optimizing and evaluating post-training pipelines for stability, efficiency, and generalization, with a focus on cross-modal alignment and reward function design. The role also involves researching cutting-edge advancements, troubleshooting training bottlenecks, and collaborating on solutions. Efficient cross-team support, technical documentation, and knowledge sharing are essential to enhance team expertise.

Must Have

  • Master's degree or higher in Computer Science, Artificial Intelligence, Electronic Engineering, Automation, or related fields.
  • Solid foundation in machine learning/deep learning.
  • Deep understanding of multimodal large models and the reinforcement learning post-training technology stack.
  • Proficiency in Python programming and familiarity with deep learning frameworks like PyTorch.
  • Deep understanding of model architectures such as Transformer and Diffusion.
  • Thorough comprehension of SFT, RLHF, and DPO post-training algorithms.
  • Strong engineering implementation and debugging skills.
  • Capable of rapidly validating algorithmic ideas and conducting rigorous experimental analysis.
  • Familiarity with at least one mainstream large model training/inference framework (e.g., Megatron-LM, DeepSpeed, vLLM).
  • Excellent cross-team communication skills.
  • Strong sense of responsibility, self-motivation, and passion for solving complex problems.

Good to Have

  • Experience with post-training frameworks like VERL or OpenRLHF is a plus.

Job Description

Business Unit

Technology Engineering Group (TEG) is responsible for supporting the company and its business groups with technology and operational platforms, as well as for building and operating R&D management systems and data centers. TEG also provides users with a full range of customer services. As the operator of the largest networking, device, and data center infrastructure in Asia, TEG leads the Tencent Technology Committee in strengthening infrastructure R&D through internal and distributed open-source collaboration, constructing new platforms and supporting business innovation.

What the Role Entails

  • Algorithm-Framework Co-design: Act as a technical bridge between the algorithm and framework teams. Deeply understand the principles and evolution trends of post-training algorithms for multimodal large models (e.g., RLHF, DPO, Curriculum Reinforcement Learning) and translate these into functional requirements for the underlying frameworks, providing insights for framework architecture design.
  • Training Pipeline Optimization and Evaluation: Lead or deeply participate in the setup, optimization, and effectiveness evaluation of post-training pipelines (e.g., multimodal SFT, RLHF). Focus on training stability, efficiency, and generalization capability, particularly proposing systematic improvements for areas like cross-modal alignment, reward function design, and policy optimization.
  • Technical Research and Bottleneck Resolution: Proactively track cutting-edge advancements in multimodal reinforcement learning post-training from academia and industry. Perform root cause analysis for training bottlenecks (e.g., insufficient OOD generalization, modality fusion conflicts) and collaborate with the framework team to develop and implement solutions.
  • Cross-team Support and Knowledge Sharing: Collaborate efficiently with framework development, hardware optimization, and business algorithm teams to ensure the implementation of technical solutions. Produce high-quality technical documentation, design drafts, and experimental reports. Organize internal sharing sessions to enhance the overall technical expertise of the team.

Who We Look For

  • Education and Technical Background: A Master's degree or higher in Computer Science, Artificial Intelligence, Electronic Engineering, Automation, or related fields. A solid foundation in machine learning/deep learning, with a deep understanding of multimodal large models and the reinforcement learning post-training technology stack.

Core Algorithm and Engineering Skills:

  • Proficiency in Python programming and familiarity with deep learning frameworks like PyTorch.
  • Deep understanding of model architectures such as Transformer and Diffusion.
  • Thorough comprehension of the principles, processes, and common challenges (e.g., training instability, reward hacking) of post-training algorithms like SFT, RLHF, and DPO.
  • Strong engineering implementation and debugging skills, capable of rapidly validating algorithmic ideas and conducting rigorous experimental analysis for performance evaluation.

Framework Collaboration and System Perspective:

  • Familiarity with at least one mainstream large model training/inference framework (e.g., Megatron-LM, DeepSpeed, vLLM) and an understanding of their architectural design principles.
  • Ability to assess framework usability, scalability, and performance from an algorithmic perspective and propose improvement suggestions. Experience with post-training frameworks like VERL or OpenRLHF is a plus.
  • Soft Skills: Excellent cross-team communication skills, able to clearly translate requirements and articulate solutions between algorithm and engineering teams. A strong sense of responsibility, self-motivation, and passion for solving complex problems.

10 Skills Required For This Role

Communication, Problem Solving, Game Texts, Networking, PyTorch, Deep Learning, Reinforcement Learning, Python, Algorithms, Machine Learning
