Vision Researcher – Multimodal Understanding & Generation in Foundation Models

19 Minutes ago • All levels • $122,500 PA - $229,700 PA
Research Development

Job Description

Tencent is seeking a Vision Researcher to drive cutting-edge research in native multimodal foundation models, focusing on novel architecture design and modeling for 2D+time and 3D+time scenarios. The role involves exploring large model training for understanding and generating physical world representations, multimodal reasoning, and self-evolving continual learning. The ideal candidate will be a domain expert in computer vision, collaborate with researchers from other modalities, stay up to date with advancements in academia and industry, and contribute impactful research to open-source communities or internal product teams. A Master’s or Ph.D. in Computer Science or a related field, proven multimodal research experience, and proficiency in open-source tools are essential.
Good To Have:
  • Candidates with influential GitHub projects or contributions to high-impact open-source communities are preferred
Must Have:
  • Serve as a domain expert in computer vision
  • Collaborate with researchers from other modalities
  • Drive cutting-edge research in native multimodal foundation models
  • Explore training and design of large models for understanding and generating representations of the physical world
  • Stay up to date with the latest advancements in academia and industry
  • Master’s or Ph.D. degree in Computer Science, Artificial Intelligence, Computer Vision, Machine Learning, or a related field
  • Proven multimodal research experience with a strong publication record in top-tier conferences or journals
  • Proficiency with mainstream open-source tools and frameworks relevant to the field
  • Strong engineering skills to support research implementation
Perks:
  • Medical benefits
  • Dental benefits
  • Vision benefits
  • Life and disability benefits
  • Participation in the Company’s 401(k) plan
  • Up to 15 to 25 days of vacation per year (depending on tenure)
  • Up to 13 days of holidays throughout the calendar year
  • Up to 10 days of paid sick leave per year
  • Sign-on payment (evaluated on a case-by-case basis)
  • Relocation package (evaluated on a case-by-case basis)
  • Restricted stock units (evaluated on a case-by-case basis)

What the Role Entails

Responsibilities:

1. Serve as a domain expert in computer vision and collaborate with researchers from other modalities to drive cutting-edge research in native multimodal foundation models, including novel architecture design and modeling for “2D + time” and “3D + time” scenarios.

2. Explore the training and design of large models for understanding and generating representations of the physical world, multimodal reasoning, and self-evolving continual learning.

3. Stay up to date with the latest advancements in academia and industry; actively participate in international conferences and workshops, and engage with leading global research teams.

4. Contribute impactful research outcomes to the open-source community or transfer technologies to internal product teams.

Who We Look For

Qualifications:

1. Master’s or Ph.D. degree in Computer Science, Artificial Intelligence, Computer Vision, Machine Learning, or a related field.

2. Proven multimodal research experience in relevant areas, including familiarity with state-of-the-art technologies and a strong publication record in top-tier conferences or journals such as CVPR, ICCV, ECCV, NeurIPS, ICLR, or ICML.

3. Proficiency with mainstream open-source tools and frameworks relevant to the field, and strong engineering skills to support research implementation; candidates with influential GitHub projects or contributions to high-impact open-source communities are preferred.

4. Strong team spirit and ability to collaborate across disciplines, excellent communication skills, intellectual curiosity, and a goal-oriented, problem-solving mindset.

The expected base pay range for this position in the location(s) listed above is $122,500.00 to $229,700.00 per year. Actual pay may vary depending on job-related knowledge, skills, and experience. Employees hired for this position may be eligible for a sign-on payment, relocation package, and restricted stock units, which will be evaluated on a case-by-case basis. Subject to the terms and conditions of the plans in effect, hired applicants are also eligible for medical, dental, vision, life and disability benefits, and participation in the Company’s 401(k) plan. Employees are also eligible for up to 15 to 25 days of vacation per year (depending on the employee’s tenure), up to 13 days of holidays throughout the calendar year, and up to 10 days of paid sick leave per year. Your benefits may be adjusted to reflect your location, employment status, duration of employment with the company, and position level. Benefits may also be pro-rated for those who start working during the calendar year.

Equal Employment Opportunity at Tencent

As an equal opportunity employer, we firmly believe that diverse voices fuel our innovation and allow us to better serve our users and the community. We foster an environment where every employee feels supported and inspired to achieve individual and common goals.

Who we are

Tencent is a world-leading internet and technology company that develops innovative products and services to improve the quality of life for people around the world.
