AI Research Intern, TAO Multi-Modal Model Development - 2026

NVIDIA

Hanoi, Vietnam (On Site) | Full Time

Job Summary

Our team is seeking to extend the internship of our current AI Research Intern for the TAO Multi-Modal Model Development project, recognizing their exceptional performance and strong alignment with the team’s research goals. This extension provides further growth opportunities for the intern and strengthens our team’s capacity to develop scalable, high-impact AI solutions in the rapidly evolving field of multi-modal AI. As an AI Research Intern at NVIDIA in Hanoi/Ho Chi Minh City, Vietnam, you will advance machine learning research by collaborating on state-of-the-art deep learning models for image segmentation, cross-modal understanding, and universal representation learning, and you will contribute to next-generation AI systems with real-world impact.

Must Have

Develop and fine-tune multi-modal AI models using NVIDIA’s TAO Toolkit and deep learning frameworks
Contribute to the design and implementation of vision-language models (VLMs) and universal segmentation systems
Conduct experiments and benchmarking to evaluate model accuracy, robustness, and scalability
Collaborate with cross-functional teams to integrate research into production-level pipelines and NVIDIA SDKs
Participate in research discussions, code reviews, and technical documentation
Currently pursuing a degree in Computer Science, Computer Engineering, or a related field
Proven experience with machine learning, deep learning, or computer vision model development
Strong Python programming skills and proficiency with PyTorch or similar frameworks
Solid understanding of neural network architectures, transformers, and multi-modal learning techniques
Excellent problem-solving abilities, attention to detail, and a collaborative mindset

Good to Have

Familiarity with vision-language models
Familiarity with image segmentation
Familiarity with large-scale pretraining

Perks & Benefits

Highly competitive salaries
Comprehensive benefits package
Two free days each quarter to recharge

Job Description

Our team is seeking to extend the internship of our current AI Research Intern for the TAO (Train, Adapt, Optimize) Multi-Modal Model Development project, recognizing their exceptional performance and strong alignment with the team’s research goals. Their innovative ideas and technical contributions have significantly enhanced our work. Given the rapidly evolving field of multi-modal AI, encompassing vision-language modeling, universal segmentation, and large-scale model training, extending this internship will provide further growth opportunities for the intern while strengthening our team’s capacity to develop scalable, high-impact AI solutions.

Embark on an exciting journey with NVIDIA, a global leader in AI and accelerated computing. As an AI Research Intern focusing on multi-modal AI and vision-language model development within the TAO framework in Hanoi/Ho Chi Minh City, Vietnam, you will be at the forefront of machine learning research. You’ll collaborate with a talented team of engineers and researchers dedicated to developing state-of-the-art deep learning models for tasks such as image segmentation, cross-modal understanding, and universal representation learning. This internship offers a unique opportunity to contribute to next-generation AI systems with real-world impact across industries, from autonomous vehicles to intelligent content understanding.

What you'll be doing:

Develop and fine-tune multi-modal AI models using NVIDIA’s TAO Toolkit and deep learning frameworks.
Contribute to the design and implementation of vision-language models (VLMs) and universal segmentation systems.
Conduct experiments and benchmarking to evaluate model accuracy, robustness, and scalability.
Collaborate with cross-functional teams to integrate your research into production-level pipelines and NVIDIA SDKs.
Participate in research discussions, code reviews, and technical documentation to share insights and improve methodologies.

What we need to see:

Currently pursuing a degree in Computer Science, Computer Engineering, or a related field.
Proven experience with machine learning, deep learning, or computer vision model development.
Strong Python programming skills and proficiency with PyTorch or similar frameworks.
Solid understanding of neural network architectures, transformers, and multi-modal learning techniques.
Excellent problem-solving abilities, attention to detail, and a collaborative mindset.
Familiarity with vision-language models, image segmentation, or large-scale pretraining is a strong plus.

Widely considered to be one of the technology world’s most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. As you plan your future, see what we can offer to you and your family: https://www.nvidiabenefits.com/

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.