Research Intern - Training-Time Provenance (Data Dignity)

1 Month ago • 2 Years + • Research & Development • $65,520 PA - $128,800 PA

Job Summary

Job Description

This research internship focuses on training-time provenance for Large Language Models (LLMs), aiming to understand and estimate the influence of specific training data on model outputs. The research addresses "data dignity" by exploring ways to incentivize and recognize data contributors. Responsibilities include training small LLMs with novel provenance-preserving schemes, experimenting with model performance and reliability, and collaborating with researchers. The internship involves contributing to the development of methods to 'X-ray' model intent to detect malicious activity and promote ethical AI practices. Candidates should possess a strong background in deep learning, natural language processing, and generative models.
Must have:
  • Currently enrolled in a PhD program in CS or related STEM field
  • 2+ years research experience
  • Peer-reviewed publications
  • Experience with NLP, deep learning, generative models
  • Training small LLMs with novel schemes
  • Experimenting with model performance and reliability
Good to have:
  • Experience training large AI models
  • Experience in approximation methods for deep learning
  • Ability to develop original research agendas
  • Collaboration skills
Perks:
  • Industry-leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Networking opportunities

Job Details

Overview

Research Internships at Microsoft provide a dynamic environment for research careers with a network of world-class research labs led by globally recognized scientists and engineers, who pursue innovation in a range of scientific and technical disciplines to help solve complex challenges in diverse fields, including computing, healthcare, economics, and the environment.

Training-time provenance is a research effort aimed at estimating the influence of specific training data on the outputs of large language models (LLMs). Current neural network architectures are opaque: they do not provide sources for their generations, and there are at least two good reasons to change this:

  1. “X-ray” into intent, so that we can detect bad human actors or dangerous AI activity by identifying the most influential source documents related to a given model output. For instance, sneaky prompts might invoke articles about bomb making that could otherwise evade guardrails. This would be a deeper method of countering this type of danger than those currently in use.
  2. “Data dignity”, meaning incentives, recognition, and potentially pay for people who contribute certain valuable data to unforeseen kinds of models we will want in the future, assuming the future will surprise us fundamentally. The goal is to foster new classes of creative professionals where possible, instead of relying solely on ideas like Universal Basic Income in the event of a future with very high-functioning large models. 

We are attempting to demonstrate that LLMs can be trained so that the influence of specific training data on generated outputs can be estimated efficiently and usefully. You can read more about “data dignity” in the article “There Is No A.I.” (The New Yorker).

Qualifications

Required Qualifications

  • Currently enrolled in a PhD program in Computer Science or a related STEM field. Exceptional candidates enrolled in a master’s program might also be considered.
  • Have at least 2 years of research experience, including peer-reviewed publications, on a topic closely related to the above description, such as natural language processing, deep learning, generative models, or approximation methods.

Other Requirements

  • Research Interns are expected to be physically located in their manager’s Microsoft worksite location for the duration of their internship.
  • In addition to the qualifications below, you’ll need to submit a minimum of two reference letters for this position as well as a cover letter and any relevant work or research samples. After you submit your application, a request for letters may be sent to your list of references on your behalf. Note that reference letters cannot be requested until after you have submitted your application, and furthermore, that they might not be automatically requested for all candidates. You may wish to alert your letter writers in advance, so they will be ready to submit your letter. 

Preferred Qualifications

  • Demonstrated ability to develop original research agendas.
  • Ability to collaborate effectively with other researchers and product development teams.
  • Experience in training large AI models.
  • Experience in approximation methods for deep learning systems.
  • Proficient interpersonal skills and an aptitude for cross-group and cross-cultural collaboration.
  • Ability to think unconventionally to derive creative and innovative solutions.

Applied Sciences IC2: The base pay range for this internship is USD $5,460 - $10,680 per month.

A different range applies to specific work locations within the San Francisco Bay Area and New York City metropolitan area; the base pay range for this role in those locations is USD $7,040 - $11,640 per month.

Applied Sciences IC3: The base pay range for this internship is USD $6,550 - $12,880 per month.

A different range applies to specific work locations within the San Francisco Bay Area and New York City metropolitan area; the base pay range for this role in those locations is USD $8,480 - $13,920 per month.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: 

Microsoft accepts applications and processes offers for these roles on an ongoing basis.

Responsibilities

Research Interns put inquiry and theory into practice. Alongside fellow doctoral candidates and some of the world’s best researchers, Research Interns learn, collaborate, and network for life. Research Interns not only advance their own careers, but they also contribute to exciting research and development strides. During the 12-week internship, Research Interns are paired with mentors and expected to collaborate with other Research Interns and researchers, present findings, and contribute to the vibrant life of the community. Research internships are available in all areas of research, and are offered year-round, though they typically begin in the summer.

For this Research Internship (summer 2025), we are seeking PhD students with a passion for fundamental deep learning research, particularly those with experience in training LLMs and other large AI models. The Research Intern's responsibilities will include (1) training small language models with novel schemes that preserve data provenance, and (2) experimenting with these models to test their performance and reliability.

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
Industry-leading healthcare
Educational resources
Discounts on products and services
Savings and investments
Maternity and paternity leave
Generous time away
Giving programs
Opportunities to network and connect

Similar Jobs

ASSIST Software - AI Engineer

ASSIST Software

Suceava, Suceava County, Romania (Remote)
3 Months ago
NVIDIA - Signal and Power Integrity Engineer (RDSS Intern)

NVIDIA

Taipei City, Taiwan (On-Site)
1 Month ago
Meta - Software Engineer (Leadership) - Machine Learning

Meta

Burlingame, California, United States (Remote)
3 Months ago
Kaedim - Machine Learning Engineer

Kaedim

Singapore (On-Site)
6 Months ago
NVIDIA - Engineering Farm Engineer

NVIDIA

Bengaluru, Karnataka, India (On-Site)
1 Month ago
The Walt Disney Company - Software Engineer, Platform

The Walt Disney Company

Emeryville, California, United States (On-Site)
3 Months ago
NVIDIA - Senior Technical Program Manager – CSP Datacenter Compute Server Software

NVIDIA

Santa Clara, California, United States (On-Site)
2 Weeks ago
NVIDIA - Senior High-Performance LLM Training Engineer

NVIDIA

Santa Clara, California, United States (Hybrid)
1 Month ago
NVIDIA - System Software Engineer, High Integrity Data Pipelining

NVIDIA

California, United States (Remote)
1 Month ago
Riot Games - Senior Visual Design Artist - League of Legends, Summoner's Rift Environment

Riot Games

Los Angeles, California, United States (On-Site)
1 Month ago


Similar Skill Jobs

ByteDance - Student Researcher (Doubao (Seed) - Machine Learning System) - 2025 Start (PhD)

ByteDance

Seattle, Washington, United States (On-Site)
3 Months ago
Tencent - NLP Research Intern

Tencent

(On-Site)
1 Month ago
NVIDIA - System Design Engineer

NVIDIA

Santa Clara, California, United States (On-Site)
1 Month ago
NVIDIA - Senior Solutions Architect, Omniverse Platform

NVIDIA

Shanghai, Shanghai, China (On-Site)
1 Month ago
NVIDIA - Senior System Software Engineer

NVIDIA

Santa Clara, California, United States (On-Site)
1 Month ago
NVIDIA - Senior Technical Program Manager, AI Datacenter

NVIDIA

Beijing, Beijing, China (On-Site)
1 Month ago
ByteDance - Student Researcher (Foundation Models - LLM Post-Training)

ByteDance

San Jose, California, United States (On-Site)
5 Days ago
Netomi - Data Scientist - I

Netomi

Gurugram, Haryana, India (Hybrid)
4 Months ago
Kokotree - Artificial Intelligence Developers

Kokotree

Wilmington, North Carolina, United States (On-Site)
3 Months ago
Rackspace Technology - Machine Learning Architect (AWS)

Rackspace Technology

(Remote)
1 Month ago


Jobs in Mountain View, California, United States

Riot Games - Sound Design Intern - League of Legends - Summer 2025 (Remote)

Riot Games

Los Angeles, California, United States (Remote)
3 Months ago
Next Level Business Services - Sharepoint Architect (Full Time)

Next Level Business Services

Montvale, New Jersey, United States (On-Site)
4 Months ago
The Walt Disney Company - Assistant Manager, Global Hardlines Licensing - Toys

The Walt Disney Company

Glendale, California, United States (On-Site)
2 Weeks ago
Morning Star - Senior Application Security Architect

Morning Star

Chicago, Illinois, United States (Hybrid)
4 Months ago
Nintendo - DevOps Engineer

Nintendo

Redmond, Washington, United States (On-Site)
1 Month ago
Anavation - Lead Network Lab Engineer

Anavation

Reston, Virginia, United States (On-Site)
4 Months ago
NVIDIA - Senior Solutions Architect, Retail

NVIDIA

Arkansas, United States (Remote)
1 Month ago
Blinkhealth - Insurance Verification Specialist (ON SITE)

Blinkhealth

Pittsburgh, Pennsylvania, United States (On-Site)
1 Month ago
Nagarro - Associate Staff Engineer, Python

Nagarro

New York, New York, United States (On-Site)
4 Months ago
Netflix - Software Engineer (L4) - CKG

Netflix

Los Angeles, California, United States (On-Site)
1 Month ago


Research & Development Jobs

NVIDIA - Senior Software Engineer

NVIDIA

Ra'anana, Center District, Israel (On-Site)
3 Weeks ago
NVIDIA - Senior System Networking Engineer, InfiniBand

NVIDIA

Yokne'am Illit, North District, Israel (On-Site)
1 Month ago
Google - Senior Network Design Verification Engineer, Google Cloud

Google

Tel Aviv-Yafo, Tel Aviv District, Israel (On-Site)
1 Month ago
Valeo - Deputy Tech Lead

Valeo

Chennai, Tamil Nadu, India (On-Site)
3 Months ago
Rockstar Games - Associate Software Engineer C++

Rockstar Games

New York, New York, United States (On-Site)
1 Week ago
Google - Software Engineer, Embedded Systems/Firmware, Pixel

Google

Warsaw, Masovian Voivodeship, Poland (On-Site)
1 Month ago
Google - Hardware Engineering Intern, 2025

Google

(On-Site)
2 Months ago
Riot Games - Principal 3D Environment Artist - VALORANT

Riot Games

Los Angeles, California, United States (On-Site)
2 Months ago
Krafton - [Publishing Platform Div.] Sr. Web Front-End Developer (5+ years of experience)

Krafton

Seoul, South Korea (On-Site)
3 Months ago
COMSOL, Inc - Pre-Sales Applications Engineer: Acoustics

COMSOL, Inc

Bengaluru, Karnataka, India (On-Site)
3 Months ago


About The Company

Microsoft is a tech giant that develops, licenses, and supports a range of software products, services, and devices.
