Research Intern - LLM Inference Acceleration and Optimization

3 Months ago • Up to 1 Year • $78,600 PA - $154,560 PA

Job Summary

Job Description

This Research Internship at Microsoft's AIFX team focuses on accelerating and optimizing Large Language Model (LLM) inference. Interns will investigate and implement cutting-edge techniques like quantized KV-caches, flash/paged/radix attention, speculative decoding, and advanced collective communication on GPUs. The work involves leveraging state-of-the-art approaches like "You only cache once (YOCO)" to improve LLM serving efficiency at scale. The internship includes exploring, implementing, optimizing, and potentially publishing research findings related to real-world production workloads. Collaboration with Microsoft teams and contributions to open-source projects like vLLM, SGLang, and HuggingFace are key aspects of this role.
Must have:
  • Accepted to or currently enrolled in a PhD program in CS or a related STEM field
  • 6+ months LLM training/inference experience
  • Experience with LLMs like Llama and Phi
Good to have:
  • Ability to convert research ideas into working code
  • Experience with large-scale GPU communication
  • AI framework benchmarking experience (PyTorch, vLLM, SGLang)
  • Proficient interpersonal skills
  • Open to fast iteration and ambitious ideas
Perks:
  • Industry leading healthcare
  • Educational resources
  • Product and service discounts
  • Savings and investments
  • Maternity/paternity leave
  • Generous time away
  • Giving programs
  • Networking opportunities

Job Details

Overview

Research Internships at Microsoft provide a dynamic environment for research careers with a network of world-class research labs led by globally-recognized scientists and engineers, who pursue innovation in a range of scientific and technical disciplines to help solve complex challenges in diverse fields, including computing, healthcare, economics, and the environment.

If you are excited about investigating and implementing cutting-edge large language model (LLM) inference techniques and optimizations like quantized KV-caches, flash/paged/radix attention, speculative decoding, and advanced collective communication on graphics processing units (GPUs), come join the AIFX team at Microsoft Azure and contribute to a production-focused, planetary-scale LLM serving stack that is being built on top of excellent open-source efforts like vLLM, SGLang, and HuggingFace. The work includes investigation of cutting-edge, state-of-the-art approaches like "You only cache once (YOCO)" and leveraging them to save memory and compute for serving LLMs at scale. You will get a chance to explore, implement, optimize, and publish your research ideas in collaboration with teams at Microsoft working on real-world production workloads at an unprecedented scale.

Qualifications

Required Qualifications

  • Accepted to or currently enrolled in a PhD program in Computer Science or a related STEM field.
  • At least 6 months of experience with training and/or inference of recent LLMs like Llama and Phi.

Other Requirements

  • Research Interns are expected to be physically located in their manager’s Microsoft worksite location for the duration of their internship.
  • In addition to the qualifications below, you’ll need to submit a minimum of two reference letters for this position as well as a cover letter and any relevant work or research samples. After you submit your application, a request for letters may be sent to your list of references on your behalf. Note that reference letters cannot be requested until after you have submitted your application, and furthermore, that they might not be automatically requested for all candidates. You may wish to alert your letter writers in advance, so they will be ready to submit your letter. 

Preferred Qualifications

  • Experience with large-scale collective communication on GPUs.
  • Experience with performance benchmarking of AI frameworks like PyTorch, vLLM, and/or SGLang.
  • Ability to convert research ideas into working code that runs and scales on real systems.
  • Proficient interpersonal skills and growth mindset.
  • Open to failing fast in pursuit of ambitious ideas.

The base pay range for this internship is USD $6,550 - $12,880 per month. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $8,480 - $13,920 per month.

 

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: 

Microsoft accepts applications and processes offers for these roles on an ongoing basis.

Responsibilities

Research Interns put inquiry and theory into practice. Alongside fellow doctoral candidates and some of the world’s best researchers, Research Interns learn, collaborate, and network for life. Research Interns not only advance their own careers, but they also contribute to exciting research and development strides. During the 12-week internship, Research Interns are paired with mentors and expected to collaborate with other Research Interns and researchers, present findings, and contribute to the vibrant life of the community. Research internships are available in all areas of research, and are offered year-round, though they typically begin in the summer.

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
  • Industry leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect

About The Company

Microsoft is a tech giant that develops, licenses, and supports a range of software products, services, and devices.