Research Intern - LLM Inference Acceleration and Optimization

58 minutes ago • Up to 1 year

About the job

Job Description

This Research Internship at Microsoft's AIFX team focuses on accelerating and optimizing Large Language Model (LLM) inference. Interns will investigate and implement cutting-edge techniques like quantized KV-caches, flash/paged/radix attention, speculative decoding, and advanced collective communication on GPUs. The work involves leveraging state-of-the-art approaches like "You only cache once (YOCO)" to improve LLM serving efficiency at scale. The internship includes exploring, implementing, optimizing, and potentially publishing research findings related to real-world production workloads. Collaboration with Microsoft teams and contributions to open-source projects like vLLM, SGLang, and HuggingFace are key aspects of this role.
Must have:
  • Accepted to or enrolled in a PhD program in CS or a related STEM field
  • 6+ months LLM training/inference experience
  • Experience with LLMs like Llama and Phi
  • Ability to convert research ideas into code
Good to have:
  • Experience with large-scale GPU communication
  • AI framework benchmarking experience (PyTorch, vLLM, SGLang)
  • Strong interpersonal skills
  • Open to fast iteration and ambitious ideas
Perks:
  • Industry leading healthcare
  • Educational resources
  • Product and service discounts
  • Savings and investments
  • Maternity/paternity leave
  • Generous time away
  • Giving programs
  • Networking opportunities

Overview

Research Internships at Microsoft provide a dynamic environment for research careers with a network of world-class research labs led by globally-recognized scientists and engineers, who pursue innovation in a range of scientific and technical disciplines to help solve complex challenges in diverse fields, including computing, healthcare, economics, and the environment.

If you are excited about investigating and implementing cutting-edge large language model (LLM) inference techniques and optimizations like quantized KV-caches, flash/paged/radix attention, speculative decoding, and advanced collective communication on graphics processing units (GPUs), come join the AIFX team at Microsoft Azure and contribute to a production-focused, planetary-scale LLM serving stack that is being built on top of excellent open-source efforts like vLLM, SGLang, and HuggingFace. The work includes investigation of cutting-edge, state-of-the-art approaches like "You only cache once (YOCO)" and leveraging them to save memory and compute for serving LLMs at scale. You will get a chance to explore, implement, optimize, and publish your research ideas in collaboration with teams at Microsoft working on real-world production workloads at an unprecedented scale.

Qualifications

Required Qualifications

  • Accepted to or currently enrolled in a PhD program in Computer Science or a related STEM field.
  • At least 6 months of experience with training and/or inference of recent LLMs like Llama and Phi.

Other Requirements

  • Research Interns are expected to be physically located in their manager’s Microsoft worksite location for the duration of their internship.
  • In addition to the qualifications below, you’ll need to submit a minimum of two reference letters for this position as well as a cover letter and any relevant work or research samples. After you submit your application, a request for letters may be sent to your list of references on your behalf. Note that reference letters cannot be requested until after you have submitted your application, and furthermore, that they might not be automatically requested for all candidates. You may wish to alert your letter writers in advance, so they will be ready to submit your letter. 

Preferred Qualifications

  • Experience with large-scale collective communication on GPUs.
  • Experience with performance benchmarking of AI frameworks like PyTorch, vLLM, and/or SGLang.
  • Ability to convert research ideas into working code that runs and scales on real systems.
  • Strong interpersonal skills and a growth mindset.
  • Open to failing fast in pursuit of ambitious ideas.

The base pay range for this internship is USD $6,550 - $12,880 per month. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $8,480 - $13,920 per month.

 

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: 

Microsoft accepts applications and processes offers for these roles on an ongoing basis.

Responsibilities

Research Interns put inquiry and theory into practice. Alongside fellow doctoral candidates and some of the world’s best researchers, Research Interns learn, collaborate, and network for life. Research Interns not only advance their own careers, but they also contribute to exciting research and development strides. During the 12-week internship, Research Interns are paired with mentors and expected to collaborate with other Research Interns and researchers, present findings, and contribute to the vibrant life of the community. Research internships are available in all areas of research, and are offered year-round, though they typically begin in the summer.

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
  • Industry leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect
$78.6K - $154.6K/yr (Outscal est.)
$116.6K/yr avg.
Redmond, Washington, United States

About The Company

Microsoft is a tech giant that develops, licenses, and supports a range of software products, services, and devices.

