Member of Technical Staff, AI Pretraining Platform

1 Month ago • All levels • Artificial Intelligence

Job Summary

Job Description

Microsoft AI is seeking a Member of Technical Staff to contribute to its cutting-edge AI pre-training platform. This role involves designing and developing Python and CUDA/HIP C++ code for distributed training of multimodal LLMs, building and maintaining infrastructure for petabyte-scale data processing, partnering with other teams to improve data recipes, and collaborating on identifying gaps in current models. Responsibilities include optimizing for scalability, performance, and reliability on a large-scale GPU cluster. The ideal candidate will be passionate about large-scale AI infrastructure, thrive in a fast-paced collaborative environment, and demonstrate a high degree of craftsmanship.
Must have:
  • Python & CUDA/HIP C++ development
  • Experience with HPC and parallel programming
  • Large-scale AI model training experience
  • GPU cluster experience

Job Details


Job Description

Help build the world’s most advanced training platform at Microsoft AI 

We are on a mission to create the leading pretraining platform to develop the world’s most capable AI frontier models. This platform will span one of the world’s foremost GPU clusters, pushing the boundaries of scale, performance, and reliability.

The AI Pre-training Platform team at Microsoft AI is responsible for all aspects of infrastructure including scalability, benchmarking, kernel development, performance optimizations, communications, and fault tolerance to support our model pre-training operations. We are an interdisciplinary team of engineers and scientists, learning from each other, and collaborating to create the best models, methods and products. We work closely with the teams that transform pre-trained models into the models that power the consumer Copilot experience. 

We are looking for outstanding individuals excited about contributing to the next generation of systems that will transform the field. We are looking for candidates who: 
  • Are passionate about the infrastructure enabling large-scale AI model training 
  • Will thrive in a highly collaborative, fast-paced environment 
  • Have a high degree of craftsmanship and pay close attention to details 
  • Demonstrate a proactive attitude and enthusiasm for exploring new methods and technologies 
  • Effectively manage multiple responsibilities and adjust to shifting priorities
 
Responsibilities 
  • Design and develop Python and CUDA/HIP C++ code that enables distributed training of multimodal LLMs ingesting text, audio, images, or video data.
  • Build and maintain cutting-edge infrastructure that can store and process the petabytes of data needed to power models. 
  • Partner with the pretraining and post-training teams to improve our data recipe through rigorous, careful experimentation.
  • Collaborate with the product team and other engineers and researchers across Microsoft AI to identify gaps in the current generation of models. 
  • Embody our and
 

Required/Minimum Qualifications  
  • Bachelor's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND experience in business analytics, data science, software development, data modeling or data engineering work 
  • OR Master's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND experience in business analytics, data science, software development, or data engineering work 
  • Experience with HPC (high-performance computing) and/or parallel programming
  • Experience in the area of pretraining
  • Experience working with GPU clusters

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the .
 
Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
 
#Copilot #MicrosoftAI

