Member of Technical Staff, AI Pretraining Platform

3 Months ago • All levels • Research Development

Job Summary

Job Description

Microsoft AI is seeking a Member of Technical Staff to contribute to its cutting-edge AI pre-training platform. This role involves designing and developing Python and CUDA/HIP C++ code for distributed training of multimodal LLMs, building and maintaining infrastructure for petabyte-scale data processing, partnering with other teams to improve data recipes, and collaborating to identify gaps in current models. Responsibilities include optimizing for scalability, performance, and reliability on a large-scale GPU cluster. The ideal candidate will be passionate about large-scale AI infrastructure, thrive in a fast-paced collaborative environment, and demonstrate a high degree of craftsmanship.
Must have:
  • Python & CUDA/HIP C++ development
  • Experience with HPC and parallel programming
  • Large-scale AI model training experience
  • GPU cluster experience

Job Details

Help build the world’s most advanced training platform at Microsoft AI 

We are on a mission to create the leading pretraining platform for developing the world’s most capable frontier AI models. This platform will span one of the world’s foremost GPU clusters, pushing the boundaries of scale, performance, and reliability.

The AI Pre-training Platform team at Microsoft AI is responsible for all aspects of infrastructure, including scalability, benchmarking, kernel development, performance optimization, communications, and fault tolerance, to support our model pre-training operations. We are an interdisciplinary team of engineers and scientists who learn from each other and collaborate to create the best models, methods, and products. We work closely with the teams that transform pre-trained models into the models that power the consumer Copilot experience.

We are looking for outstanding individuals who are excited to contribute to the next generation of systems that will transform the field. Ideal candidates:
  • Are passionate about the infrastructure enabling large-scale AI model training 
  • Will thrive in a highly collaborative, fast-paced environment 
  • Have a high degree of craftsmanship and pay close attention to details 
  • Demonstrate a proactive attitude and enthusiasm for exploring new methods and technologies 
  • Effectively manage multiple responsibilities and adjust to shifting priorities
 
Responsibilities 
  • Design and develop Python and CUDA/HIP C++ code that enables distributed training of multimodal LLMs ingesting text, audio, image, or video data.
  • Build and maintain cutting-edge infrastructure that can store and process the petabytes of data needed to power models. 
  • Partner with the pretraining and post-training teams to improve our data recipe through rigorous, careful experimentation.
  • Collaborate with the product team and other engineers and researchers across Microsoft AI to identify gaps in the current generation of models. 
  • Embody our culture and values.
 

Required/Minimum Qualifications  
  • Bachelor's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND experience in business analytics, data science, software development, data modeling or data engineering work 
  • OR Master's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND experience in business analytics, data science, software development, or data engineering work 
  • Experience with high-performance computing (HPC) and/or parallel programming
  • Experience in AI model pretraining
  • Experience working with GPU clusters

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.
 
Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
 
#Copilot #MicrosoftAI


Similar Jobs

bytedance - Research Scientist, AI Foundation

bytedance

San Jose, California, United States (On-Site)
1 Week ago
Saronic Technologies - Software Security Engineer

Saronic Technologies

Austin, Texas, United States (On-Site)
1 Week ago
Apple - Site Reliability Engineer

Apple

Cupertino, California, United States (On-Site)
1 Month ago
Capgemini - Temenos Functional Lead

Capgemini

Pune, Maharashtra, India (On-Site)
2 Months ago
bytedance - Senior Software Engineer, Multi Cloud CDN

bytedance

Boston, Massachusetts, United States (On-Site)
1 Week ago
Epic Games - Senior AI Programmer, Fortnite

Epic Games

Montreal, Quebec, Canada (On-Site)
3 Months ago
Mistral AI - AI Scientist, Safety

Mistral AI

Paris, Île-de-France, France (On-Site)
9 Months ago
Apple - AIML - Engineering Program Manager, Search

Apple

Cupertino, California, United States (On-Site)
3 Weeks ago
ShyftLabs - AI Engineer

ShyftLabs

Noida, Uttar Pradesh, India (Hybrid)
1 Week ago
Bosch Group India - Senior ML Engineer Lead - Time Series

Bosch Group India

Bengaluru, Karnataka, India (On-Site)
4 Months ago

Similar Skill Jobs

Scopely - Lead Game Designer

Scopely

Bengaluru, Karnataka, India (Hybrid)
1 Year ago
Morning Star - Vice President, Business Development

Morning Star

New York, United States (Hybrid)
2 Weeks ago
Western Digital - Principal Engineer, Enterprise Data Platform

Western Digital

Bengaluru, Karnataka, India (On-Site)
1 Week ago
Unity - Senior Machine Learning Engineer

Unity

San Francisco, California, United States (On-Site)
1 Week ago
Yahoo - Principal Software Engineer - Media Platform

Yahoo

United States (Hybrid)
1 Year ago
Cineplex - Restaurant Host - Seasonal

Cineplex

Toronto, Ontario, Canada (On-Site)
1 Year ago
Tencent - Senior Combat Planner

Tencent

Shenzhen, Guangdong Province, China (On-Site)
2 Months ago
Amber - Senior Unity Game Engineer (Project Based)

Amber

Brazil (On-Site)
1 Year ago
hitberry games - IT Service Manager

hitberry games

(Remote)
1 Year ago
Thales - Electronics Repair Technician NTI3

Thales

Toulon, Provence-Alpes-Côte D'Azur, France (On-Site)
2 Months ago

Jobs in London, England, United Kingdom

Hawkeye Innovations - Product Owner - Sports Data

Hawkeye Innovations

United Kingdom (Hybrid)
2 Months ago
arctic7 - Senior VFX Artist

arctic7

Horsham, England, United Kingdom (Hybrid)
5 Months ago
Global Business Travel - Analytics Engineer

Global Business Travel

London, England, United Kingdom (On-Site)
1 Year ago
Epic Games - Senior Platform Programmer

Epic Games

London, England, United Kingdom (On-Site)
4 Months ago
The Rank Group - Experienced Card Room Dealer

The Rank Group

Luton, England, United Kingdom (On-Site)
8 Months ago
Clearwater Analytics - Enterprise Sales - Asset Management - FraBeLux

Clearwater Analytics

London, England, United Kingdom (On-Site)
4 Weeks ago
Fortra - Lead Data Scientist

Fortra

United Kingdom (Remote)
3 Weeks ago
Flexra Software - Principal Data & AI Engineer

Flexra Software

United Kingdom (Remote)
3 Weeks ago
ElevenLabs - Talent Operations

ElevenLabs

United Kingdom (Remote)
3 Months ago
London Stock Exchange - Executive Assistant

London Stock Exchange

London, England, United Kingdom (Hybrid)
1 Year ago

Research Development Jobs

Apple - AIML - Backend Engineer, Evaluation

Apple

Santa Clara, California, United States (On-Site)
1 Month ago
Ansys - R&D Engineer II

Ansys

Waterloo, Ontario, Canada (On-Site)
1 Month ago
Moonvalley - AI Data Engineering Lead

Moonvalley

London, England, United Kingdom (Hybrid)
2 Weeks ago
Lorikeet - AI Implementation Strategist

Lorikeet

United States (Remote)
1 Month ago
Cricketpedia - AI Engineer

Cricketpedia

Gurugram, Haryana, India (Remote)
2 Years ago
broadcom - AI/ML Model Runtime Engineer

broadcom

United States (On-Site)
2 Weeks ago
Yahoo - Senior Research Scientist - Phish and Spam Detection

Yahoo

United States (Hybrid)
2 Months ago
Ubisoft - Lead R&D Scientist

Ubisoft

Shanghai, Shanghai, China (On-Site)
4 Months ago
Accenture - AI / ML Engineer

Accenture

Pune, Maharashtra, India (On-Site)
3 Weeks ago
Roblox - Principal Machine Learning Engineer, Content Safety

Roblox

San Mateo, California, United States (On-Site)
1 Month ago
