Member of Technical Staff, AI Pretraining Platform

1 Week ago • All levels • Artificial Intelligence

Job Summary

Job Description

Microsoft AI is seeking a Member of Technical Staff to contribute to their cutting-edge AI pre-training platform. This role involves designing and developing Python and CUDA/HIP C++ code for distributed training of multimodal LLMs, building and maintaining infrastructure for petabyte-scale data processing, partnering with other teams to improve data recipes, and collaborating on identifying gaps in current models. Responsibilities include optimizing for scalability, performance, and reliability on a large-scale GPU cluster. The ideal candidate will be passionate about large-scale AI infrastructure, thrive in a fast-paced collaborative environment, and demonstrate a high degree of craftsmanship.
Must have:
  • Python & CUDA/HIP C++ development
  • Experience with HPC and parallel programming
  • Large-scale AI model training experience
  • GPU cluster experience

Job Details


Job Description

Help build the world’s most advanced training platform at Microsoft AI 

We are on a mission to create the leading pretraining platform to develop the world’s most capable AI frontier models. This platform will span one of the world’s foremost GPU clusters, pushing the boundaries of scale, performance, and reliability. 

The AI Pre-training Platform team at Microsoft AI is responsible for all aspects of infrastructure including scalability, benchmarking, kernel development, performance optimizations, communications, and fault tolerance to support our model pre-training operations. We are an interdisciplinary team of engineers and scientists, learning from each other, and collaborating to create the best models, methods and products. We work closely with the teams that transform pre-trained models into the models that power the consumer Copilot experience. 

We are looking for outstanding individuals excited about contributing to the next generation of systems that will transform the field. We are looking for candidates who: 
  • Are passionate about the infrastructure enabling large-scale AI model training 
  • Will thrive in a highly collaborative, fast-paced environment 
  • Have a high degree of craftsmanship and pay close attention to details 
  • Demonstrate a proactive attitude and enthusiasm for exploring new methods and technologies 
  • Effectively manage multiple responsibilities and adjust to shifting priorities.  
 
Responsibilities 
  • Design and develop Python and CUDA/HIP C++ code that enables distributed training of multimodal LLMs ingesting text, audio, image, or video data. 
  • Build and maintain cutting-edge infrastructure that can store and process the petabytes of data needed to power models. 
  • Partner with the pretraining and post-training teams to improve our data recipe by rigorous and careful experimentation. 
  • Collaborate with the product team and other engineers and researchers across Microsoft AI to identify gaps in the current generation of models. 
  • Embody our and
 

Required/Minimum Qualifications  
  • Bachelor's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND experience in business analytics, data science, software development, data modeling or data engineering work 
  • OR Master's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND experience in business analytics, data science, software development, or data engineering work 
  • Experience with HPC (high-performance computing) and/or parallel programming
  • Experience in the area of pretraining
  • Experience working with GPU clusters

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the .
 
Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
 
#Copilot #MicrosoftAI


Similar Jobs

Addepar - Sr. Software Engineer - Reference Data

Addepar

(Remote)
1 Day ago
Jane Street - Macro Analyst

Jane Street

New York, New York, United States (On-Site)
7 Hours ago
Playrix - Lead Unity Software Engineer (Gameplay)

Playrix

Cyprus (Remote)
6 Months ago
Gaming Innovation Group  - Associate Big Data Engineer

Gaming Innovation Group

Manchester, England, United Kingdom (Hybrid)
3 Weeks ago
PwC - Data Architect – Technology Consulting

PwC

Prague, Prague, Czechia (On-Site)
6 Months ago
Ello - Tech Lead, GenAI & Machine Learning

Ello

San Francisco, California, United States (On-Site)
2 Weeks ago
NVIDIA - Solutions Architect - Cloud Providers and Hyperscale

NVIDIA

Washington, United States (On-Site)
1 Month ago
bito - Backend Developer

bito

Pune, Maharashtra, India (Hybrid)
2 Months ago
ByteDance - Research Scientist, Multimodal Interaction & World Model

ByteDance

Singapore (On-Site)
2 Weeks ago
Google - Technical Program Manager III, AI/ML, Cloud AI Systems

Google

Austin, Texas, United States (On-Site)
2 Weeks ago


Similar Skill Jobs

Google - Web Solutions Engineer

Google

Hyderabad, Telangana, India (On-Site)
2 Days ago
Google - CPU Design Verification Engineer

Google

Tel Aviv-Yafo, Tel Aviv District, Israel (On-Site)
2 Weeks ago
Tencent - Speech Synthesis Intern

Tencent

London, England, United Kingdom (On-Site)
2 Months ago
ByteDance - AI Security Researcher - Security - San Jose

ByteDance

San Jose, California, United States (On-Site)
6 Months ago
Google - Engineering Manager, Node Platform Infra

Google

San Francisco, California, United States (On-Site)
2 Days ago
Daybreak Game Company LLC - Business Intelligence Engineer

Daybreak Game Company LLC

San Diego, California, United States (Hybrid)
2 Months ago
Rockstar Games - Animation R&D Programmer

Rockstar Games

New York, New York, United States (On-Site)
5 Months ago
Google - System Power and Performance Architect, Silicon

Google

New Taipei, New Taipei City, Taiwan (On-Site)
2 Weeks ago
Kojima - Tools Programmer

Kojima

Minato City, Tokyo, Japan (On-Site)
1 Day ago
Sony Interactive Entertainment - System Development Engineer (PlayStation Platform Game Content Authorship and Delivery System)

Sony Interactive Entertainment

Tokyo, Japan (On-Site)
4 Weeks ago


Jobs in London, England, United Kingdom

Eleven Labs - Mobile Growth Marketer

Eleven Labs

United Kingdom (Remote)
1 Month ago
Tide - Senior Commercial Finance Broker

Tide

United Kingdom (Hybrid)
1 Day ago
Technicon Design - Digital Modeller

Technicon Design

Coventry, England, United Kingdom (Hybrid)
4 Weeks ago
Universally Speaking - Brazilian Portuguese Games Tester

Universally Speaking

England, United Kingdom (On-Site)
2 Weeks ago
2K - Localization Lead - German

2K

London, England, United Kingdom (Hybrid)
1 Day ago
Blue bolt - Editorial and Data Ops Assistant

Blue bolt

London, England, United Kingdom (On-Site)
22 Hours ago
AGS - American Gaming Systems - Senior Regulatory Compliance Specialist

AGS - American Gaming Systems

United Kingdom (Remote)
3 Weeks ago
Nagarro - Senior Staff Engineer

Nagarro

United Kingdom (Remote)
6 Months ago
Rank group - Electronic Gaming Host

Rank group

England, United Kingdom (On-Site)
5 Months ago
QS Quacquarelli Symonds - IT Operations Intern

QS Quacquarelli Symonds

London, England, United Kingdom (Hybrid)
1 Month ago


Artificial Intelligence Jobs

Google - Technical Program Manager, AI Data Operations

Google

Hyderabad, Telangana, India (On-Site)
2 Weeks ago
ByteDance - Research Scientist Graduate (Foundation Model - Generative AI) - 2025 Start (PhD)

ByteDance

Seattle, Washington, United States (On-Site)
4 Months ago
Google - Staff Software Engineer, AI/ML, Google Workspace

Google

Sunnyvale, California, United States (On-Site)
2 Weeks ago
Google - Senior Technical Program Manager, AI Risk Reporting Lead

Google

Seattle, Washington, United States (On-Site)
2 Days ago
ByteDance - LLM Software Engineer/Researcher (Applied Machine Learning)

ByteDance

Seattle, Washington, United States (On-Site)
1 Month ago
DNEG - Head of Machine Learning

DNEG

London, England, United Kingdom (Remote)
2 Months ago
Interface AI - Vice President of Engineering

Interface AI

United States (Remote)
2 Months ago
Google - Software Engineer III, Machine Learning, Google Ads

Google

Kirkland, Washington, United States (On-Site)
2 Weeks ago
Airlab Inc  - Junior Programmer Artificial Intelligence

Airlab Inc

Quebec, Canada (On-Site)
1 Month ago
Microsoft - Director, AI Advertising Acceleration

Microsoft

Sydney, New South Wales, Australia (On-Site)
2 Weeks ago


About The Company

Microsoft is a tech giant that develops, licenses, and supports a range of software products, services, and devices.

London, England, United Kingdom (On-Site)

Redmond, Washington, United States (On-Site)

Redmond, Washington, United States (Hybrid)

Shanghai, Shanghai, China (Hybrid)

Beijing, Beijing, China (On-Site)

Washington, United States (On-Site)

Phoenix, Arizona, United States (On-Site)

Penang, Malaysia (On-Site)

London, England, United Kingdom (On-Site)
