Member of Technical Staff, AI Data

3 Months ago • All levels • Research Development

Job Summary

Job Description

The Member of Technical Staff, AI Data at Microsoft AI will contribute to building the world's most advanced multimodal dataset. Responsibilities include designing and developing data pipelines for ingesting massive amounts of multimodal data (text, audio, images, video), building and maintaining infrastructure for petabyte-scale data storage and processing, collaborating with pre-training and post-training teams to improve data quality through experimentation, and working with product teams and researchers to identify model gaps. The role requires expertise in data processing technologies and distributed computing tools (Spark, Kubernetes, TensorFlow, Flink, PySpark), as well as experience with large-scale data handling. The ideal candidate is passionate about data's role in large-scale AI model training, thrives in fast-paced environments, and possesses strong problem-solving skills.
Must have:
  • Experience with data processing technologies
  • Expertise in distributed computing tools (Spark, Kubernetes, etc.)
  • Experience designing & developing data pipelines for multimodal data
  • Ability to build infrastructure for petabyte-scale data
  • Collaboration with pre-training and post-training teams
Good to have:
  • Experience with petabyte-scale data
  • Experience in ML research or as an ML Engineer/MLOps/SWE

Job Details

Overview

Help build the world’s most advanced multimodal dataset at Microsoft AI.

We are on a mission to create the largest and most advanced multimodal dataset in the world. This dataset, spanning all modalities from across the web and beyond, will power the training of the world’s most capable AI frontier models, pushing the boundaries of scale, performance, and product deployment.  

The AI Data team at Microsoft AI is responsible for all aspects of data preparation to support our model pre-training operations, including collecting data from the source, extracting and transforming the most useful data, and understanding the impact of changes to data by training and evaluating new models. We are an interdisciplinary team of engineers and scientists, learning from each other and collaborating to create the best models and products. We work closely with the teams that transform pre-trained models into the models that power the consumer Copilot experience.

We are looking for outstanding individuals excited about contributing to the next generation of systems that will transform the field. In particular, we are looking for candidates who: 

  • Are passionate about the role of data in large-scale AI model training 
  • Will thrive in a highly collaborative, fast-paced environment 
  • Have a high degree of craftsmanship and pay close attention to details 
  • Demonstrate a proactive attitude and enthusiasm for exploring new methods and technologies 
  • Effectively manage multiple responsibilities and adjust to shifting priorities.

Qualifications

Required/Minimum Qualifications

  • Bachelor's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND experience in business analytics, data science, software development, data modeling or data engineering work 
  • OR Master's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND experience in business analytics, data science, software development, or data engineering work 
  • OR equivalent experience. 
  • Experience using data processing technologies for multimodal datasets, including scalability, parallel processing, data handling, and streaming/batch processing.
  • Experience working with distributed computing tools such as Spark, Kubernetes, TensorFlow, Flink, and PySpark.
  • Experience conducting research in Machine Learning or working as an ML Engineer, MLOps Engineer, or SWE.
  • Experience designing and developing data pipelines that ingest enormous amounts of multimodal training data (text, audio, images, video), AND the skills to build the infrastructure that supports this work from the ground up.

Preferred:

  • Experience working with large-scale data, ideally at petabyte scale or above.

 

Responsibilities

  • Design and develop data pipelines that ingest enormous amounts of multimodal training data (text, audio, images, video).
  • Build and maintain cutting-edge infrastructure that can store and process the petabytes of data needed to power models.
  • Partner with the pre-training and post-training teams to improve our data recipe through rigorous and careful experimentation.
  • Collaborate with the product team and other engineers and researchers across Microsoft AI to identify gaps in the current generation of models. 
  • Embody our culture and values.
