ML Data Engineer

1 Month ago • All levels • Data Analysis • $200,000 PA - $400,000 PA

Job Summary

Job Description

Moonvalley is seeking an ML Data Engineer to develop data pipelines for their next-generation generative video models. This role involves designing and implementing systems for data ingestion, deduplication, validation, filtering, labeling, and quality scoring. You will also fine-tune and build ML models from scratch, address dataset/model biases, and implement observability across the ML data lifecycle. Collaboration with infrastructure teams is key to developing efficient pipelines supporting large-scale video model training across thousands of GPUs. The ideal candidate will work in a fast-paced environment tackling challenging data quality and model performance issues, contributing to the creation of cutting-edge generative AI models for commercials and cinematic experiences.
Must have:
  • Design and implement data ingestion, deduplication, validation, filtering, labeling, and quality scoring systems.
  • Fine-tune and build ML models from scratch, taking them from training to production.
  • Identify and address dataset/model biases, creating scoring systems to mitigate them.
  • Implement observability and telemetry across the ML data lifecycle.
  • Collaborate with infrastructure teams to develop efficient data pipelines for large-scale video model training.
  • Strong hands-on experience in ML engineering, including training and optimizing models.
  • Deep experience in building and scaling data infrastructure for large-scale ML systems.
  • Experience managing large-scale datasets and pipelines in production.
  • Fluency with Python, Spark, Airflow, or similar frameworks.
  • Understanding of modern cloud infrastructure: Kubernetes, Terraform, S3/GCS, distributed compute.
  • Comfortable operating in environments with ambiguity and evolving priorities.
Good to have:
  • Experience working on foundational model training pipelines (image, video, or language).
  • Experience with video-specific data challenges like frame sampling, codec variability, temporal alignment, and perceptual quality scoring.

Job Details

About Moonvalley

Moonvalley is building the next-generation creative studio, powered by the most capable video and image foundational models in the world. We are creating the platforms where the first generative Super Bowl ads and Oscar-winning movies will be created.

We’re the most pedigreed team in generative AI, with top former DeepMind video researchers leading a research team as deep as any in the industry, product leaders who have built some of the best software products in the world, and an in-house Oscar-nominated movie studio. We’ve also raised $75M from world-class investors including General Catalyst, Bessemer, Khosla Ventures & Y Combinator.

About the role

Moonvalley is developing cutting-edge generative AI models designed to power Super Bowl-worthy commercials and award-winning cinematic experiences. Our inaugural HD model, Marey, is built on exclusively licensed and owned data for professional use in Hollywood and enterprise applications.

Our team is an unprecedented convergence of talent across industries. Our elite AI scientists from DeepMind, Microsoft, Snap and Meta have decades of collective experience in machine learning and computational creativity. We have also established the first AI-enabled movie studio in Hollywood, filled with accomplished filmmakers and visionary creative talent. We work with the top producers, actors, and filmmakers in Hollywood as well as creative-driven global brands. So far we’ve raised over $70M from world-class investors including General Catalyst, Bessemer, Khosla Ventures & Y Combinator – and we’re just getting started.

Role Summary:

We're looking for an ML Data Engineer to build the data pipelines driving our next-generation generative video models. This role is central to our mission of training models exclusively on clean, high-quality data.

You'll develop data ingestion pipelines, captioning systems, and high-throughput, distributed architectures for large-scale data processing and curation. You’ll be responsible for solving some of the toughest challenges in data quality and model performance, from training and shipping quality scoring models to analyzing large-scale datasets and uncovering new challenges.

What you’ll do:

  • Design and implement systems for data ingestion, deduplication, validation, filtering, labeling, and quality scoring.

  • Fine-tune and build ML models from scratch and take them from training to production.

  • Identify and address dataset/model biases — including creating additional scoring systems to mitigate them.

  • Implement observability and telemetry across the ML data lifecycle.

  • Collaborate with infrastructure teams to develop efficient data pipelines that support large-scale video model training, running across thousands of GPUs.

  • Work in a fast-moving environment with many known and unknown challenges to tackle.

What we’re looking for:

  • Strong hands-on experience in ML engineering, including training and optimizing models (e.g., classifiers, segmentation, quality scoring), with a focus on image, video, or audio modalities.

  • Deep experience in building and scaling data infrastructure for large-scale ML systems, ideally for video or multi-modal models.

  • Experience managing large-scale datasets and pipelines in production.

  • Fluency with Python, Spark, Airflow, or similar frameworks.

  • Understanding of modern cloud infrastructure: Kubernetes, Terraform, S3/GCS, distributed compute.

  • Comfortable operating in environments with ambiguity and evolving priorities.

Nice to Haves:

  • Experience working on foundational model training pipelines (image, video, or language).

  • Experience with video-specific data challenges like frame sampling, codec variability, temporal alignment, and perceptual quality scoring.

On our team, we approach our work with dedication similar to that of Olympic athletes. Anticipate occasional late nights and weekends dedicated to our mission. We understand this level of commitment may not suit everyone, and we openly communicate this expectation.

If you're motivated by deeply technical problems, a seemingly never-ending uphill battle and the opportunity to build (and own) a generational technology company, we can give you what you're looking for.

All business roles at Moonvalley are hybrid positions by default, with some fully remote depending on the job scope. We meet a few times every year, usually in London, UK or North America (LA, Toronto) as a company.

If you're excited about the opportunity to work on cutting-edge AI technology and help shape the future of media and entertainment, we encourage you to apply. We look forward to hearing from you!

The statements contained in this job description reflect general details as necessary to describe the principal functions of this job, the level of knowledge and skill typically required and the scope of responsibility. It should not be considered an all-inclusive listing of work requirements. Individuals may perform other duties as assigned, including work in other functional areas to cover absences, to equalize peak work periods, or to otherwise balance organizational work

Moonvalley AI is proud to be an equal opportunity employer. We are committed to providing accommodations. If you require accommodation, we will work with you to meet your needs.

Please be assured we'll treat any information you share with us with the utmost care, only use your information for recruitment purposes and will never sell it to other companies for marketing purposes. Please review our privacy policy and job applicant privacy policy located here for further information.
