AI Data Engineering Lead

1 Month ago • All levels • Research Development • $250,000 PA - $450,000 PA

Job Summary

Job Description

Moonvalley is seeking an AI Data Engineering Lead to architect and scale the data pipelines for its next-generation generative video models. The role is central to training models exclusively on clean, high-quality data. Responsibilities include designing data ingestion, annotation, and distributed systems for large-scale data processing and curation. Collaboration with researchers, engineers, and infrastructure teams is key to keeping the data pipeline performant, traceable, and aligned with the goal of building the cleanest generative video foundation model. The role also involves optimizing end-to-end performance, scaling pipelines, managing a team of data engineers, and working closely with leadership on data team roadmaps and with filmmakers on data acquisition.
Must have:
  • Build and scale data infrastructure for large-scale ML systems (video/multi-modal)
  • ML engineering experience with training and optimizing classifiers
  • Manage large-scale datasets and pipelines in production
  • Manage and lead small teams of engineers
  • Expertise in Python, Spark, Airflow
  • Understanding of Kubernetes, Terraform, object stores, distributed computing
  • Strong communication and leadership skills
  • Balance rapid delivery with long-term technical vision
Good to have:
  • Experience with foundational model training pipelines
  • Familiarity with dataset licensing, governance, compliance
  • Experience with video-specific data challenges

Job Details

Moonvalley is building the next-generation creative studio, powered by the most capable video and image foundational models in the world. We are creating the platforms where the first generative Super Bowl ads and Oscar-winning movies will be made.

We’re the most pedigreed team in generative AI, with top former DeepMind video researchers leading a research team as deep as any in the industry, product leaders who have built some of the best software products in the world, and an in-house Oscar-nominated movie studio. We’ve also raised $75M from world-class investors including General Catalyst, Bessemer, Khosla Ventures & Y Combinator.

Moonvalley is developing cutting-edge generative AI models designed to power Super Bowl-worthy commercials and award-winning cinematic experiences. Our inaugural HD model, Marey, is built exclusively on licensed and owned data for professional use in Hollywood and enterprise applications.

Our team is an unprecedented convergence of talent across industries. Our elite AI scientists from DeepMind, Google, Microsoft, Meta & Snap have decades of collective experience in machine learning and computational creativity. We have also established the first AI-enabled movie studio in Hollywood, filled with accomplished filmmakers and visionary creative talent. We work with the top producers, actors, and filmmakers in Hollywood as well as creative-driven global brands. So far we’ve raised over $70M from world-class investors including General Catalyst, Bessemer, Khosla Ventures & Y Combinator, and we’re just getting started.

Role Summary:

We’re looking for a Data Engineering Lead to architect and scale the data pipelines that power our next-generation generative video models. This role is central to our mission of training models exclusively on clean, high-quality data.

You will lead the design of data ingestion pipelines, data annotations, and high-throughput, distributed systems that support large-scale data processing and curation. You’ll work closely with researchers, engineers, and infrastructure teams to ensure that our data pipeline is not just performant, but trusted, traceable, and aligned with our goal of building the world’s cleanest generative video foundation model.

What you'll do:

  • Design and lead the development of scalable, high-throughput data pipelines optimized for multi-modal video model training.

  • Build systems for data ingestion, deduplication, quality assessment, validation, filtering, and labeling to ensure only clean, high-quality data flows through the pipeline.

  • Collaborate with research to define data quality benchmarks.

  • Optimize end-to-end performance across distributed data processing frameworks (e.g., Apache Spark, Ray, Airflow).

  • Work with infrastructure teams to scale pipelines across thousands of GPUs.

  • Work directly with leadership on the data team's roadmap.

  • Manage the team of data engineers.

  • Work with filmmakers on data acquisition.

What we're looking for:

  • Deep experience in building and scaling data infrastructure for large-scale ML systems, ideally for video or multi-modal models.

  • Solid background in ML engineering, including hands-on experience in training and optimizing classifiers.

  • Experience managing large-scale datasets and pipelines in production.

  • Experience in managing and leading small teams of engineers.

  • Expertise in Python, Spark, Airflow, or similar data frameworks.

  • Understanding of modern infrastructure: Kubernetes, Terraform, object stores (e.g. S3, GCS), and distributed computing environments.

  • Strong communication and leadership skills; you can bridge the gap between engineering and research.

  • Skilled at balancing rapid, iterative delivery with a focus on long-term technical vision, ensuring solutions are both pragmatic and architecturally elegant.

Nice to Haves:

  • Experience working on foundational model training pipelines (image, video, or language).

  • Familiarity with dataset licensing, governance, and compliance workflows.

  • Experience with video-specific data challenges such as frame sampling, codec variability, temporal alignment, and perceptual quality scoring.

On our team, we approach our work with a dedication similar to that of Olympic athletes. Expect occasional late nights and weekends dedicated to our mission. We understand this level of commitment may not suit everyone, and we communicate this expectation openly.

If you're motivated by deeply technical problems, a seemingly never-ending uphill battle and the opportunity to build (and own) a generational technology company, we can give you what you're looking for.

All business roles at Moonvalley are hybrid positions by default, with some fully remote depending on the job scope. As a company, we meet a few times a year, usually in London, UK, or North America (LA, Toronto).

If you're excited about the opportunity to work on cutting-edge AI technology and help shape the future of media and entertainment, we encourage you to apply. We look forward to hearing from you!

The statements contained in this job description reflect general details as necessary to describe the principal functions of this job, the level of knowledge and skill typically required and the scope of responsibility. It should not be considered an all-inclusive listing of work requirements. Individuals may perform other duties as assigned, including work in other functional areas to cover absences, to equalize peak work periods, or to otherwise balance organizational work

Moonvalley AI is proud to be an equal opportunity employer. We are committed to providing accommodations. If you require accommodation, we will work with you to meet your needs.

Please be assured that we'll treat any information you share with us with the utmost care, use it only for recruitment purposes, and never sell it to other companies for marketing purposes. Please review our privacy policy and job applicant privacy policy for further information.
