Staff Data Engineer

Lucasfilm

California, United States (Hybrid)

Posted 1 day ago • 8+ years of experience • Data Analysis • $166,800 - $223,600 per year

Job Description

The Skywalker Sound Development Group is seeking an experienced Data Engineer to specialize in the creation, management, and optimization of data pipelines to support cutting-edge AI/ML research. This is a critical role in preparing high-quality datasets for the training, retraining, and evaluation of machine learning models tailored to immersive and multichannel audio applications. As a Data Engineer, you will focus on developing robust pipelines for processing complex media datasets, enabling AI/ML researchers to build transformative solutions for speech processing, style transfer, and source separation. Your work will directly contribute to creating innovative soundtrack workflows for global media production.

17 skills required for this role

data-analytics

data-structures

game-texts

quality-control

gitlab

aws

tableau

spark

matplotlib

data-science

numpy

pytorch

pandas

docker

kubernetes

python

machine-learning

Job Details

This role is considered Hybrid, which means the employee will work 2-3 days onsite at our Nicasio, CA office and occasionally from home.

What You'll Do

Design, implement, and maintain scalable, automated data pipelines for the ingestion, preprocessing, and transformation of large-scale audio datasets.
Ensure pipelines support efficient model training and retraining workflows, enabling continuous improvement of AI/ML models.
Collaborate with AI/ML researchers to define data requirements and integrate feedback to improve data pipeline functionality.
Develop advanced preprocessing techniques for immersive and multichannel audio formats (e.g., Dolby Atmos, high-order ambisonics).
Automate data cleaning, normalization, and augmentation processes to prepare datasets for various model architectures, including foundational models and transformers.
Integrate external datasets and APIs while ensuring compliance with legal and ethical data usage standards.
Monitor and optimize pipeline performance to handle complex and dynamic data structures effectively.
Create tools and workflows for annotating, labeling, and curating datasets, including the use of active learning methods.
Perform exploratory data analysis to uncover trends, validate dataset quality, and identify data gaps.

What We’re Looking For

Master’s degree, with a preference for a PhD, in Data Engineering/Science, Computer Science, Signal Processing, or a related field.
8+ years of experience in data engineering or data science, with a focus on building pipelines for AI/ML applications.
Proficiency in Python, with expertise in data manipulation libraries such as Pandas, NumPy, and PyTorch’s data utilities.
Hands-on experience with audio processing libraries and tools (e.g., Librosa, FFmpeg, SoX) for handling complex audio formats.
Familiarity with scalable pipeline tools like GitLab, Apache Spark, Airflow, or Luigi, and experience with containerized workflows (Docker, Kubernetes).
Strong understanding of data pipeline requirements for model training, retraining, and evaluation in iterative research workflows.
Experience with immersive and multichannel audio formats.
Knowledge of cloud-based platforms and tools for storage and processing, such as AWS S3, Redshift, or Google BigQuery.
Strong problem-solving skills, with a proactive mindset for addressing evolving data challenges.

Preferred Qualifications

Experience integrating data pipelines with AI/ML workflows, including active learning and model retraining.
Familiarity with audio-specific datasets and metadata management strategies.
Knowledge of machine learning principles and how data quality impacts model performance.
Experience with distributed training pipelines and large-scale dataset processing.
Contributions to open-source projects or published research in the fields of data science or audio processing.
Experience with visualization tools (e.g., Tableau, Matplotlib) for quality assurance and exploratory data analysis.
Expertise in designing systems to support AI/ML model monitoring and retraining over time.
