Apple is where individual imaginations gather together, committing to the values that lead to great work! Every new product we build, service we create, or Apple Store experience we deliver is the result of us making each other’s ideas stronger. That happens because every one of us shares a belief that we can make something wonderful and share it with the world, changing lives for the better. It’s the diversity of our people and their thinking that inspires the innovation that runs through everything we do. When we bring everybody in, we can do the best work of our lives. Here, you’ll do more than join something: you’ll add something!
We are seeking an exceptional and highly motivated Engineer to lead our team’s data quality and curation process across a variety of core products and technologies. Our team helps develop multimodal data modeling, validation, and curation processes and pipelines. We have a passion for designing and executing data assurance operations that deliver trustworthy and widely accepted data.
In this role, you will be responsible for the quality and accessibility of the multimodal data (including image, video, text, audio, sensor data, metadata, etc.) generated from various data collections, processing pipelines, and annotations. You'll design and implement systematic processes and automated pipelines, and collaborate with data collection, data processing, ML, and product engineers to create high-quality data, support data-driven product and AI/ML development, and ensure data compliance with security and privacy regulations.
- Collaborate with cross-functional teams to establish comprehensive data quality assurance and curation processes, encompassing manual validation and automated workflows
- Define data quality metrics, implement data validation rules, and develop a scalable framework to execute diverse data validation and curation software components on the multimodal data
- Design and execute data assurance operations to run data validations, report on data quality, and facilitate quality improvement throughout data collection, processing, and annotation
- Collaborate with cross-functional teams to implement a scalable framework and pipeline to extract, clean, transform, and standardize the multimodal data and metadata generated from a wide range of sources, making the data trustworthy and widely discoverable and accessible
- Develop and drive the feedback loop between data consumers and data generation
Key Qualifications
- 6+ years of industry experience architecting and developing scalable and reliable software, pipelines, and platforms for validation, analytics, and curation of multimodal data (including image, video, text, audio, sensor data, etc.)
- B.S. in Computer Science or an equivalent engineering field
- Proficiency with programming languages such as Python, Java, SQL, or equivalent
- Proficiency with data pipeline, modeling, database, and query tools such as Dagster, PostgreSQL, MongoDB, Trino, or equivalent
- Experience with vision data processing tools like FFmpeg, GStreamer, OpenCV, or equivalent
- Able to participate in an on-call rotation for mission-critical operations and applications
Additional Requirements
- Passion for data quality and curation, code elegance, clear documentation, operational excellence, attention to detail, and delivering outstanding user experiences
- Excellent communication skills, with the ability to confidently express the benefits and constraints of technology solutions to cross-functional technical and non-technical teams
- Experience building cloud data warehouses in Snowflake, Redshift, BigQuery, or analogous architectures
- Experience with the practical application of data warehousing concepts, methodologies, and frameworks
- Experience with AI/ML frameworks like TensorFlow or PyTorch
- Experience with machine learning algorithms for data curation and annotation
- Experience in managing a team
- Experience with data collection and/or annotation operations