Apple Intelligence is driven by intentional data design, spanning careful sampling, creation, and curation of high-quality datasets enriched with precise annotations. Our data powers our ability to evaluate and mitigate safety risks in new generative AI features. This role sits at the intersection of applied data science, empirical analysis, cultural and linguistic expertise, and stakeholder communication. It requires strong scientific judgment, cross-functional collaboration, and the ability to translate evaluation findings into actionable insights.

- Develop metrics for evaluating safety and fairness risks inherent to generative models and Gen-AI features
- Design datasets, identify data needs, and develop creative solutions for scaling and expanding data coverage through human and synthetic generation methods
- Collaborate with cross-functional partners, including engineering, product, and research teams, to ensure evaluations align with feature goals and deployment plans
- Partner with policy teams to translate regional safety and inclusivity requirements into measurable evaluation criteria
- Build expertise in machine translation and data synthesis techniques to generate localized and culturally aligned evaluation datasets at scale
- Develop ML-based enhancements to red teaming, model evaluation, and other processes to improve the quality of Apple Intelligence’s user-facing products
- Work with highly sensitive material, including exposure to offensive and controversial content