AI Evaluation Manager

2 months ago • 5+ years • $300,000 - $350,000 PA
Research Development

Job Description

Luma is seeking an AI Evaluation Manager to shape and scale how the company understands, measures, and improves the performance of its generative AI models. The role involves partnering with researchers, engineers, and technical artists to evaluate models against real-world creative use cases. The manager will design frameworks that capture qualitative nuance and identify actionable insights to guide development. The focus is on building evaluation systems that match the complexity of human perception and creativity, rather than on simply checking metrics.
Good to Have:
  • Background in motion, visual effects, or storytelling.
  • Experience evaluating AI-generated media.
  • Experience building internal qualitative data tools.
  • Familiarity with prompt engineering.
Must Have:
  • Evaluate generative model performance.
  • Identify failure modes and regressions.
  • Develop scalable qualitative evaluation frameworks.
  • Collaborate with technical artists and engineers.
  • Translate product goals into evaluative criteria.
  • Lead qualitative studies and human-in-the-loop evaluations.
  • Provide feedback for model fine-tuning.
  • Stay informed about generative AI evaluation standards.
  • Master's degree in a relevant field or equivalent experience.
  • 5+ years in product evaluation or UX research.
  • Familiarity with creative workflows and generative models.
  • Strong systems thinking for defining abstract qualities.
  • Experience working cross-functionally.
  • Excellent written communication and synthesis skills.

About the Role

Luma is pushing the boundaries of generative AI, building tools that redefine how visual content is created. We’re seeking a candidate to help shape and scale the way we understand, measure, and improve model performance. In this role, you’ll partner with researchers, engineers, and technical artists to evaluate our models against real-world creative use cases, design frameworks that capture qualitative nuance, and identify actionable insights that guide development.

This is not a checkbox-metrics role: it's about building evaluation systems that match the complexity of human perception, creativity, and intention.

Responsibilities

  • Evaluate generative model performance across diverse tasks, prompts, and modalities.

  • Identify key failure modes, regression patterns, and edge cases that impact product quality.

  • Develop and maintain qualitative evaluation frameworks that are scalable and reusable.

  • Collaborate closely with technical artists and engineers to align evaluations with model capabilities and target use cases.

  • Translate high-level product goals into concrete evaluative criteria.

  • Lead qualitative studies, side-by-side comparisons, and human-in-the-loop evaluation efforts.

  • Provide detailed feedback that informs model fine-tuning, dataset curation, and product UX.

  • Stay informed about emerging evaluation standards in generative AI and creative tools.

Qualifications

  • Master’s degree or higher in Cognitive Science, Human-Computer Interaction (HCI), Design Research, Psychology, Media Studies, or a related field.

  • 5+ years of experience in product evaluation, UX research, model testing, or similar roles that involve structured qualitative assessment.

  • Deep familiarity with creative workflows and real-world use cases for generative models (e.g., animation, filmmaking, digital art, VFX).

  • Strong systems thinking and the ability to define abstract qualities (like believability, identity retention, or scene coherence) in clear evaluative terms.

  • Experience working cross-functionally with engineers, researchers, and creatives.

  • Excellent written communication skills and the ability to synthesize nuanced judgments into clear, actionable insights.

Nice to Have

  • Background in motion, visual effects, or storytelling pipelines.

  • Experience evaluating AI-generated media (video, images, 3D).

  • Prior work building internal tools for qualitative data collection or scoring.

  • Familiarity with prompt engineering and reference-based input methods.
