Machine Learning Engineer, GenAI Quality

2 months ago • 3+ years • Quality Assurance • $172,000 - $300,000 per annum

Job Summary

Job Description

This role focuses on developing ML systems to automate data quality evaluation and generation using large language models. You will build scalable systems to assess quality across accuracy, instruction adherence, factuality, and reasoning — and design robust evaluation frameworks to ensure alignment with human standards. You will be deeply involved in the full lifecycle: from model design and fine-tuning, to prototyping, deployment, and monitoring. You will partner closely with engineering, research, and product teams to deliver cutting-edge solutions for both customers and internal GenAI data engines.
Must have:
  • 3+ years of experience designing, training, and deploying ML models
  • Strong background in NLP, LLMs, and deep learning frameworks
  • Experience building microservices and deploying ML pipelines
  • Practical knowledge of LLM fine-tuning and evaluation
  • Strong programming skills and a solid foundation in algorithms
Good to have:
  • Experience with post-training LLM techniques
  • Familiarity with data evaluation pipelines or dataset curation
  • Background in multimodal ML or model evaluation

Job Details

About Scale:

Scale’s Generative AI ML team develops models and services to power high-quality data generation and evaluation for the most advanced large language models on earth. We also conduct applied research on model supervision and algorithmic approaches that support frontier models for Scale’s applied-ML teams and the broader AI community. Scale is uniquely positioned at the center of the AI ecosystem as a leading provider of training and evaluation data, end-to-end ML lifecycle solutions, and frontier evaluations for public and private institutions.

About The Role:

This role focuses on developing ML systems to automate data quality evaluation and generation using large language models. You’ll build scalable systems to assess quality across accuracy, instruction adherence, factuality, and reasoning, and design robust evaluation frameworks to ensure alignment with human standards. This is one of the highest-impact areas in the company and directly accelerates the development of aligned, performant foundation models.

You’ll be deeply involved in the full lifecycle: from model design and fine-tuning, to prototyping, deployment, and monitoring. You’ll partner closely with engineering, research, and product teams to deliver cutting-edge solutions for both customers and internal GenAI data engines — Scale’s fastest-growing business.

If you’re excited about combining human-machine evaluation, scaling high-quality training data, and shaping the next generation of foundation models, we’d love to hear from you.

You will:

  • Design, fine-tune, and evaluate large language models for structured quality evaluation and data generation tasks
  • Develop robust evaluation frameworks to assess performance across accuracy, instruction following, reasoning, and other critical dimensions
  • Build and maintain scalable ML services to automatically assess and generate high-quality training and evaluation data
  • Research and apply state-of-the-art techniques in LLM training, post-training alignment (e.g., instruction tuning, RLHF), and tool-augmented reasoning
  • Collaborate with research scientists, engineers, and product teams to integrate your work into production services used by top AI developers

Ideally you’d have:

  • 3+ years of experience designing, training, and deploying ML models in production environments
  • Strong background in NLP, LLMs, and deep learning frameworks like PyTorch, TensorFlow, or JAX
  • Experience building microservices and deploying ML pipelines in cloud environments (e.g., AWS or GCP)
  • Practical knowledge of LLM fine-tuning and evaluation for tasks like factuality, instruction adherence, and chain-of-thought reasoning
  • Strong programming skills (e.g., Python) and a solid foundation in algorithms and data structures
  • Strong communication skills and experience working cross-functionally

Nice to have:

  • Experience with post-training LLM techniques (instruction tuning, RLHF, tool use, or agent-based reasoning)
  • Familiarity with data evaluation pipelines, dataset curation, or scalable annotation workflows
  • Background in multimodal ML or model evaluation across domains such as code or long-context generation




