Machine Learning Research Scientist / Research Engineer, LLM Evaluation

1 Day ago • All levels • $220,000 PA - $325,000 PA

Job Summary

Job Description

As the leading data and evaluation partner for frontier AI companies, Scale is dedicated to advancing the evaluation and benchmarking of large language models (LLMs). The role involves designing and developing novel evaluation benchmarks for large language models, covering areas such as coding, instruction following, factuality, robustness, and fairness. Responsibilities include conducting research on the effectiveness and limitations of existing LLM evaluation techniques, collaborating with internal teams and external partners, implementing scalable and reproducible evaluation pipelines, and publishing research findings. Successful candidates will partner with top foundation model labs, providing both technical and strategic input on the development of the next generation of generative AI models.
Must have:
  • Ph.D. or Master's degree in Computer Science, Machine Learning, AI, or a related field.
  • Strong background in deep learning and LLMs, with experience in model evaluation.
  • Familiarity with benchmarking tools and datasets for LLM evaluation.
  • Hands-on experience with large-scale model training and deployment.
  • Excellent written and verbal communication skills.
  • Published research in areas of machine learning at major conferences.
Good to have:
  • Previous experience in a customer-facing role.

Job Details

As the leading data and evaluation partner for frontier AI companies, Scale is dedicated to advancing the evaluation and benchmarking of large language models (LLMs). We are building industry-leading LLM leaderboards, setting new standards for model performance assessment. Our mission is to develop rigorous, scalable, and fair evaluation methodologies to drive the next generation of AI capabilities.

We are seeking Research Scientists and Research Engineers with expertise in LLM evaluation. You will play a key role in developing and implementing novel evaluation methodologies, metrics, and benchmarks to assess the capabilities and limitations of our cutting-edge LLMs. We encourage collaboration across industry and academia, and we support the publication of research findings. Successful candidates will partner with top foundation model labs, providing both technical and strategic input on the development of the next generation of generative AI models.

You will:

  • Design and develop novel evaluation benchmarks for large language models, covering areas such as coding, instruction following, factuality, robustness, and fairness.
  • Conduct research on the effectiveness and limitations of existing LLM evaluation techniques.
  • Collaborate with internal teams and external partners to refine metrics and create standardized evaluation protocols.
  • Implement scalable and reproducible evaluation pipelines using modern ML frameworks.
  • Publish research findings in top-tier AI conferences and contribute to open-source benchmarking initiatives.

Ideally you’d have:

  • Ph.D. or Master's degree in Computer Science, Machine Learning, AI, or a related field.
  • Strong background in deep learning and LLMs, with experience in model evaluation.
  • Familiarity with benchmarking tools and datasets for LLM evaluation.
  • Hands-on experience with large-scale model training and deployment.
  • Excellent written and verbal communication skills.
  • Published research in areas of machine learning at major conferences (NeurIPS, ICML, ICLR, ACL, EMNLP, CVPR, etc.) and/or journals.
  • Previous experience in a customer-facing role.
