Machine Learning Research Scientist / Research Engineer, LLM Evaluation

2 Months ago • All levels • Research Development • $220,000 PA - $325,000 PA

Job Summary

Job Description

As the leading data and evaluation partner for frontier AI companies, Scale is dedicated to advancing the evaluation and benchmarking of large language models (LLMs). The role involves designing and developing novel evaluation benchmarks for large language models, covering areas such as coding, instruction following, factuality, robustness, and fairness. Responsibilities include conducting research on the effectiveness and limitations of existing LLM evaluation techniques, collaborating with internal teams and external partners, implementing scalable and reproducible evaluation pipelines, and publishing research findings. Successful candidates will partner with top foundation model labs, providing both technical and strategic input on the development of the next generation of generative AI models.
Must have:
  • Ph.D. or Master's degree in Computer Science, Machine Learning, AI, or a related field.
  • Strong background in deep learning and LLMs, with experience in model evaluation.
  • Familiarity with benchmarking tools and datasets for LLM evaluation.
  • Hands-on experience with large-scale model training and deployment.
  • Excellent written and verbal communication skills.
  • Published research in areas of machine learning at major conferences.
Good to have:
  • Previous experience in a customer-facing role.

Job Details

As the leading data and evaluation partner for frontier AI companies, Scale is dedicated to advancing the evaluation and benchmarking of large language models (LLMs). We are building industry-leading LLM leaderboards, setting new standards for model performance assessment. Our mission is to develop rigorous, scalable, and fair evaluation methodologies to drive the next generation of AI capabilities.

We are seeking Research Scientists and Research Engineers with expertise in LLM evaluation. You will play a key role in developing and implementing novel evaluation methodologies, metrics, and benchmarks to assess the capabilities and limitations of our cutting-edge LLMs. We encourage collaborations within the industry and academia, and support the publication of research findings. Successful candidates will partner with top foundation model labs, providing both technical and strategic input on the development of the next generation of generative AI models.

You will:

  • Design and develop novel evaluation benchmarks for large language models, covering areas such as coding, instruction following, factuality, robustness, and fairness.
  • Conduct research on the effectiveness and limitations of existing LLM evaluation techniques.
  • Collaborate with internal teams and external partners to refine metrics and create standardized evaluation protocols.
  • Implement scalable and reproducible evaluation pipelines using modern ML frameworks.
  • Publish research findings in top-tier AI conferences and contribute to open-source benchmarking initiatives.

Ideally you’d have:

  • Ph.D. or Master's degree in Computer Science, Machine Learning, AI, or a related field.
  • Strong background in deep learning and LLMs, with experience in model evaluation.
  • Familiarity with benchmarking tools and datasets for LLM evaluation.
  • Hands-on experience with large-scale model training and deployment.
  • Excellent written and verbal communication skills.
  • Published research in areas of machine learning at major conferences (NeurIPS, ICML, ICLR, ACL, EMNLP, CVPR, etc.) and/or journals.
  • Previous experience in a customer-facing role.
