Machine Learning Engineer (Data & Evaluation Infrastructure)

1 Month ago • All levels • Data Analysis

Job Summary

Job Description

We are seeking a Machine Learning Engineer (MLE) to manage our post-training evaluation pipeline. The role involves building and scaling evaluation processes to assess model capabilities across various tasks, pinpointing areas of failure, and driving improvements. Key responsibilities include identifying tasks for evaluation, creating or curating test cases and measurement methods, and implementing evaluations through objective verification, LLM judging, reward modeling, or human evaluation. You will also be responsible for expanding coverage, analyzing failure cases in depth, identifying solutions, and developing scalable, accessible ways to present and run evaluations internally, such as GUIs or Slurm scripts.
Must have:
  • Experience with evaluation frameworks
  • Experience with automated and human evaluation
  • Ability to build evaluation infrastructure from scratch
  • Ability to scale existing systems
Good to have:
  • History of OSS contributions

Job Details

We’re looking for an MLE to own our post-training evaluation pipeline. You’ll build and scale evals, in both depth and breadth, that measure model capabilities across diverse tasks, identify failure modes, and drive model improvements.

Responsibilities:

  • Identifying tasks for evaluation coverage
  • Creating, curating, or generating test cases and ways to measure these tasks
  • Implementing evaluation through objective output verification, LLM judge/reward modeling, human evaluation, or any tricks of the trade you may bring to the table
  • Adding coverage and diving deep into analyzing what’s really gone wrong in failure cases
  • Identifying ways to remedy failure cases
  • Developing ways to present the evals and make them scalable and accessible internally (e.g., light GUIs or scalable Slurm scripts for running the evals)

Qualifications:

  • Strong experience with evaluation frameworks
  • Experience with both automated and human evaluation methodologies
  • Ability to build evaluation infrastructure from scratch and scale existing systems

Preferred:

  • History of OSS contributions
