Machine Learning Engineer (Data & Evaluation Infrastructure)

Nous Research

Worldwide (On Site) | Full Time | 5 months ago

Job Summary

We are seeking a Machine Learning Engineer (MLE) to manage our post-training evaluation pipeline. The role involves building and scaling evaluations that assess model capabilities across diverse tasks, pinpointing failure modes, and driving model improvements. Key responsibilities include identifying tasks for evaluation; creating, curating, or generating test cases and measurement methods; and implementing evaluations through objective output verification, LLM judging or reward modeling, or human evaluation. You will also expand coverage, analyze failure cases in depth, identify remedies, and make the evals scalable and accessible internally, for example through light GUIs or scalable Slurm scripts.

Must Have

Experience with evaluation frameworks
Experience with automated and human evaluation
Ability to build evaluation infrastructure from scratch
Ability to scale existing systems

Good to Have

History of OSS contributions

Job Description

We’re looking for an MLE to own our post-training evaluation pipeline. You’ll build and scale evals, in both depth and breadth, that measure model capabilities across diverse tasks, identify failure modes, and drive model improvements.

Responsibilities:

Identifying tasks for evaluation coverage
Creating, curating, or generating test cases and ways to measure these tasks
Implementing evaluation through objective output verification, LLM judge/reward modeling, human evaluation, or any tricks of the trade you may bring to the table
Adding coverage and diving deep into analyzing what’s really gone wrong in failure cases
Identifying ways to remedy failure cases
Developing ways to present the evals and make them scalable and accessible internally (e.g., light GUIs or scalable Slurm scripts for running the evals)

Qualifications:

Strong experience with evaluation frameworks
Experience with both automated and human evaluation methodologies
Ability to build evaluation infrastructure from scratch and scale existing systems

Preferred:

History of OSS contributions

