Machine Learning Engineer (Data & Evaluation Infrastructure)

2 Months ago • All levels
Data Analysis

Job Description

We are seeking a Machine Learning Engineer (MLE) to manage our post-training evaluation pipeline. The role involves building and scaling evaluations that assess model capabilities across diverse tasks, pinpointing failure modes, and driving model improvements. Key responsibilities include identifying tasks for evaluation; creating or curating test cases and measurement methods; and implementing evaluations through objective output verification, LLM judging, reward modeling, or human evaluation. You will also expand coverage, analyze failure cases in depth, identify remedies, and develop ways to make the evals scalable and accessible internally, such as lightweight GUIs or Slurm scripts.
Good To Have:
  • History of OSS contributions
Must Have:
  • Experience with evaluation frameworks
  • Experience with automated and human evaluation
  • Ability to build evaluation infrastructure from scratch and scale existing systems

We’re looking for an MLE to own our post-training evaluation pipeline. You’ll build and scale evals, in both depth and breadth, that measure model capabilities across diverse tasks, identify failure modes, and drive model improvements.

Responsibilities:

  • Identifying tasks for evaluation coverage
  • Creating, curating, or generating test cases and ways to measure these tasks
  • Implementing evaluation through objective output verification, LLM judge/reward modeling, human evaluation, or any tricks of the trade you may bring to the table
  • Adding coverage and diving deep into analyzing what’s really gone wrong in failure cases
  • Identifying ways to remedy failure cases
  • Developing ways to present the evals and make them scalable and accessible internally (e.g. lightweight GUIs or scalable Slurm scripts for running the evals)
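
For candidates who want a concrete picture of the loop described above, here is a minimal sketch of an eval that uses objective output verification and collects failure cases for analysis. It is an illustration only, not our internal tooling: `EvalCase`, `exact_match`, `run_eval`, and `toy_model` are hypothetical names, and the toy model stands in for a real model call.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    task: str      # hypothetical task label, e.g. "arithmetic"
    prompt: str
    expected: str  # reference answer used for objective verification


def exact_match(output: str, expected: str) -> bool:
    """Objective check: case-insensitive equality after trimming whitespace."""
    return output.strip().lower() == expected.strip().lower()


def run_eval(model: Callable[[str], str], cases: list[EvalCase]):
    """Return per-task accuracy and the list of (case, output) failures."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    failures = []
    for case in cases:
        output = model(case.prompt)
        totals[case.task] += 1
        if exact_match(output, case.expected):
            passed[case.task] += 1
        else:
            failures.append((case, output))
    accuracy = {task: passed[task] / totals[task] for task in totals}
    return accuracy, failures


def toy_model(prompt: str) -> str:
    # Stand-in for a real model call; one answer is wrong on purpose.
    answers = {"2+2=": "4", "Capital of France?": "paris", "3*3=": "6"}
    return answers.get(prompt, "")


CASES = [
    EvalCase("arithmetic", "2+2=", "4"),
    EvalCase("arithmetic", "3*3=", "9"),
    EvalCase("geography", "Capital of France?", "Paris"),
]

if __name__ == "__main__":
    accuracy, failures = run_eval(toy_model, CASES)
    print(accuracy)
    for case, output in failures:
        print(f"FAIL [{case.task}] {case.prompt!r}: "
              f"got {output!r}, expected {case.expected!r}")
```

Swapping `exact_match` for an LLM judge, a reward model score, or a human rating would keep the same structure: run the cases, score each output, aggregate by task, and keep the failures for inspection.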

Qualifications:

  • Strong experience with evaluation frameworks
  • Experience with both automated and human evaluation methodologies
  • Ability to build evaluation infrastructure from scratch and scale existing systems

Preferred:

  • History of OSS contributions

Set alerts for new jobs by Nousresearch
