We are seeking a Machine Learning Engineer (MLE) to manage our post-training evaluation pipeline. The role involves building and scaling evaluation processes that assess model capabilities across a range of tasks, pinpoint areas of failure, and drive improvements. Key responsibilities include identifying tasks to evaluate, creating or curating test cases and measurement methods, and implementing evaluations through objective verification, LLM judging, reward modeling, or human evaluation. You will also expand evaluation coverage, analyze failure cases in depth, identify solutions, and develop scalable, accessible internal methods for presenting evaluations, such as GUIs or Slurm scripts.