The Evaluation Engineer will design and implement evaluation methodologies for Large Language Models (LLMs) and agentic workflows. This includes using tools such as Galileo, DeepEval, and Ragas to assess retrieval quality and response accuracy, and Arize to monitor model performance. The role involves identifying failure points, optimizing model responses, and collaborating closely with AI researchers and engineers. The ideal candidate has hands-on experience with AI assessment tools and a strong understanding of LLMs such as GPT, Claude, Llama, and Mixtral, and will focus on improving the efficiency and effectiveness of AI deployments. The role offers the opportunity to work on cutting-edge projects at leading financial institutions.
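To illustrate the kind of evaluation work this role involves, the sketch below scores a single retrieval-augmented response for answer relevancy and faithfulness using DeepEval. The question, answer, retrieved context, and thresholds are illustrative assumptions rather than part of the role description, and the exact metric and function names may differ across DeepEval versions.

```python
# Minimal sketch of an LLM response evaluation, assuming DeepEval's
# LLMTestCase / metric API (test data and thresholds are illustrative).
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric, FaithfulnessMetric
from deepeval.test_case import LLMTestCase

# One RAG interaction: the user question, the model's answer, and the
# retrieved context the answer should be grounded in.
test_case = LLMTestCase(
    input="What is the cutoff time for domestic wire transfers?",
    actual_output="Domestic wire transfers must be submitted by 5 p.m. Eastern Time.",
    retrieval_context=[
        "Domestic wire transfers are processed until 5:00 p.m. Eastern Time."
    ],
)

# Score how relevant the answer is to the question and how faithful it is
# to the retrieved context; each threshold marks the pass/fail boundary.
metrics = [
    AnswerRelevancyMetric(threshold=0.7),
    FaithfulnessMetric(threshold=0.7),
]

evaluate(test_cases=[test_case], metrics=metrics)
```

In practice, test cases like this would typically be drawn from production traces or curated golden datasets, and the resulting scores could be tracked over time in a monitoring platform such as Arize.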