AI Evaluation Lead (JetBrains AI)

Job Description

At JetBrains, code is our passion. Ever since we started back in 2000, we have been striving to make the world’s most robust and effective developer tools. By automating routine checks and corrections, our tools speed up production, freeing developers to grow, discover, and create.

The JetBrains AI team is focused on bringing advanced AI capabilities to JetBrains products, which includes supporting the internal AI platform used across JetBrains and conducting long-term R&D in AI and machine learning. We collaborate closely with product teams to brainstorm and prioritize AI-driven features, as well as support product marketing and release planning. Our team includes about 50 people working on everything from classical ML algorithms and code completion to agents, retrieval-augmented generation, and more.

We’re looking to strengthen our team with an AI Evaluation Lead who will help define and execute our strategy for evaluating AI-powered features and LLMs. In this role, you will be instrumental in ensuring our models deliver meaningful value to users by shaping evaluation pipelines, influencing model development, collaborating with product and research teams across the company, and contributing your work to open source.

We value engineers who:

  • Plan their work and make decisions independently, consulting with others if needed.
  • Follow the latest advances in AI and ML fields, think long-term, and take ownership of their scope of work.
  • Prefer simplicity, opting for sound, robust, and efficient solutions.

In this role, you will:

  • Design and develop rigorous offline and online evaluation benchmarks for AI features and LLMs (see the sketch after this list).
  • Manage the team, prioritize tasks, and mentor teammates.
  • Define evaluation methodology and benchmarks for our open-source models and public releases.
  • Communicate your findings and best practices across the organization.
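
To make the first responsibility concrete, below is a minimal sketch of what an offline benchmark harness could look like. The JSONL schema, the exact-match metric, and the model callable are all illustrative assumptions; this is not a description of JetBrains' actual pipeline.

    import json
    import random
    from typing import Callable

    def exact_match(prediction: str, reference: str) -> float:
        # 1.0 if the normalized prediction equals the reference, else 0.0.
        return float(prediction.strip().lower() == reference.strip().lower())

    def bootstrap_ci(scores: list[float], n_resamples: int = 10_000,
                     alpha: float = 0.05) -> tuple[float, float]:
        # Percentile bootstrap confidence interval for the mean score.
        rng = random.Random(0)
        n = len(scores)
        means = sorted(sum(rng.choices(scores, k=n)) / n for _ in range(n_resamples))
        return means[int(alpha / 2 * n_resamples)], means[int((1 - alpha / 2) * n_resamples) - 1]

    def run_benchmark(model: Callable[[str], str], eval_path: str) -> None:
        # Evaluate over a JSONL file whose lines carry "prompt" and "reference" fields.
        with open(eval_path) as f:
            examples = [json.loads(line) for line in f]
        scores = [exact_match(model(ex["prompt"]), ex["reference"]) for ex in examples]
        mean = sum(scores) / len(scores)
        low, high = bootstrap_ci(scores)
        print(f"exact match: {mean:.3f} (95% CI [{low:.3f}, {high:.3f}], n={len(scores)})")

Reporting a confidence interval alongside the point estimate is what makes an offline benchmark "rigorous" in practice: it tells you whether a score difference between two model versions is larger than sampling noise.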

We’ll be happy to have you on our team if you have:

  • Expertise in evaluating generative AI methods.
  • A strong understanding of statistics and data analysis (a toy sketch follows this list).
  • Excellent management and communication skills.
  • Solid practical experience with Python and evaluation frameworks.
  • Attention to detail in everything you do.
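
As a toy illustration of the statistics involved, the sketch below uses a paired bootstrap to estimate how reliably one model outperforms another on a shared eval set. The data and function names are hypothetical.

    import random

    def paired_bootstrap(scores_a: list[float], scores_b: list[float],
                         n_resamples: int = 10_000, seed: int = 0) -> float:
        # Resample example indices with replacement and count how often
        # model A's mean score exceeds model B's on the resampled set.
        assert len(scores_a) == len(scores_b)
        rng = random.Random(seed)
        n = len(scores_a)
        wins = 0
        for _ in range(n_resamples):
            idx = [rng.randrange(n) for _ in range(n)]
            if sum(scores_a[i] - scores_b[i] for i in idx) > 0:
                wins += 1
        return wins / n_resamples

    # Toy data: per-example pass/fail scores for two models on the same prompts.
    a = [1, 1, 0, 1, 1, 0, 1, 1]
    b = [1, 0, 0, 1, 0, 0, 1, 0]
    print(f"fraction of resamples where A beats B: {paired_bootstrap(a, b):.3f}")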

We’d be especially thrilled if you have experience with:

  • Preparing public evaluation reports for feature or model releases.
  • Managing data annotation efforts, including crowdsourcing and in-house labeling.
  • CI systems, workflow automation, and experiment tracking.
  • The Kotlin programming language.

To develop JetBrains AI, we use:

  • Weights & Biases and Langfuse for experiment tracking and reporting.
  • ZenML for ML workflow automation.
  • AWS and GCP for infrastructure.
  • Git for source code management.
  • TeamCity for continuous integration.
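
For a flavor of how experiment tracking fits in, here is a minimal Weights & Biases logging sketch; the project name, config fields, and metric values are invented for illustration.

    import wandb  # pip install wandb

    # Hypothetical project and config; shown only to illustrate the workflow.
    run = wandb.init(
        project="llm-eval",
        config={"model": "candidate-v2", "benchmark": "exact-match-suite"},
    )

    # Log benchmark results so runs are comparable in the W&B dashboard.
    wandb.log({"exact_match": 0.731, "ci_low": 0.702, "ci_high": 0.758})

    run.finish()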
