Agentic Analyst

2–4 Years • $37,440 PA – $49,920 PA
Business Analysis

Job Description

LILT is seeking an Agentic Analyst to join its BizOps-led Agentic Experimentation function. This role involves designing and running rapid experiments to enhance AI agents and production workflows. The analyst will convert business problems into structured experiments, script data pulls, iterate prompts, evaluate quality versus cost, and publish findings to guide product and workflow decisions across various AI variants.
Good To Have:
  • Experience evaluating LLM systems with AI judges, scripted checks, or human sampling.
  • Familiarity with MQM/LQE or similar linguistics QA frameworks.
  • Knowledge of RAG, vector stores, and retrieval verification strategies.
  • Familiarity with agentic workflows or translation/linguistics domains.
  • Basic MLOps/experimentation tools and prompt versioning best practices.
Responsibilities:
  • Design and execute end-to-end agentic experiments.
  • Script data pulls, iterate prompts, and analyze quality vs. cost.
  • Publish clear findings to inform product and workflow decisions.
  • Develop and optimize prompts for LILT’s AI agents and LLM workflows.
  • Automate data processing and experiment execution using Python.
  • Evaluate outputs with AI, programmatic checks, and human feedback.
  • Partner with Production to quantify workflow tradeoffs.
  • Communicate findings crisply and share reproducible artifacts.
  • Contribute to prompt and experiment hygiene.
Must Have:
  • 2-4 years in Data Science, Analytics, or Applied AI with Python.
  • Hands-on experience with LLMs and prompt engineering.
  • Strong analytical rigor for production settings.
  • Proficiency in SQL and cloud data warehouses like BigQuery.
  • Clear written communication for executive summaries.


About LILT

AI is changing how the world communicates — and LILT is leading that transformation.

We're on a mission to make the world's information accessible to everyone, regardless of the language they speak. We use cutting-edge AI, machine translation, and human-in-the-loop expertise to translate content faster, more accurately, and more cost-effectively without compromising on brand, voice, or quality.

At LILT, we empower our teammates with leading tools, global collaboration, and growth opportunities to do their best work. Our company virtues—Work together, win together; Find a way or make one; Quicker than they expect; Quality is Job 1—guide everything we do. We are trusted by Intel Corporation, Canva, the United States Department of Defense, the United States Air Force, ASICS, and hundreds of global enterprises. Backed by Sequoia, Intel Capital, and Redpoint, we’re building a category-defining company in a $50B+ global translation market being redefined by AI.

Contract Data Analyst (LLMs, Prompt Engineering) — Agentic Experimentation

About the role

LILT’s BizOps-led Agentic Experimentation function designs and runs rapid experiments that upgrade our AI agents and production workflows, working closely with Production, Product, Engineering, and Research. You will turn business problems into structured experiments, script data pulls, iterate prompts, evaluate quality vs. cost, and publish clear findings that inform product and workflow decisions across AI Review, AI QA, Human‑Optimized, and Workflow 5 variants.

What you’ll do

  • Design and run agentic experiments end to end using our standard process:
      1. Frame the problem and success criteria
      2. Script data pulls from BigQuery and assemble representative datasets
      3. Integrate and iterate on prompts in code
      4. Execute runs, collect outputs, and perform cost/quality analysis
      5. Summarize results with examples, metrics, and recommendations for rollout or follow‑ups
  • Develop, version, and optimize prompts for LILT’s agents and LLM-backed workflows, leveraging our model-agnostic stack and evaluation practices
  • Write robust Python to automate data processing, experiment execution, and result aggregation; use notebooks and lightweight scripts where appropriate
  • Evaluate outputs with a mix of AI and programmatic checks aligned to production realities, including error detection, terminology/style adherence, and “human-in-the-loop” checkpoints
  • Partner with Production on workflow trials across multiple configurations; quantify tradeoffs in quality, speed, and cost
  • Communicate findings crisply in docs and presentations; open or update Jira tickets and share reproducible artifacts (datasets, scripts, prompts, and dashboards)
  • Contribute to shared prompt and experiment hygiene: versioning, datasets, eval suites, and guardrails to prevent regressions
  • Test agent capabilities in our sandbox environment and provide structured feedback to Product/Engineering
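To make the experiment loop above concrete, here is a minimal, illustrative Python sketch. It is not LILT's actual tooling: the dataset stands in for a BigQuery pull, `run_model` is a stub for an LLM provider call, and the word-overlap scorer and per-token cost are placeholder assumptions. The shape of the loop is the point: run every prompt variant over the same dataset, then aggregate quality vs. cost per variant.

```python
from statistics import mean

# Placeholder dataset standing in for a BigQuery pull (assumption: each
# record has a source segment and a reference translation to score against).
DATASET = [
    {"source": "Hello, world", "reference": "Bonjour, le monde"},
    {"source": "Quality is Job 1", "reference": "La qualité est la priorité"},
]

# Two hypothetical prompt variants under comparison.
PROMPT_VARIANTS = {
    "v1-terse": "Translate to French: {source}",
    "v2-styled": "Translate to French, preserving brand tone: {source}",
}

def run_model(prompt: str) -> tuple[str, int]:
    """Stub for an LLM call; returns (output, token_count).
    A real implementation would call a provider API and read usage data."""
    return prompt.split(": ", 1)[1], len(prompt.split())

def score(output: str, reference: str) -> float:
    """Toy programmatic quality check: word overlap with the reference (0..1)."""
    out, ref = set(output.lower().split()), set(reference.lower().split())
    return len(out & ref) / len(ref) if ref else 0.0

def run_experiment(cost_per_token: float = 0.00001) -> dict:
    """Execute every variant over the dataset; aggregate quality vs. cost."""
    results = {}
    for name, template in PROMPT_VARIANTS.items():
        scores, tokens = [], 0
        for rec in DATASET:
            output, used = run_model(template.format(source=rec["source"]))
            scores.append(score(output, rec["reference"]))
            tokens += used
        results[name] = {"mean_quality": round(mean(scores), 3),
                         "cost_usd": round(tokens * cost_per_token, 6)}
    return results

print(run_experiment())
```

In practice the scorer would be swapped for AI judges, terminology/style checks, or human sampling, and the summary table would feed the write-up with examples and a rollout recommendation.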

Must-haves

  • 2–4 years in Data Science, Analytics, or Applied AI with demonstrable Python proficiency (pandas, data parsing, APIs, basic ETL)
  • Hands-on experience with LLMs and prompt engineering across providers (e.g., OpenAI, Anthropic, Vertex/Gemini, Bedrock), including practical eval and iteration cycles
  • Strong analytical rigor: can define success metrics, compare workflows, and reason clearly about quality/cost/speed tradeoffs in production settings
  • SQL and data wrangling skills; experience with BigQuery or equivalent cloud data warehouse
  • Clear written communication with exec-ready summaries and artifact links (reports, notebooks, Sheets, slides)

Nice-to-haves

  • Experience evaluating LLM systems with AI judges, scripted checks, or human sampling; familiarity with MQM/LQE or similar linguistics QA frameworks
  • Knowledge of RAG, vector stores, and retrieval verification strategies
  • Familiarity with agentic workflows or translation/linguistics domains
  • Basic MLOps/experimentation tools and prompt versioning best practices
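Prompt versioning, mentioned above, can be as lightweight as content-addressing each prompt revision so every experiment result references an exact prompt text. A minimal sketch (the registry layout is an assumption for illustration, not LILT's practice; in reality this would live in version control or a database):

```python
import hashlib
from datetime import datetime, timezone

# In-memory registry mapping a short content hash to the exact prompt text.
PROMPT_REGISTRY: dict[str, dict] = {}

def register_prompt(text: str) -> str:
    """Store a prompt revision under a short content hash and return its id.
    Identical text always yields the same id, so runs are reproducible."""
    prompt_id = hashlib.sha256(text.encode()).hexdigest()[:12]
    PROMPT_REGISTRY.setdefault(prompt_id, {
        "text": text,
        "registered_at": datetime.now(timezone.utc).isoformat(),
    })
    return prompt_id

# Tagging each experiment run with the prompt id pins results to a revision:
pid = register_prompt("Translate to French, preserving brand tone: {source}")
print(pid)
```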

Tooling and environment at LILT

  • Primary: Python, BigQuery, notebooks/Colab, Google Drive; Jira for work tracking; Notion for experiment documentation
  • Model orchestration is provider-agnostic; AI Review may route to different backends (e.g., Claude via Bedrock), so comfort with switching models and comparing output is key
  • Dashboarding: for new reporting, follow the LILT Dashboarding/Data Visualization Policy; prefer LILT Analytics and approved internal platforms over new Looker Studio reports

Ways of working

  • You’ll sit with BizOps’ Agentic Experimentation team and partner closely with Production, Product, and Research to scope, execute, and land changes in real workflows and product
  • Operate within our experiment playbook: short cycles, reproducibility, realistic data, and decision‑ready summaries

Why LILT

  • Be part of the team advancing agentic AI in enterprise translation, improving real production outcomes across quality, speed, and cost
  • Work with a company founded and led by AI researchers, shipping model‑agnostic, production‑grade agent capabilities to customers at global scale

How to apply

If you’re excited to build and evaluate agentic workflows with measurable impact, we’d love to hear from you.

Our Story

Our founders, Spence and John, met at Google while working on Google Translate. As researchers at Stanford and Berkeley, they both worked on language technology to make information accessible to everyone. While together at Google, they were amazed to learn that Google Translate wasn’t used for enterprise products and services inside the company. The quality just wasn’t there. So they set out to build something better. LILT was born.

LILT has been a machine learning company since its founding in 2015. At the time, machine translation didn’t meet the quality standard for enterprise translations, so LILT assembled a cutting-edge research team tasked with closing that gap. While meeting customer demand for translation services, LILT has prioritized investments in Large Language Models, human-in-the-loop systems, and now agentic AI.

With AI innovation accelerating and enterprise demand growing, the next phase of LILT’s journey is just beginning.

Our Tech

What sets our platform apart:

  • Brand-aware AI that learns your voice, tone, and terminology to ensure every translation is accurate and consistent
  • Agentic AI workflows that automate the entire translation process from content ingestion to quality review to publishing
  • 100+ native integrations with systems like Adobe Experience Manager, Webflow, Salesforce, GitHub, and Google Drive to simplify content translation
  • Human-in-the-loop reviews via our global network of professional linguists, for high-impact content that requires expert review
