Member of Technical Staff: AI Performance

3–7 Years • Research Development

Job Summary

This role involves designing and implementing agent evaluation pipelines to benchmark AI capabilities across enterprise use cases like product support and engineering ops. Key responsibilities include building domain-specific benchmarks, optimizing for latency, safety, and cost-efficiency, and developing search-oriented benchmarks. The position is crucial for defining KPIs for AI-native enterprises, bringing rigor to reasoning systems, and shaping how AI performance is measured in SaaS 2.0.
Must have:
  • Design and implement agent evaluation pipelines.
  • Build domain-specific benchmarks for SaaS verticals.
  • Develop performance benchmarks for latency, safety, cost.
  • Create search- and retrieval-oriented benchmarks.
  • Partner with AI and infra teams for telemetry.
  • Drive human-in-the-loop and programmatic testing methodologies.
  • Contribute to DevRev’s open evaluation tooling and frameworks.
  • 3-7 years experience in systems, infra, or performance engineering.
  • Fluency in Python and comfort with full-stack/backend services.
  • Experience with LLMs, vector search, agentic frameworks in production.
  • Familiarity with LLM model serving infrastructure.
  • Experience with model tuning workflows.
  • Familiarity with evaluation techniques in NLP, IR, or human-centered AI.
Good to have:
  • Experience contributing to academic or open-source benchmarking projects

Job Details

DevRev’s AgentOS, purpose-built for SaaS companies, comprises three modern CRM apps for support, product, and growth teams. It connects end users, sellers, support agents, product managers, and developers, replacing nine business apps and converging six teams onto a common platform. Unlike horizontal CRMs, DevRev takes a blank-canvas approach to collaboration, AI, and analytics, enabling SaaS companies to increase product velocity and reduce customer churn. DevRev is used by thousands of companies seeking low-latency analytics and customizable LLMs to thrive in this era of GenAI.

Headquartered in Palo Alto, California, DevRev has offices in seven global locations. We have raised $100 million in funding from investors like Khosla Ventures and Mayfield at a $1.1 billion valuation. We are also honored to be named on the Forbes 2024 list of America’s Best Startup Employers. Founded in October 2020 by Dheeraj Pandey, former co-founder and CEO of Nutanix, and Manoj Agarwal, former SVP of Engineering at Nutanix, DevRev continues to push the boundaries of innovation, helping thousands of companies thrive in the rapidly evolving landscape of AI-driven SaaS.

What You’ll Do

  • Design and implement agent evaluation pipelines that benchmark AI capabilities across real-world enterprise use cases.
  • Build domain-specific benchmarks for product support, engineering ops, GTM insights, and other verticals relevant to modern SaaS.
  • Develop performance benchmarks that measure and optimize for latency, safety, cost-efficiency, and user-perceived quality.
  • Create search- and retrieval-oriented benchmarks, including multilingual query handling, annotation-aware scoring, and context relevance.
  • Partner with AI and infra teams to instrument models and agents with detailed telemetry for outcome-based evaluation.
  • Drive human-in-the-loop and programmatic testing methodologies for fuzzy metrics like helpfulness, intent alignment, and resolution effectiveness.
  • Contribute to DevRev’s open evaluation tooling and benchmarking frameworks, shaping how the broader ecosystem thinks about SaaS AI performance.
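To make the responsibilities above concrete, here is a minimal sketch of an outcome-based agent evaluation harness: it runs tasks through an agent, records telemetry (latency, token usage), and scores results with programmatic checks. This is an illustrative example only, not DevRev’s actual tooling; `stub_agent`, `EvalResult`, and the sample tasks are invented for the sketch.

```python
import time
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalResult:
    task_id: str
    passed: bool       # outcome of the programmatic check
    latency_s: float   # wall-clock latency of the agent call
    tokens_used: int   # token telemetry reported by the agent

def run_eval(agent: Callable[[str], tuple[str, int]],
             tasks: list[tuple[str, str, Callable[[str], bool]]]) -> list[EvalResult]:
    """Run each task through the agent, recording outcome and telemetry."""
    results = []
    for task_id, prompt, check in tasks:
        start = time.perf_counter()
        answer, tokens = agent(prompt)
        latency = time.perf_counter() - start
        results.append(EvalResult(task_id, check(answer), latency, tokens))
    return results

# Stub standing in for a real LLM-backed agent (hypothetical behavior).
def stub_agent(prompt: str) -> tuple[str, int]:
    return ("refund issued" if "refund" in prompt else "escalated", len(prompt))

tasks = [
    ("t1", "customer asks for refund", lambda a: "refund" in a),
    ("t2", "unknown billing issue", lambda a: a == "escalated"),
]
results = run_eval(stub_agent, tasks)
pass_rate = sum(r.passed for r in results) / len(results)
```

In a real pipeline the programmatic checks would be supplemented by human-in-the-loop review for fuzzy metrics like helpfulness and intent alignment, which resist simple assertions.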

What We’re Looking For

  • 3–7 years of experience in systems, infra, or performance engineering roles with strong ownership of metrics and benchmarking.
  • Fluency in Python and comfort working across full-stack and backend services.
  • Experience building or using LLMs, vector-based search, or agentic frameworks in production environments.
  • Familiarity with LLM model serving infrastructure (e.g., vLLM, Triton, Ray, or custom Kubernetes-based deployments), including observability, autoscaling, and token streaming.
  • Experience working with model tuning workflows, including prompt engineering, fine-tuning (e.g., LoRA, DPO), or evaluation loops for post-training optimization.
  • Deep appreciation for measuring what matters, whether it's latency under load, degradation in retrieval precision, or regression in AI output quality.
  • Familiarity with evaluation techniques in NLP, information retrieval, or human-centered AI (e.g., RAGAS, Recall@K, BLEU).
  • Strong product and user intuition: you care about what the benchmark represents, not just what it measures.
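As a point of reference for the retrieval metrics named above, Recall@K measures the fraction of relevant documents that appear in the top-K retrieved results. A minimal sketch (document IDs are invented for the example):

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant documents that appear in the top-k retrieved."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)

# 2 of the 3 relevant docs ("d1", "d2") appear in the top 4 results.
score = recall_at_k(["d1", "d9", "d3", "d2", "d7"], {"d1", "d2", "d5"}, k=4)
```

Metrics like this are typically averaged over a query set; benchmark suites such as RAGAS layer additional LLM-judged scores (e.g., context relevance) on top of such retrieval measures.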

Bonus: experience contributing to academic or open-source benchmarking projects.

Why This Role Matters

  • Agents are not APIs — they reason, adapt, and learn. But with that power comes ambiguity in how we measure success. At DevRev, we believe the benchmarks of the past aren’t enough for the software of the future.
  • This role is your opportunity to design the KPIs of the AI-native enterprise — to bring rigor to systems that reason, and structure to software that thinks.
  • Join us to shape how intelligence is measured in SaaS 2.0.

Culture

The foundation of DevRev is its culture: our commitment to those who are hungry, humble, honest, and who act with heart. Our vision is to help build the earth’s most customer-centric companies. Our mission is to leverage design, data engineering, and machine intelligence to empower engineers to embrace their customers. That is DevRev!
