Senior Member of Technical Staff: ML Systems and Infrastructure

DevRev

Job Summary

DevRev is seeking a Senior Member of Technical Staff for ML Systems and Infrastructure to design, build, and own the end-to-end platform for ML models, from massive-scale distributed training to ultra-low-latency inference. This role involves optimizing and scaling LLM inference stacks using frameworks like vLLM and TensorRT-LLM, empowering AI research teams, and automating model lifecycle management with CI/CD/CT pipelines. Candidates should have 5+ years of experience, deep Kubernetes and cloud-native expertise, and strong programming skills in Python or Go.

Must Have

  • Architect, build, and own the end-to-end platform for ML models, from distributed training to ultra-low-latency inference.
  • Implement and scale sophisticated inference stacks for LLMs using frameworks like vLLM, TensorRT-LLM, or SGLang.
  • Act as a strategic partner to AI Research and Data Science teams, creating a seamless developer experience.
  • Develop robust CI/CD/CT pipelines using Argo Workflows, ArgoCD, and GitHub Actions.
  • 5+ years in infrastructure or software engineering, with at least two years focused on MLOps or ML infrastructure.
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
  • Deep, hands-on expertise with Kubernetes in production, including Helm, ArgoCD, and Argo Workflows (a minimal deployment-gate sketch follows this list).
  • Ability to optimize platform performance and scalability across GPU resource utilization, data ingestion, model training, and deployment.
  • Hands-on experience with modern LLM inference serving frameworks (e.g., vLLM, SGLang, Triton Inference Server, Ray Serve).
  • Strong programming proficiency in Python or Go, with experience using ML frameworks like PyTorch, JAX, or TensorFlow.
  • Passion for building observable and resilient systems using modern monitoring tools (e.g., Prometheus, Grafana, OpenTelemetry).
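
To make the Kubernetes expectation concrete, here is a minimal, hypothetical sketch of the kind of deployment-readiness gate a CI/CD step might run against a cluster, using the official kubernetes Python client. The deployment and namespace names are placeholders, not a description of DevRev's environment.

```python
# Hypothetical sketch: gate a pipeline step on a Kubernetes rollout being healthy.
# The deployment/namespace names ("model-server", "ml-inference") are illustrative.
from kubernetes import client, config


def rollout_is_ready(name: str = "model-server", namespace: str = "ml-inference") -> bool:
    """Return True once every desired replica of the deployment reports ready."""
    config.load_kube_config()  # inside a pod, use config.load_incluster_config() instead
    apps = client.AppsV1Api()
    dep = apps.read_namespaced_deployment(name=name, namespace=namespace)
    desired = dep.spec.replicas or 0
    ready = dep.status.ready_replicas or 0
    return desired > 0 and ready == desired


if __name__ == "__main__":
    print("rollout ready:", rollout_is_ready())
```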

Good to Have

  • Deep performance optimization skills, including writing custom inference kernels in CUDA or Triton.
  • Experience with model optimization techniques like quantization, distillation, and speculative decoding.
  • Exposure to training and serving multi-modal models (e.g., text-to-image, vision-language).
  • Knowledge of AI safety and evaluation frameworks for monitoring model performance.

Job Description

At DevRev, we’re building the future of work with Computer – your AI teammate.

Computer is not just another tool. It’s built on the belief that the future of work should be about genuine human connection and collaboration – not piling on more apps. Computer is the best kind of teammate: it amplifies your strengths, takes repetition and frustration out of your day, and gives you more time and energy to do your best work.

How?

Easy: it’s the only platform capable of…

Complete data unification

Most AI products focus on either structured data (like CRM records and support tickets), or unstructured data (like documents and emails). Computer AirSync connects everything, unifying all your data sources (like Google Workspace, Jira, Notion) into one AI-ready source of truth: Computer Memory.

Powerful search, reasoning, and action

Once connected to all your tools and apps, Computer is embedded in your full business context. It can find and summarize, sure. Even more impressive: it offers employees insights, strategic and proactive suggestions, plus powerful agentic actions.

Extensions for your teams and customers

Computer doesn’t make you choose between new software and old. Its AI-native platform lets you extend existing tools with sophisticated apps and agents. So your teams – and your customers – can take action, seamlessly. These agents work alongside you: updating workflows, coordinating across teams, and syncing back to your systems.

This isn’t just software. Computer brings people back together, breaking down silos and ushering in the future of teamwork, through human-AI collaboration. Stop managing software. Stop wasting time. Start solving bigger problems, building better products, and making your customers happier.

We call this Team Intelligence. It’s why DevRev exists.

Trusted by global companies across multiple industries, DevRev is backed by Khosla Ventures and Mayfield, with $150M+ raised. We are 650+ people, across eight global offices.

What You’ll Do:

  • Architect the Future of AI Infrastructure: You will design, build, and own the end-to-end platform that supports the entire lifecycle of our ML models—from massive-scale distributed training to ultra-low-latency, highly-available inference.
  • Optimize and Serve Cutting-Edge Models: You’ll implement and scale sophisticated inference stacks for LLMs using frameworks like vLLM, TensorRT-LLM, or SGLang. You’ll solve complex challenges in throughput, latency, token streaming, and automated scaling to deliver a seamless user experience (see the streaming sketch after this list).
  • Empower AI Innovation: You will act as a strategic partner to our AI Research and Data Science teams. You’ll create a seamless developer experience that accelerates their ability to experiment, fine-tune, and deploy groundbreaking models with velocity and confidence.
  • Automate Everything: You'll develop robust CI/CD/CT (Continuous Training) pipelines using tools like Argo Workflows, ArgoCD, and GitHub Actions to automate model validation, deployment, and lifecycle management, ensuring our systems are both agile and rock-solid.
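
As a deliberately simplified illustration of the token-streaming side of this work, the sketch below streams a chat completion from an OpenAI-compatible endpoint such as the one vLLM exposes via `vllm serve`. The base URL, API key, and model name are assumed placeholders, not a description of DevRev's serving stack.

```python
# Hypothetical sketch: stream tokens from an OpenAI-compatible LLM endpoint,
# e.g. one started with `vllm serve <model>`. URL and model id are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

stream = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed model id
    messages=[{"role": "user", "content": "Summarize this week's support tickets."}],
    stream=True,  # the server emits tokens as they are generated
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # some chunks carry no text (role or finish events)
        print(delta, end="", flush=True)
print()
```

In production, the same streaming contract would sit behind batching, autoscaling, and observability layers; the sketch only shows the client-side token flow.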

What we’re looking for:

  • Experience: 5+ years in infrastructure or software engineering, with at least two years laser-focused on MLOps or ML infrastructure for large-scale distributed systems.
  • Education: A Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
  • Kubernetes & Cloud Native Expertise: Deep, hands-on expertise with Kubernetes in production. You are fluent in the cloud-native ecosystem, including Helm, ArgoCD, and Argo Workflows.
  • GPU & Cloud Mastery: A proven ability to optimize platform performance and scalability, accounting for factors such as GPU resource utilization, data ingestion, model training, and deployment.
  • Modern LLM Serving Experience: Hands-on experience with modern LLM inference serving frameworks (e.g., vLLM, SGLang, Triton Inference Server, Ray Serve). You understand the unique challenges of serving generative models.
  • Strong Coder: Strong programming proficiency in Python or Go, with experience using ML frameworks like PyTorch, JAX, or TensorFlow.
  • Observability Mindset: A passion for building observable and resilient systems using modern monitoring tools (e.g., Prometheus, Grafana, OpenTelemetry); a minimal instrumentation sketch follows this list.
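
To ground the observability point, here is a minimal, hypothetical sketch of instrumenting an inference handler with prometheus_client. The metric names, labels, and the dummy handler are illustrative assumptions; in practice the exposed endpoint would be scraped by Prometheus and visualized in Grafana.

```python
# Hypothetical sketch: expose request-count and latency metrics for an
# inference handler. Metric names and the sleep-based handler are illustrative.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("inference_requests_total", "Total inference requests", ["model"])
LATENCY = Histogram("inference_latency_seconds", "Inference latency in seconds", ["model"])


def handle_request(model: str = "demo-model") -> str:
    REQUESTS.labels(model=model).inc()
    with LATENCY.labels(model=model).time():  # records wall-clock time into the histogram
        time.sleep(random.uniform(0.01, 0.05))  # stand-in for a real forward pass
        return "ok"


if __name__ == "__main__":
    start_http_server(9100)  # metrics served at http://localhost:9100/metrics
    while True:
        handle_request()
```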

We would love to see:

  • Deep performance optimization skills, including writing custom inference kernels in CUDA or Triton to accelerate model performance beyond what off-the-shelf frameworks provide (see the kernel sketch after this list).
  • Experience with model optimization techniques like quantization, distillation, and speculative decoding.
  • Exposure to training and serving multi-modal models (e.g., text-to-image, vision-language).
  • Knowledge of AI safety and evaluation frameworks for monitoring model performance for things like bias, toxicity, and hallucinations.
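
For candidates curious what "custom inference kernels in Triton" looks like at the smallest scale, here is a tutorial-style, hypothetical sketch of a Triton kernel written in Python. It is a toy elementwise kernel that only demonstrates the programming model (real inference kernels such as fused attention or quantized GEMMs are far more involved), and it assumes a CUDA GPU with the torch and triton packages installed.

```python
# Hypothetical sketch: a toy vector-add kernel in the spirit of the Triton tutorials.
import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)  # each program instance handles one block
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the tail of the tensor
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)  # one program per 1024-element block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out


if __name__ == "__main__":
    a = torch.rand(4096, device="cuda")
    b = torch.rand(4096, device="cuda")
    assert torch.allclose(add(a, b), a + b)
```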

As part of our hiring process, shortlisted candidates will undergo a Background Verification (BGV). By applying, you consent to sharing personal information required for this process. Any offer made will be subject to successful completion of the BGV.

Culture

The foundation of DevRev is its culture – our commitment to those who are hungry, humble, honest, and who act with heart. Our vision is to help build the earth’s most customer-centric companies. Our mission is to leverage design, data engineering, and machine intelligence to empower engineers to embrace their customers.

That is DevRev!
