ML Systems Engineer

Santa Barbara, CA / Santa Clara, CA

About ChipAgents

ChipAgents is redefining the future of chip design and verification with agentic AI workflows. Our platform leverages cutting-edge generative AI to assist engineers in RTL design, simulation, and verification, dramatically accelerating chip development. Founded by experts in AI and semiconductor engineering, we partner with top semiconductor firms, cloud providers, and innovative startups to build intelligent AI agents. ChipAgents is a Series A company backed by tier-1 VC firms, and our platform is deployed in production at companies that have shipped 16B chips.

Position Overview

We are seeking an ML Systems Engineer to optimize the performance and efficiency of large language model inference powering our agentic AI platform. This is a technical role focused on low-level systems optimization. You will implement performance optimizations, build evaluation harnesses, and architect multi-node clusters for training and inference that push the limits of LLM throughput and latency. Your work will directly impact the responsiveness and cost-efficiency of AI agents used by leading semiconductor companies to design chips.

Key Responsibilities

  • Design, deploy, and optimize LLM inference systems across multi-node clusters, maximizing throughput and minimizing latency for production workloads.
  • Implement and benchmark concrete inference optimizations (e.g., batching strategies, quantization, and distributed inference).
  • Profile and analyze inference bottlenecks at the systems level—from GPU kernel execution to memory bandwidth constraints.
  • Build robust evaluation harnesses and benchmarking frameworks that measure accuracy, throughput, latency, and resource utilization across various parallelism strategies.
  • Collaborate with research scientists to integrate new model architectures and optimizations into production inference infrastructure.
  • Investigate and apply emerging techniques from research papers and open-source projects to continuously improve inference performance.

Qualifications

  • B.S., M.S., or PhD in Computer Science, Electrical Engineering, or related field (or equivalent experience).
  • 3+ years of experience with large-scale ML systems, GPU computing, or high-performance inference optimization.
  • Strong proficiency in Python and C++/CUDA; hands-on experience with SGLang, vLLM, PyTorch, or similar inference frameworks.
  • Deep understanding of GPU architecture, memory hierarchies, and parallel computing paradigms.
  • Experience deploying and optimizing LLMs in production: model serving, batching strategies, distributed inference, or quantization.
  • Strong systems-level debugging and profiling skills; comfort working at multiple layers of the stack from CUDA kernels to application logic.
  • Familiarity with distributed computing frameworks (Ray, multi-node training/inference) is a plus.
  • Self-directed problem solver who is interested in working on ambitious optimization challenges.

Why Join Us

  • Work on cutting-edge LLM inference optimization problems with real-world production impact.
  • Access to substantial GPU compute resources for experimentation and benchmarking.
  • Collaborate with a world-class team spanning AI research, systems engineering, and EDA.
  • Shape the performance characteristics of AI systems used by leading semiconductor companies.
  • Competitive compensation, benefits, meaningful equity, and professional growth opportunities.

To apply: Send your résumé, GitHub/portfolio/open-source contributions (if available), and a brief note on your interest and the work you're most proud of to ww@alphadesign.ai

