Staff Software Engineer, Machine Learning Runtime Engines

1 Month ago • 8-13 Years • Artificial Intelligence • $197,000 PA - $291,000 PA

Job Summary

Job Description

This Staff Software Engineering role focuses on designing the next generation of Machine Learning (ML) accelerator runtimes. Responsibilities include managing the ML runtime engines strategy, supporting Google’s ML ecosystem across firmware, hardware, and tools, migrating existing framework runtimes (TensorFlow, JAX, PyTorch) from TPUs to GPUs, managing open-source components, and mentoring junior engineers. The ideal candidate will have extensive experience in software development, ML/AI algorithms, and building large-scale ML systems, along with hands-on experience in JAX, PyTorch, or TensorFlow. The role involves addressing user needs, minimizing use-case-specific code, and ensuring highly performant execution across accelerator hardware for various models, including LLMs and LEMs.
Must have:
  • 8+ years software development experience
  • 5+ years ML/AI algorithm & tool experience
  • 5+ years building large-scale ML systems
  • 3+ years experience with JAX, PyTorch, or TensorFlow
  • Experience with data structures/algorithms
  • Manage ML runtime strategy
Good to have:
  • Master's or PhD in related field
  • ML & HPC experience
  • Framework/runtime experience
  • ML community involvement
  • Concurrent/parallel computation debugging
  • Debugging at all stack levels
Perks:
  • Bonus
  • Equity
  • Benefits

Job Details

Minimum qualifications:

  • Bachelor's degree or equivalent practical experience.
  • 8 years of experience in software development, and with data structures/algorithms.
  • 5 years of experience with developing Machine Learning (ML)/Artificial Intelligence (AI) algorithms and tools.
  • 5 years of experience building and architecting large-scale, production quality ML systems.
  • 3 years of experience in development with JAX, PyTorch, or TensorFlow.

Preferred qualifications:

  • Master’s degree or PhD in Engineering, Computer Science, or a related technical field.
  • Experience with Machine Learning and High Performance Computing (HPC).
  • Experience with frameworks and runtimes.
  • Experience in the Machine Learning (ML) community through publications, open-source contributions, or conference participation.
  • Ability to debug and program concurrent/parallel computations.
  • Ability to debug correctness and performance issues at all levels of the stack.

About the job

Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to handle information at massive scale, and extend well beyond web search. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google’s needs with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities and be enthusiastic to take on new problems across the full stack as we continue to push technology forward.

With your technical expertise you will manage project priorities, deadlines, and deliverables. You will design, develop, test, deploy, maintain, and enhance software solutions.

In this role, you will be responsible for designing the new generation of Machine Learning (ML) accelerator runtimes and how they interact with ML frameworks and compilers to deliver highly performant execution across accelerator hardware for the cutting-edge Large Language Models (LLMs), Large Embedding Models (LEMs), Language Models (LMs), and non-LLM models used by all of Google.

Google Cloud accelerates every organization’s ability to digitally transform its business and industry. We deliver enterprise-grade solutions that leverage Google’s cutting-edge technology, and tools that help developers build more sustainably. Customers in more than 200 countries and territories turn to Google Cloud as their trusted partner to enable growth and solve their most critical business problems.

The US base salary range for this full-time position is $197,000-$291,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.

Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits.

Responsibilities

  • Manage the ML runtime engines strategy toward solving the greater ML infrastructure problems and addressing users' needs.
  • Support Google’s ML ecosystem needs across firmware, hardware, and tools. This will minimize use-case-specific code paths and wind down investment in maintaining such fragmented efforts in the current stack.
  • Migrate existing framework (TensorFlow, JAX, PyTorch) runtimes (TFExecutor, TensorFlow Runtime (TFRT), PJRT, JetStream) and custom workflows from Tensor Processing Units (TPUs) to Graphics Processing Units (GPUs) while minimizing user disruption.
  • Manage open-source components to enable integration with frameworks and compilers in open-source software (OSS).
  • Mentor and grow junior engineers, and provide career development guidance and advice.
