Staff Software Engineer, Machine Learning Runtime Engines

1 Week ago • 8-13 Years • Artificial Intelligence • $197,000 PA - $291,000 PA

Job Summary

Job Description

This Staff Software Engineer role focuses on designing the next generation of Machine Learning (ML) accelerator runtimes. Responsibilities include managing the ML runtime engines strategy; supporting Google’s ML ecosystem across firmware, hardware, and tools; migrating existing framework runtimes (TensorFlow, JAX, PyTorch) from TPUs to GPUs; managing open-source components; and mentoring junior engineers. The ideal candidate has extensive experience in software development, ML/AI algorithms, and building large-scale ML systems, along with hands-on experience in JAX, PyTorch, or TensorFlow. The role involves addressing user needs, minimizing use-case-specific code, and ensuring highly performant execution across accelerator hardware for a range of models, including Large Language Models (LLMs) and Large Embedding Models (LEMs).
Must have:
  • 8+ years software development experience
  • 5+ years ML/AI algorithm & tool experience
  • 5+ years building large-scale ML systems
  • 3+ years experience with JAX, PyTorch, or TensorFlow
  • Experience with data structures/algorithms
  • Manage ML runtime strategy
Good to have:
  • Master's or PhD in related field
  • ML & HPC experience
  • Framework/runtime experience
  • ML community involvement
  • Concurrent/parallel computation debugging
  • Debugging at all stack levels
Perks:
  • Bonus
  • Equity
  • Benefits

Job Details

Minimum qualifications:

  • Bachelor's degree or equivalent practical experience.
  • 8 years of experience in software development, and with data structures/algorithms.
  • 5 years of experience with developing Machine Learning (ML)/Artificial Intelligence (AI) algorithms and tools.
  • 5 years of experience building and architecting large-scale, production quality ML systems.
  • 3 years of experience in development with JAX, PyTorch, or TensorFlow.

Preferred qualifications:

  • Master’s degree or PhD in Engineering, Computer Science, or a related technical field.
  • Experience with Machine Learning and High Performance Computing (HPC).
  • Experience with ML frameworks and runtimes.
  • Experience in the Machine Learning (ML) community through publications, open-source contributions, or conference participation.
  • Ability to debug and program concurrent/parallel computations.
  • Ability to debug correctness and performance issues at all levels of the stack.

About the job

Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to handle information at massive scale, and extend well beyond web search. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design, and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google’s needs, with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities, and be enthusiastic to take on new problems across the full stack as we continue to push technology forward.

With your technical expertise you will manage project priorities, deadlines, and deliverables. You will design, develop, test, deploy, maintain, and enhance software solutions.

In this role, you will be responsible for designing the new generation of Machine Learning (ML) accelerator runtimes and how they interact with ML frameworks and compilers, in order to deliver highly performant execution across accelerator hardware for the cutting-edge Large Language Models (LLMs), Large Embedding Models (LEMs), Language Models (LMs), and non-LLM models used across Google.

Google Cloud accelerates every organization’s ability to digitally transform its business and industry. We deliver enterprise-grade solutions that leverage Google’s cutting-edge technology, and tools that help developers build more sustainably. Customers in more than 200 countries and territories turn to Google Cloud as their trusted partner to enable growth and solve their most critical business problems.

The US base salary range for this full-time position is $197,000-$291,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.

Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits.

Responsibilities

  • Manage the ML runtime engines strategy to solve the broader ML infrastructure problems and address users' needs.
  • Support Google’s ML ecosystem needs across firmware, hardware, and tools. This will minimize use-case-specific code paths and reduce investment in maintaining such fragmented efforts in the current stack.
  • Migrate existing framework (TensorFlow, JAX, PyTorch) runtimes (TFExecutor, TensorFlow Runtime (TFRT), PJRT, JetStream) and custom workflows from Tensor Processing Unit (TPU) to Graphics Processing Unit (GPU) while minimizing user disruption.
  • Manage open-source components to enable integration with frameworks and compilers in open-source software (OSS).
  • Mentor and grow junior engineers, and provide career development guidance and advice.

About The Company

A problem isn't truly solved until it's solved for all. Googlers build products that help create opportunities for everyone, whether down the street or across the globe. Bring your insight, imagination and a healthy disregard for the impossible. Bring everything that makes you unique. Together, we can build for everyone.
