Staff Software Engineer, Machine Learning Performance, TPU

17 Hours ago • 8-13 Years • Artificial Intelligence • $197,000 PA - $291,000 PA

Job Summary

Job Description

This Staff Software Engineer role focuses on maximizing the performance of Machine Learning (ML) and Artificial Intelligence (AI) workloads, particularly on TPUs. Responsibilities include establishing and maintaining LLM benchmarks, optimizing ML models through techniques like quantization and sparsity, collaborating with product teams to onboard LLMs onto new TPU hardware, and analyzing performance metrics to identify and resolve bottlenecks. The position requires extensive experience in software development, performance analysis, and ML system understanding. The role involves working with TensorFlow/JAX and contributing to Google's TPU infrastructure.
Must have:
  • 8+ years software product experience
  • 5+ years software development (Python, C, C++)
  • Performance analysis expertise
  • Experience with ML infrastructure
  • TensorFlow/JAX knowledge
Good to have:
  • Master's/PhD in related field
  • Technical leadership experience
  • ML system expertise
  • Compiler optimization experience
Perks:
  • Bonus
  • Equity
  • Benefits

Job Details


Minimum qualifications:

  • Bachelor’s degree or equivalent practical experience.
  • 8 years of experience in testing and launching software products.
  • 5 years of experience with software development in one or more programming languages (e.g., Python, C, C++).
  • Experience in performance analysis including system architecture, performance modeling, benchmarking or machine learning infrastructure.

Preferred qualifications:

  • Master’s degree or PhD in Engineering, Computer Science, or a related technical field.
  • 3 years of experience in a multiplex organization, including a technical leadership role leading project teams and setting technical direction.
  • Experience with Machine Learning systems (e.g., background theory, TensorFlow).
  • Experience in compiler optimizations or related fields.

About the job

Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to handle information at massive scale, and extend well beyond web search. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google’s needs with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities and be enthusiastic to take on new problems across the full-stack as we continue to push technology forward.

In this role, you will be responsible for the performance of Machine Learning (ML) and Artificial Intelligence (AI) workloads and for extracting maximum efficiency from them. You will drive Google ML performance using fleet-scale and benchmark analysis and auto-optimizations.

The ML, Systems, and Cloud AI (MSCA) organization at Google designs, implements, and manages the hardware, software, machine learning, and systems infrastructure for all Google services (Search, YouTube, etc.) and Google Cloud. Our end users are Googlers, Cloud customers and the billions of people who use Google services around the world.

We prioritize security, efficiency, and reliability across everything we do - from developing our latest TPUs to running a global network, while driving towards shaping the future of hyperscale computing. Our global impact spans software and hardware, including Google Cloud’s Vertex AI, the leading AI platform for bringing Gemini models to enterprise customers.

The US base salary range for this full-time position is $197,000-$291,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.

Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits.

Responsibilities

  • Identify and maintain Large Language Model (LLM) training and serving benchmarks, used by industry and the Machine Learning (ML) community, to identify performance opportunities and drive TensorFlow/JAX Tensor Processing Unit (TPU) performance.
  • Work on scaling numeric and algorithmic optimizations to Google products and ML models, including quantization, sparsity, and other model compression techniques, as well as new ML model architectures, optimizers, and training techniques that solve ML tasks more efficiently.
  • Engage with Google product teams to solve their Large Language Model (LLM) performance problems, including onboarding new LLMs and products onto Google's new TPU hardware and enabling LLMs to train efficiently on thousands of TPUs.
  • Analyze performance and efficiency metrics to identify bottlenecks. Design and implement solutions at Google.


About The Company

A problem isn't truly solved until it's solved for all. Googlers build products that help create opportunities for everyone, whether down the street or across the globe. Bring your insight, imagination and a healthy disregard for the impossible. Bring everything that makes you unique. Together, we can build for everyone.

