Staff Software Engineer, Machine Learning Performance, TPU

2 Days ago • 8-13 Years • Artificial Intelligence • $197,000 PA - $291,000 PA

Job Summary

Job Description

This Staff Software Engineer role focuses on maximizing the performance of Machine Learning (ML) and Artificial Intelligence (AI) workloads, particularly on TPUs. Responsibilities include establishing and maintaining LLM benchmarks, optimizing ML models through techniques like quantization and sparsity, collaborating with product teams to onboard LLMs onto new TPU hardware, and analyzing performance metrics to identify and resolve bottlenecks. The position requires extensive experience in software development, performance analysis, and ML system understanding. The role involves working with TensorFlow/JAX and contributing to Google's TPU infrastructure.
Must have:
  • 8+ years software product experience
  • 5+ years software development (Python, C, C++)
  • Performance analysis expertise
  • Experience with ML infrastructure
  • TensorFlow/JAX knowledge
Good to have:
  • Master's/PhD in related field
  • Technical leadership experience
  • ML system expertise
  • Compiler optimization experience
Perks:
  • Bonus
  • Equity
  • Benefits

Job Details


Minimum qualifications:

  • Bachelor’s degree or equivalent practical experience.
  • 8 years of experience testing and launching software products.
  • 5 years of experience with software development in one or more programming languages (e.g., Python, C, C++).
  • Experience in performance analysis including system architecture, performance modeling, benchmarking or machine learning infrastructure.

Preferred qualifications:

  • Master’s degree or PhD in Engineering, Computer Science, or a related technical field.
  • 3 years of experience in a multiplex organization, including a technical leadership role leading project teams and setting technical direction.
  • Experience with Machine Learning systems (e.g., background theory, TensorFlow).
  • Experience in compiler optimizations or related fields.

About the job

Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to handle information at massive scale, and extend well beyond web search. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google’s needs with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities and be enthusiastic to take on new problems across the full stack as we continue to push technology forward.

In this role, you will be responsible for the performance of Machine Learning (ML) and Artificial Intelligence (AI) workloads and for extracting maximum efficiency from them. You will drive Google ML performance using fleet-scale and benchmark analysis and auto-optimizations.

The ML, Systems, and Cloud AI (MSCA) organization at Google designs, implements, and manages the hardware, software, machine learning, and systems infrastructure for all Google services (Search, YouTube, etc.) and Google Cloud. Our end users are Googlers, Cloud customers and the billions of people who use Google services around the world.

We prioritize security, efficiency, and reliability across everything we do - from developing our latest TPUs to running a global network, while driving towards shaping the future of hyperscale computing. Our global impact spans software and hardware, including Google Cloud’s Vertex AI, the leading AI platform for bringing Gemini models to enterprise customers.

The US base salary range for this full-time position is $197,000-$291,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.

Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits.

Responsibilities

  • Identify and maintain Large Language Model (LLM) training and serving benchmarks used by industry and the Machine Learning (ML) community to identify performance opportunities and drive TensorFlow/JAX Tensor Processing Unit (TPU) performance.
  • Work on scaling numeric and algorithmic optimizations to Google products and ML models, including quantization, sparsity, and other model compression techniques, as well as new ML model architectures, optimizers, and training techniques that solve ML tasks more efficiently.
  • Engage with Google product teams to solve their LLM performance problems, including onboarding new LLMs and products onto Google's new TPU hardware and enabling LLMs to train efficiently on thousands of TPUs.
  • Analyze performance and efficiency metrics to identify bottlenecks. Design and implement solutions at Google.

About The Company

A problem isn't truly solved until it's solved for all. Googlers build products that help create opportunities for everyone, whether down the street or across the globe. Bring your insight, imagination and a healthy disregard for the impossible. Bring everything that makes you unique. Together, we can build for everyone.

Mountain View, California, United States (On-Site)