Staff Software Engineer, Machine Learning Performance, TPU

1 Month ago • 8-13 Years • Artificial Intelligence • $197,000 PA - $291,000 PA

Job Summary

Job Description

This Staff Software Engineer role focuses on maximizing the performance of Machine Learning (ML) and Artificial Intelligence (AI) workloads, particularly on TPUs. Responsibilities include establishing and maintaining LLM benchmarks, optimizing ML models through techniques like quantization and sparsity, collaborating with product teams to onboard LLMs onto new TPU hardware, and analyzing performance metrics to identify and resolve bottlenecks. The position requires extensive experience in software development, performance analysis, and ML system understanding. The role involves working with TensorFlow/JAX and contributing to Google's TPU infrastructure.
Must have:
  • 8+ years software product experience
  • 5+ years software development (Python, C, C++)
  • Performance analysis expertise
  • Experience with ML infrastructure
  • TensorFlow/JAX knowledge
Good to have:
  • Master's/PhD in related field
  • Technical leadership experience
  • ML system expertise
  • Compiler optimization experience
Perks:
  • Bonus
  • Equity
  • Benefits

Job Details


Minimum qualifications:

  • Bachelor’s degree or equivalent practical experience.
  • 8 years of experience in testing and launching software products.
  • 5 years of experience with software development in one or more programming languages (e.g., Python, C, C++).
  • Experience in performance analysis including system architecture, performance modeling, benchmarking or machine learning infrastructure.

Preferred qualifications:

  • Master’s degree or PhD in Engineering, Computer Science, or a related technical field.
  • 3 years of experience in a multiplex organization, including a technical leadership role leading project teams and setting technical direction.
  • Experience with Machine Learning systems (e.g., background theory, TensorFlow).
  • Experience in compiler optimizations or related fields.

About the job

Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to handle information at massive scale, and extend well beyond web search. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google’s needs with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities and be enthusiastic to take on new problems across the full-stack as we continue to push technology forward.

In this role, you will be responsible for the performance of Machine Learning (ML) and Artificial Intelligence (AI) workloads and for extracting maximum efficiency from them. You will drive Google ML performance using fleet-scale and benchmark analysis and auto-optimizations.

The ML, Systems, and Cloud AI (MSCA) organization at Google designs, implements, and manages the hardware, software, machine learning, and systems infrastructure for all Google services (Search, YouTube, etc.) and Google Cloud. Our end users are Googlers, Cloud customers and the billions of people who use Google services around the world.

We prioritize security, efficiency, and reliability across everything we do - from developing our latest TPUs to running a global network, while driving towards shaping the future of hyperscale computing. Our global impact spans software and hardware, including Google Cloud’s Vertex AI, the leading AI platform for bringing Gemini models to enterprise customers.

The US base salary range for this full-time position is $197,000-$291,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.

Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits.

Responsibilities

  • Identify and maintain Large Language Model (LLM) training and serving benchmarks used by industry and the Machine Learning (ML) community to identify performance opportunities and drive TensorFlow/JAX performance on Tensor Processing Units (TPUs).
  • Scale numeric and algorithmic optimizations across Google products and ML models, including quantization, sparsity, and other model compression techniques, as well as new ML model architectures, optimizers, and training techniques that solve ML tasks more efficiently.
  • Engage with Google product teams to solve their LLM performance problems, including onboarding new LLMs and products onto Google's new TPU hardware and enabling LLMs to train efficiently on thousands of TPUs.
  • Analyze performance and efficiency metrics to identify bottlenecks, then design and implement solutions at Google.
