Staff Software Engineer, ML Performance, GPUs

2 Months ago • 8-13 Years • $197,000 PA - $291,000 PA

Job Summary

Job Description

This Staff Software Engineer role focuses on ML performance optimization, particularly for LLMs, on Google's GPU infrastructure. Responsibilities include analyzing LLM performance, identifying and maintaining benchmarks, collaborating with product teams to onboard new models, running architecture-level simulations, and implementing performance solutions. The ideal candidate possesses extensive software development experience, expertise in ML infrastructure optimization, GPU programming, and performance analysis, and a strong understanding of LLM training and serving.
Must have:
  • 8+ years software development experience
  • 5+ years ML design & infrastructure optimization
  • Experience with performance analysis & GPU programming
  • Experience testing and launching software products
  • Data structures/algorithms expertise
Good to have:
  • Master's/PhD in relevant field
  • Experience with TensorFlow or other ML tools
  • Compiler optimization experience
  • Experience in complex organizations
  • Architecture analysis and optimization
Perks:
  • Bonus
  • Equity
  • Benefits

Job Details


Minimum qualifications:

  • Bachelor’s degree or equivalent practical experience.
  • 8 years of experience in software development, and with data structures/algorithms.
  • 5 years of experience testing and launching software products, and 3 years of experience with software design and architecture.
  • 5 years of experience leading ML design and optimizing ML infrastructure (e.g., model deployment, model evaluation, data processing, debugging, fine tuning).
  • Experience with performance analysis and GPU programming.

Preferred qualifications:

  • Master’s degree or PhD in Engineering, Computer Science, a related technical field, or equivalent practical experience.
  • 5 years of experience working in a complex, matrixed organization.
  • Experience with machine learning systems (e.g., background theory, TensorFlow, or other ML tools).
  • Experience working on compiler optimizations or related fields.
  • Experience with architecture analysis and optimization.

About the job

Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to handle information at massive scale, and extend well beyond web search. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google’s needs with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities and be enthusiastic to take on new problems across the full-stack as we continue to push technology forward.

The ML, Systems, & Cloud AI (MSCA) organization at Google designs, implements, and manages the hardware, software, machine learning, and systems infrastructure for all Google services (Search, YouTube, etc.) and Google Cloud. Our end users are Googlers, Cloud customers and the billions of people who use Google services around the world.

We prioritize security, efficiency, and reliability across everything we do - from developing our latest TPUs to running a global network, while driving towards shaping the future of hyperscale computing. Our global impact spans software and hardware, including Google Cloud’s Vertex AI, the leading AI platform for bringing Gemini models to enterprise customers.

The US base salary range for this full-time position is $197,000-$291,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.

Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits.

Responsibilities

  • Analyze large language model (LLM) performance and optimizations for partner teams, including Google Gemini, Search, Cloud LLM and application programming interfaces (APIs).
  • Identify and maintain LLM training and serving benchmarks, and use them to identify performance opportunities, drive Accelerated Linear Algebra (XLA):GPU/Triton performance, and guide future XLA releases.
  • Engage with Google product teams to solve their ML model performance challenges, including onboarding new LLMs and products onto Google’s GPU hardware and enabling LLMs to train efficiently at very large scale (i.e., thousands of GPUs).
  • Run architecture-level simulations on GPU designs and perform roofline analysis to guide partner teams.
  • Analyze performance and efficiency metrics to identify bottlenecks, then design and implement solutions.

Similar Jobs

Loyalty Juggernaut - Senior Product Engineer (ML)

Loyalty Juggernaut

Hyderabad, Telangana, India (On-Site)
1 Year ago
bytedance - Senior Research Scientist, Foundation Model, Speech Understanding

bytedance

Seattle, Washington, United States (On-Site)
8 Months ago
Scale AI - Machine Learning Engineer, GenAI Applied ML

Scale AI

San Francisco, California, United States (On-Site)
2 Months ago
NVIDIA - Deep Learning Intern - Fall 2025

NVIDIA

Shanghai, Shanghai, China (On-Site)
2 Months ago
PwC - IN-Senior Associate_ML Engineer_Data and Analytics_Advisory_Bangalore

PwC

Bengaluru, Karnataka, India (On-Site)
8 Months ago
bytedance - Research Scientist in Foundation Model (Music) - 2025 Start (PhD)

bytedance

San Jose, California, United States (On-Site)
8 Months ago
Canva - Senior Applied Scientist - AI Research

Canva

Surry Hills, New South Wales, Australia (Remote)
3 Months ago
Tesla - Senior Machine Learning, AI Engineer

Tesla

Brandenburg, Germany (On-Site)
4 Months ago
Microsoft - Engineering Manager

Microsoft

Mountain View, California, United States (Hybrid)
3 Months ago

Similar Skill Jobs

bytedance - Research Scientist- Applied Machine learning Graduates (AML) - 2024 Start (PhD)

bytedance

San Jose, California, United States (On-Site)
8 Months ago
Monzo - Senior Staff Machine Learning Scientist

Monzo

London, England, United Kingdom (Hybrid)
1 Month ago
Thales - Quantum-AI Research Scientist

Thales

Montreal, Quebec, Canada (On-Site)
1 Month ago
Reddit - Machine Learning Manager - Ads Engagement Modeling

Reddit

Canada (Remote)
1 Month ago
The Walt Disney Company - Lead Software Engineer - Applied AI & Machine Learning

The Walt Disney Company

Santa Monica, California, United States (On-Site)
2 Months ago
Rackspace Technology - Machine Learning Architect (AWS)

Rackspace Technology

(Remote)
2 Months ago
Reddit - Senior Machine Learning Engineer

Reddit

London, England, United Kingdom (On-Site)
1 Month ago
Ion - Senior AI Engineer, Italy

Ion

Pisa, Tuscany, Italy (On-Site)
8 Months ago
Match Group - Machine Learning Engineer (MG AI)

Match Group

Seoul, South Korea (On-Site)
8 Months ago

Jobs in Sunnyvale, California, United States

Valve corporation - Level Designer

Valve corporation

Bellevue, Washington, United States (On-Site)
7 Months ago
CyberArk - Global Partner Program Manager

CyberArk

United States (On-Site)
1 Month ago
that game company - Gameplay Engineer

that game company

United States (Remote)
3 Months ago
Zones - Human Resources Business Partner - Sales

Zones

Auburn, Washington, United States (Hybrid)
2 Months ago
Nintendo - Software Engineer I, Game Development

Nintendo

Redmond, Washington, United States (Hybrid)
5 Months ago
The Walt Disney Company - Animal Health Lab Technician

The Walt Disney Company

Lake Buena Vista, Florida, United States (On-Site)
2 Months ago
NVIDIA - Senior Developer Technology Engineer, Public Sector

NVIDIA

Washington, District Of Columbia, United States (Remote)
3 Months ago
Ariens Company - Senior Industrial Electrician

Ariens Company

Fayetteville, Tennessee, United States (On-Site)
2 Months ago
Saviynt - Principal Engineer, Quality Engineering

Saviynt

El Segundo, California, United States (Hybrid)
8 Months ago
Next Level Business Services - Java Developer

Next Level Business Services

El Segundo, California, United States (On-Site)
8 Months ago

Similar Category Jobs

Meta - Research Intern, Computer Vision for Egocentric Representation Learning (PhD)

Meta

Redmond, Washington, United States (On-Site)
7 Months ago
Google - Staff Software Engineer, Network Management

Google

Sunnyvale, California, United States (On-Site)
2 Months ago
bytedance - Research Scientist, Foundation Model, Vision

bytedance

Singapore (On-Site)
8 Months ago
zoox - Staff/Senior Staff Software Engineer, ML Performance Optimization

zoox

Foster City, California, United States (On-Site)
8 Months ago
AI Fund - General Manager - New Business Unit (College Admissions)

AI Fund

California, United States (Remote)
8 Months ago
Epic Games - Machine Learning Engineer

Epic Games

London, England, United Kingdom (On-Site)
3 Months ago
Meta - Software Engineer, Systems ML - SW/HW Co-design

Meta

Fremont, California, United States (Remote)
7 Months ago
bytedance - Research Scientist Graduate (Foundation Model - Generative AI) - 2025 Start (PhD)

bytedance

Seattle, Washington, United States (On-Site)
6 Months ago
Google - Staff Software Engineer, AI/ML

Google

Sunnyvale, California, United States (On-Site)
2 Months ago
bytedance - Student Researcher (Doubao (Seed) - Foundation Model - Speech & Audio) - 2025 Start (PhD)

bytedance

Seattle, Washington, United States (On-Site)
8 Months ago
