Staff Software Engineer, ML Performance, GPUs

1 Month ago • 8-13 Years • Artificial Intelligence • $197,000 PA - $291,000 PA

Job Summary

Job Description

This Staff Software Engineer role focuses on ML performance optimization, particularly for LLMs, on Google's GPU infrastructure. Responsibilities include analyzing LLM performance, identifying and maintaining benchmarks, collaborating with product teams to onboard new models, running architecture-level simulations, and implementing performance solutions. The ideal candidate possesses extensive software development experience, expertise in ML infrastructure optimization, GPU programming, and performance analysis, and a strong understanding of LLM training and serving.
Must have:
  • 8+ years software development experience
  • 5+ years ML design & infrastructure optimization
  • Experience with performance analysis & GPU programming
  • Experience testing and launching software products
  • Data structures/algorithms expertise
Good to have:
  • Master's/PhD in relevant field
  • Experience with TensorFlow or other ML tools
  • Compiler optimization experience
  • Experience in complex organizations
  • Architecture analysis and optimization
Perks:
  • Bonus
  • Equity
  • Benefits

Job Details


Minimum qualifications:

  • Bachelor’s degree or equivalent practical experience.
  • 8 years of experience in software development, and with data structures/algorithms.
  • 5 years of experience testing and launching software products, and 3 years of experience with software design and architecture.
  • 5 years of experience leading ML design and optimizing ML infrastructure (e.g., model deployment, model evaluation, data processing, debugging, fine tuning).
  • Experience with performance analysis and GPU programming.

Preferred qualifications:

  • Master’s degree or PhD in Engineering, Computer Science, a related technical field, or equivalent practical experience.
  • 5 years of experience working in a complex, matrixed organization.
  • Experience with machine learning systems (e.g., background theory, TensorFlow, or other ML tools).
  • Experience working on compiler optimizations or related fields.
  • Experience with architecture analysis and optimization.

About the job

Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to handle information at massive scale, and extend well beyond web search. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google’s needs with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities and be enthusiastic to take on new problems across the full-stack as we continue to push technology forward.

The ML, Systems, & Cloud AI (MSCA) organization at Google designs, implements, and manages the hardware, software, machine learning, and systems infrastructure for all Google services (Search, YouTube, etc.) and Google Cloud. Our end users are Googlers, Cloud customers and the billions of people who use Google services around the world.

We prioritize security, efficiency, and reliability across everything we do, from developing our latest TPUs to running a global network, while driving towards shaping the future of hyperscale computing. Our global impact spans software and hardware, including Google Cloud’s Vertex AI, the leading AI platform for bringing Gemini models to enterprise customers.

The US base salary range for this full-time position is $197,000-$291,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.

Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits.

Responsibilities

  • Analyze large language model (LLM) performance and optimizations for partner teams, including Google Gemini, Search, Cloud LLM, and application programming interfaces (APIs).
  • Identify and maintain LLM training and serving benchmarks, and use them to identify performance opportunities, drive XLA:GPU (Accelerated Linear Algebra) and Triton performance, and guide future XLA releases.
  • Engage with Google product teams to solve their ML model performance challenges, including onboarding new LLMs and products onto Google’s GPU hardware and enabling LLMs to train efficiently at very large scale (i.e., thousands of GPUs).
  • Run architecture-level simulations on GPU designs and perform roofline analysis to guide partner teams.
  • Analyze performance and efficiency metrics to identify bottlenecks, then design and implement solutions.
