Staff Software Engineer, ML Performance, GPUs

2 Hours ago • 8-13 Years • Artificial Intelligence • $197,000 PA - $291,000 PA

Job Summary

Job Description

This Staff Software Engineer role focuses on ML performance optimization on GPUs, particularly for LLMs. Responsibilities include analyzing LLM performance, identifying and maintaining benchmarks, collaborating with product teams to solve performance challenges, running architecture-level simulations, and implementing solutions to improve efficiency. The ideal candidate has extensive experience in software development, ML design, GPU programming, and performance analysis, with a strong background in data structures, algorithms, and software architecture. They will support partner teams including Google Gemini, Search, Cloud LLM, and APIs.
Must have:
  • 8+ years software development experience
  • 5+ years ML design & infrastructure optimization
  • Experience with GPU programming & performance analysis
  • Data structures/algorithms expertise
  • Software design & architecture experience
Good to have:
  • Master's or PhD in related field
  • Experience with TensorFlow or other ML tools
  • Compiler optimization experience
  • Architecture analysis and optimization experience
  • Experience in a complex organization
Perks:
  • Bonus
  • Equity
  • Benefits

Job Details


Minimum qualifications:

  • Bachelor’s degree or equivalent practical experience.
  • 8 years of experience in software development and with data structures/algorithms.
  • 5 years of experience testing and launching software products, and 3 years of experience with software design and architecture.
  • 5 years of experience leading ML design and optimizing ML infrastructure (e.g., model deployment, model evaluation, data processing, debugging, fine-tuning).
  • Experience with performance analysis and GPU programming.

Preferred qualifications:

  • Master’s degree or PhD in Engineering, Computer Science, a related technical field, or equivalent practical experience.
  • 5 years of experience working in a complex, matrixed organization.
  • Experience with machine learning systems (e.g., background theory, TensorFlow, or other ML tools).
  • Experience working on compiler optimizations or related fields.
  • Experience with architecture analysis and optimization.

About the job

Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to handle information at massive scale and extend well beyond web search. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google’s needs, with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities, and be enthusiastic to take on new problems across the full stack as we continue to push technology forward.

The ML, Systems, & Cloud AI (MSCA) organization at Google designs, implements, and manages the hardware, software, machine learning, and systems infrastructure for all Google services (Search, YouTube, etc.) and Google Cloud. Our end users are Googlers, Cloud customers and the billions of people who use Google services around the world.

We prioritize security, efficiency, and reliability across everything we do, from developing our latest TPUs to running a global network, while driving towards shaping the future of hyperscale computing. Our global impact spans software and hardware, including Google Cloud’s Vertex AI, the leading AI platform for bringing Gemini models to enterprise customers.

The US base salary range for this full-time position is $197,000-$291,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.

Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits.

Responsibilities

  • Analyze Large Language Model (LLM) performance and optimizations for partner teams including Google Gemini, Search, Cloud LLM, and application programming interfaces (APIs).
  • Identify and maintain LLM training and serving benchmarks, and use them to identify performance opportunities, drive Accelerated Linear Algebra (XLA) GPU backend (XLA:GPU) and Triton performance, and guide future XLA releases.
  • Engage with Google product teams to solve their ML model performance challenges, including onboarding new LLMs and products onto Google’s GPU hardware and enabling LLMs to train efficiently at very large scale (i.e., thousands of GPUs).
  • Run architecture-level simulations on GPU designs and perform roofline analysis to guide partner teams.
  • Analyze performance and efficiency metrics to identify bottlenecks, then design and implement solutions.

Similar Jobs

PlayStation Global - Machine Learning Engineer for Game Technology

PlayStation Global

London, England, United Kingdom (On-Site)
9 Months ago
ByteDance - Research Scientist, Reinforcement Learning

ByteDance

San Jose, California, United States (On-Site)
5 Months ago
Razer - Solutions Architect

Razer

Singapore (On-Site)
6 Months ago
Samsung Semiconductor - Senior Engineer, AI

Samsung Semiconductor

San Jose, California, United States (Hybrid)
6 Months ago
Canva - Machine Learning Engineer Intern

Canva

Sydney, New South Wales, Australia (Remote)
1 Week ago
Google - Software Engineer III, AI/ML, Platforms and Devices

Google

Bengaluru, Karnataka, India (On-Site)
1 Day ago
Google - Silicon Architecture/Design Engineer

Google

Bengaluru, Karnataka, India (On-Site)
5 Hours ago
ByteDance - Student Researcher (Doubao (Seed) - Foundation Model - Speech & Audio) - 2025 Start (PhD)

ByteDance

Seattle, Washington, United States (On-Site)
5 Months ago
NVIDIA - Research Scientist, Deep Learning and Computer Vision

NVIDIA

Hsinchu, Hsinchu City, Taiwan (On-Site)
1 Month ago
Google - Software Engineer III, AI/ML, Technical Infrastructure

Google

New Taipei, New Taipei City, Taiwan (On-Site)
1 Day ago

Similar Skill Jobs

Google - Customer Engineer, AI Infrastructure

Google

Seattle, Washington, United States (On-Site)
1 Day ago
Google - Software Engineer, Research, Computational Imaging

Google

Mountain View, California, United States (On-Site)
1 Day ago
Stylumia - Senior Machine Learning Engineer - Time Series & Computer Vision

Stylumia

Bengaluru, Karnataka, India (Hybrid)
7 Months ago
WebFX - Full Stack JavaScript Developer (Remote PH)

WebFX

Philippines (Remote)
5 Months ago
ByteDance - Research Scientist in Machine Learning for Science (AML - AI-for-Science) - 2024 Start (PhD)

ByteDance

Seattle, Washington, United States (On-Site)
5 Months ago
Google - Software Engineering Manager, Google Kubernetes AI Infrastructure

Google

Kirkland, Washington, United States (On-Site)
1 Day ago
NVIDIA - Performance Engineer Intern, Deep Learning and HPC

NVIDIA

Shanghai, Shanghai, China (On-Site)
3 Months ago
ByteDance - Algorithm Engineer - Enterprise Solution RD

ByteDance

San Jose, California, United States (On-Site)
3 Days ago
ByteDance - Software Engineer, ML System Scheduling

ByteDance

San Jose, California, United States (On-Site)
5 Months ago
Light Speed Studios - Game AI Researcher

Light Speed Studios

Tokyo, Japan (On-Site)
2 Weeks ago

Jobs in Sunnyvale, California, United States

Google - Senior Staff Software Engineer, Full Stack, Google Ads

Google

Mountain View, California, United States (On-Site)
1 Day ago
Salesforce - Vice President, Product Research & Insights

Salesforce

San Francisco, California, United States (Remote)
2 Weeks ago
Google - Senior Account Manager, Large Customer Sales

Google

Chicago, Illinois, United States (On-Site)
3 Hours ago
ByteDance - Site Reliability Engineer, Traffic Platform

ByteDance

Seattle, Washington, United States (On-Site)
5 Months ago
Epic Games - Senior Policy Manager, Trust & Safety

Epic Games

Cary, North Carolina, United States (On-Site)
3 Months ago
Epic Games - Principal Engine Programmer, Verse Framework

Epic Games

Cary, North Carolina, United States (On-Site)
2 Months ago
Skybound Entertainment - Senior Franchise Producer, The Walking Dead

Skybound Entertainment

Los Angeles, California, United States (On-Site)
5 Months ago
Microsoft - Research Intern - Multimodal AI Research

Microsoft

Redmond, Washington, United States (On-Site)
1 Day ago
DraftKings - Lottery Fulfillment Supervisor

DraftKings

Tempe, Arizona, United States (On-Site)
1 Month ago
Universal Music - Manager, Revenue

Universal Music

Los Angeles, California, United States (On-Site)
3 Weeks ago

Artificial Intelligence Jobs

NVIDIA - Solutions Architect for NCP

NVIDIA

Dubai, Dubai, United Arab Emirates (On-Site)
2 Days ago
PwC - Associate

PwC

Bengaluru, Karnataka, India (On-Site)
5 Months ago
NVIDIA - Senior Applied LLM Engineer, AI – Chip Design

NVIDIA

Canada (On-Site)
1 Month ago
Google - Software Engineer III, XBorg, Google Cloud

Google

Warsaw, Masovian Voivodeship, Poland (On-Site)
1 Day ago
Meta - Research Scientist Intern, Machine Perception for Input and Interaction (PhD)

Meta

Redmond, Washington, United States (On-Site)
5 Months ago
Google - Applied ML Engineer for AICore

Google

Taipei City, Taiwan (On-Site)
1 Day ago
Lionbridge Games - Games Language AI Specialist (Linguist)

Lionbridge Games

Masovian Voivodeship, Poland (On-Site)
3 Days ago
Canva - Senior Computer Vision Engineer - Photo AI

Canva

Vienna, Vienna, Austria (Remote)
2 Months ago
ByteDance - Research Scientist in Foundation Model, Music Core Machine Learning Graduates - 2024 Start (PhD)

ByteDance

San Jose, California, United States (On-Site)
5 Months ago
Keywords Studios - AI - Senior Research Associate (Prompts)

Keywords Studios

Silesian Voivodeship, Poland (On-Site)
1 Month ago

About The Company

A problem isn't truly solved until it's solved for all. Googlers build products that help create opportunities for everyone, whether down the street or across the globe. Bring your insight, imagination and a healthy disregard for the impossible. Bring everything that makes you unique. Together, we can build for everyone.
