Software Engineering Manager, TPU Systems, Platforms Infrastructure

2 Hours ago • 8-13 Years • Artificial Intelligence

Job Summary

Job Description

This role involves managing a team of engineers developing software for Google's TPU (Tensor Processing Unit) AI chips. Responsibilities include setting team priorities, aligning strategies, providing performance feedback and coaching, developing technical roadmaps, designing systems, reviewing code, and ensuring best practices. The team works on various software components within the TPU stack, from single machine enablement to large-scale AI hypercomputers. The ideal candidate will have strong technical expertise in C/C++ and embedded systems, along with technical leadership experience. The work impacts cutting-edge AI innovations across Google and its cloud customers, encompassing all stages from initial design to production deployment and lifecycle management.
Must have:
  • Bachelor's degree in CS/Engineering or equiv.
  • 8+ years software development (C/C++)
  • 3+ years technical leadership
  • 3+ years embedded systems experience
  • Team management & project oversight
Good to have:
  • Master's/PhD in related field
  • Experience with hardware interaction
  • Production monitoring & observability
  • Networking protocols knowledge
  • Machine learning concepts

Job Details


Minimum qualifications:

  • Bachelor's degree in Engineering, Computer Science, or equivalent practical experience.
  • 8 years of experience with software development in C or C++.
  • 3 years of experience in a technical leadership role overseeing projects, with 2 years of experience in a people management or supervision/team leadership role.
  • 3 years of experience building and developing on embedded systems.

Preferred qualifications:

  • Master’s degree or PhD in Engineering, Computer Science, or a related technical field.
  • Experience in developing software that interacts with hardware.
  • Experience in production monitoring, logging, and observability tools.
  • Familiarity with networking protocols and technologies.
  • Familiarity with machine-learning concepts.

About the job

We develop software that enables the TPU, Google’s custom-built AI computation chip, to run large-scale AI hypercomputation in Google’s data centers, thereby powering cutting-edge AI innovations for Google (DeepMind, Search, Ads, and everything else) and other Cloud customers.

Our team covers a broad range of software in the TPU software stack, including system software that enables a single TPU machine, superpod software that connects thousands of TPU chips into an AI hypercomputer, and health monitoring software that ensures the TPUs and their interconnection and networking are healthy.

We play a key role in the introduction of each new TPU chip, from design and system bring-up to productionization of individual machines and large-scale AI hypercomputers comprising thousands of machines. We are involved in all stages of the project, from concept, planning, development, and deployment through end of life in the data centers.

Software Engineering Managers not only have the technical expertise to take on and provide technical leadership to major projects, but also manage a team of Engineers. You not only optimize your own code but make sure Engineers are able to optimize theirs. As a Software Engineering Manager, you manage your project goals, contribute to product strategy, and help develop your team. Teams work all across the company, in areas such as information retrieval, artificial intelligence, natural language processing, distributed computing, large-scale system design, networking, security, data compression, and user interface design; the list goes on and is growing every day. Operating with scale and speed, our exceptional software engineers are just getting started -- and as a manager, you guide the way.

The ML, Systems, & Cloud AI (MSCA) organization at Google designs, implements, and manages the hardware, software, machine learning, and systems infrastructure for all Google services (Search, YouTube, etc.) and Google Cloud. Our end users are Googlers, Cloud customers and the billions of people who use Google services around the world.

We prioritize security, efficiency, and reliability across everything we do - from developing our latest TPUs to running a global network, while driving towards shaping the future of hyperscale computing. Our global impact spans software and hardware, including Google Cloud’s Vertex AI, the leading AI platform for bringing Gemini models to enterprise customers.

Responsibilities

  • Set and communicate team priorities that support the broader organization's goals. Align strategy, processes, and decision-making across teams. 
  • Set clear expectations with individuals based on their level and role and aligned to the broader organization's goals. Meet regularly with individuals to discuss performance and development and provide feedback and coaching. 
  • Develop the mid-term technical vision and roadmap within the scope of your (often multiple) team(s). Evolve the roadmap to meet anticipated future requirements and infrastructure needs. 
  • Design, guide and vet systems designs within the scope of the broader area, and write product or system development code to solve ambiguous problems. 
  • Review code developed by other engineers and provide feedback to ensure best practices (e.g., style guidelines, checking code in, accuracy, testability, and efficiency).

About The Company

A problem isn't truly solved until it's solved for all. Googlers build products that help create opportunities for everyone, whether down the street or across the globe. Bring your insight, imagination and a healthy disregard for the impossible. Bring everything that makes you unique. Together, we can build for everyone.
