Software Engineering Manager, TPU Systems, Platforms Infrastructure

1 Month ago • 8-13 Years • Artificial Intelligence

Job Summary

Job Description

This role involves managing a team of engineers developing software for Google's TPU (Tensor Processing Unit) AI chips. Responsibilities include setting team priorities, aligning strategies, providing performance feedback and coaching, developing technical roadmaps, designing systems, reviewing code, and ensuring best practices. The team works on various software components within the TPU stack, from single-machine enablement to large-scale AI hypercomputers. The ideal candidate will have strong technical expertise in C/C++ and embedded systems, along with technical leadership experience. The work supports cutting-edge AI innovations across Google and its cloud customers, and covers all stages from initial design to production deployment and lifecycle management.
Must have:
  • Bachelor's degree in CS/Engineering or equiv.
  • 8+ years software development (C/C++)
  • 3+ years technical leadership
  • 3+ years embedded systems experience
  • Team management & project oversight
Good to have:
  • Master's/PhD in related field
  • Experience with hardware interaction
  • Production monitoring & observability
  • Networking protocols knowledge
  • Machine learning concepts

Job Details


Minimum qualifications:

  • Bachelor's degree in Engineering, Computer Science, or equivalent practical experience.
  • 8 years of experience with software development in C or C++.
  • 3 years of experience in a technical leadership role overseeing projects, with 2 years of experience in a people management or supervision/team leadership role.
  • 3 years of experience building and developing on embedded systems.

Preferred qualifications:

  • Master’s degree or PhD in Engineering, Computer Science, or a related technical field.
  • Experience in developing software that interacts with hardware.
  • Experience in production monitoring, logging, and observability tools.
  • Familiarity with networking protocols and technologies.
  • Familiarity with machine-learning concepts.

About the job

We develop software that enables the TPU, Google’s custom-built AI computation chip, to run large-scale AI hypercomputation in Google’s data centers, thereby powering cutting-edge AI innovations for Google (DeepMind, Search, Ads, and more) and for Cloud customers.

Our team covers a broad range of software in the TPU software stack, including system software that enables a single TPU machine, superpod software that connects thousands of TPU chips into an AI hypercomputer, and health monitoring software that ensures the TPUs, their interconnects, and their networking are healthy.

We play a key role in the introduction of each new TPU chip, from design and system bring-up to productionization of individual machines and of large-scale AI hypercomputers comprising thousands of machines. We are involved in all stages of the project, from concept, planning, development, and deployment to end of life in the data centers.

Software Engineering Managers not only have the technical expertise to take on and provide technical leadership to major projects, but also manage a team of Engineers. You not only optimize your own code but also make sure Engineers are able to optimize theirs. As a Software Engineering Manager, you manage your project goals, contribute to product strategy, and help develop your team. Teams work all across the company, in areas such as information retrieval, artificial intelligence, natural language processing, distributed computing, large-scale system design, networking, security, data compression, and user interface design; the list goes on and is growing every day. Operating with scale and speed, our exceptional software engineers are just getting started -- and as a manager, you guide the way.

The ML, Systems, & Cloud AI (MSCA) organization at Google designs, implements, and manages the hardware, software, machine learning, and systems infrastructure for all Google services (Search, YouTube, etc.) and Google Cloud. Our end users are Googlers, Cloud customers and the billions of people who use Google services around the world.

We prioritize security, efficiency, and reliability across everything we do - from developing our latest TPUs to running a global network, while driving towards shaping the future of hyperscale computing. Our global impact spans software and hardware, including Google Cloud’s Vertex AI, the leading AI platform for bringing Gemini models to enterprise customers.

Responsibilities

  • Set and communicate team priorities that support the broader organization's goals. Align strategy, processes, and decision-making across teams. 
  • Set clear expectations with individuals based on their level and role and aligned to the broader organization's goals. Meet regularly with individuals to discuss performance and development and provide feedback and coaching. 
  • Develop the mid-term technical vision and roadmap within the scope of your (often multiple) team(s). Evolve the roadmap to meet anticipated future requirements and infrastructure needs. 
  • Design, guide and vet systems designs within the scope of the broader area, and write product or system development code to solve ambiguous problems. 
  • Review code developed by other engineers and provide feedback to ensure best practices (e.g., style guidelines, checking code in, accuracy, testability, and efficiency).
