Staff Software Engineer, Google Compute Engine, Telemetry Insights

4 Days ago • 8-13 Years • Artificial Intelligence • $197,000 PA - $291,000 PA

Job Summary

Job Description

This Staff Software Engineer role focuses on Google Compute Engine (GCE) fleet observability and reliability. Responsibilities include building the technical roadmap, acting as an AI/ML expert, partnering with internal teams, defining business metrics and Service Level Objectives (SLOs), implementing processes and tools, establishing data best practices, and mentoring team members. The ideal candidate will have extensive experience in software development, data structures/algorithms, testing and launching software products, software design and architecture, and ML infrastructure optimization. Experience leading ML design and working with large-scale infrastructure is essential.
Must have:
  • 8+ years software development experience
  • 5+ years testing and launching software
  • 5+ years experience with ML infrastructure
  • 5+ years leading ML design and optimization
  • Expertise in AI/ML and data analysis
Good to have:
  • Master's/PhD in relevant field
  • Technical leadership experience
  • Experience in complex, matrixed organizations
  • Data management expertise
  • GPU reliability experience
Perks:
  • Bonus
  • Equity
  • Benefits

Job Details


Minimum qualifications:

  • Bachelor's degree or equivalent practical experience.
  • 8 years of experience in software development, and with data structures/algorithms.
  • 5 years of experience testing and launching software products, and 3 years of experience with software design and architecture.
  • 5 years of experience with one or more of the following: Speech/audio (e.g., technology duplicating and responding to the human voice), reinforcement learning (e.g., sequential decision making), Machine Learning (ML) infrastructure, or specialization in another ML field.
  • 5 years of experience leading ML design and optimizing ML infrastructure (e.g., model deployment, model evaluation, data processing, debugging, fine tuning).

Preferred qualifications:

  • Master’s degree or PhD in Engineering, Computer Science, or a related technical field.
  • 3 years of experience in a technical leadership role leading project teams and setting technical direction.
  • 3 years of experience working in a complex, matrixed organization involving cross-functional or cross-business projects.
  • Experience in data management, including data quality, data governance, data architecture, and data modeling.
  • Experience using data to identify opportunities, mitigate risks, and achieve the highest quality and reliability for GPUs.
  • Experience delivering reliability for large-scale infrastructure using data-driven insights and machine learning.

About the job

Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to handle information at massive scale, and extend well beyond web search. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google’s needs with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities, and be enthusiastic to take on new problems across the full stack as we continue to push technology forward.

The ML, Systems, & Cloud AI (MSCA) organization at Google designs, implements, and manages the hardware, software, machine learning, and systems infrastructure for all Google services (Search, YouTube, etc.) and Google Cloud. Our end users are Googlers, Cloud customers and the billions of people who use Google services around the world.

We prioritize security, efficiency, and reliability across everything we do - from developing our latest TPUs to running a global network, while driving towards shaping the future of hyperscale computing. Our global impact spans software and hardware, including Google Cloud’s Vertex AI, the leading AI platform for bringing Gemini models to enterprise customers.

The US base salary range for this full-time position is $197,000-$291,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.

Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits.

Responsibilities

  • Own and build the technical roadmap of Google Compute Engine (GCE) fleet observability and reliability based on analysis.
  • Act as a subject matter expert in AI/ML, driving innovation in GCE observability to meet the demands of the customer base.
  • Partner with internal customers, Site Reliability Engineers, product managers, and project managers to align priorities and manage staffing needs.
  • Define business metrics and Service Level Objectives, and implement processes and tools to maintain them. Establish and promote data best practices throughout GCE.
  • Coach, mentor, and support team members at all levels in their career development.

About The Company

A problem isn't truly solved until it's solved for all. Googlers build products that help create opportunities for everyone, whether down the street or across the globe. Bring your insight, imagination and a healthy disregard for the impossible. Bring everything that makes you unique. Together, we can build for everyone.
