Staff Software Engineer, Machine Learning Runtime Engines

1 Hour ago • 8-13 Years • Artificial Intelligence • $197,000 PA - $291,000 PA

Job Summary

Job Description

This Staff Software Engineer role focuses on designing the next generation of Machine Learning (ML) accelerator runtimes. Responsibilities include managing the ML runtime engines strategy, supporting Google’s ML ecosystem across firmware, hardware, and tools, migrating existing framework runtimes (TensorFlow, JAX, PyTorch) from TPUs to GPUs, managing open-source components, and mentoring junior engineers. The ideal candidate will have extensive experience in software development, ML/AI algorithms, and building large-scale ML systems, along with experience in JAX, PyTorch, or TensorFlow. The role involves addressing user needs, minimizing use-case-specific code, and ensuring highly performant execution across accelerator hardware for various models, including Large Language Models (LLMs) and Large Embedding Models (LEMs).
Must have:
  • 8+ years software development experience
  • 5+ years ML/AI algorithm & tool experience
  • 5+ years building large-scale ML systems
  • 3+ years experience with JAX, PyTorch, or TensorFlow
  • Experience with data structures/algorithms
  • Manage ML runtime strategy
Good to have:
  • Master's or PhD in related field
  • ML & HPC experience
  • Framework/runtime experience
  • ML community involvement
  • Concurrent/parallel computation debugging
  • Debugging at all stack levels
Perks:
  • Bonus
  • Equity
  • Benefits

Job Details

Minimum qualifications:

  • Bachelor's degree or equivalent practical experience.
  • 8 years of experience in software development, and with data structures/algorithms.
  • 5 years of experience with developing Machine Learning (ML)/Artificial Intelligence (AI) algorithms and tools.
  • 5 years of experience building and architecting large-scale, production quality ML systems.
  • 3 years of experience in development with JAX, PyTorch, or TensorFlow.

Preferred qualifications:

  • Master’s degree or PhD in Engineering, Computer Science, or a related technical field.
  • Experience with Machine Learning and High Performance Computing (HPC).
  • Experience with ML frameworks and runtimes.
  • Experience in the Machine Learning (ML) community through publications, open-source contributions, or conference participation.
  • Ability to debug and program concurrent/parallel computations.
  • Ability to debug correctness and performance issues at all levels of the stack.

About the job

Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to handle information at massive scale, and extend well beyond web search. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google’s needs with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities and be enthusiastic to take on new problems across the full stack as we continue to push technology forward.

With your technical expertise you will manage project priorities, deadlines, and deliverables. You will design, develop, test, deploy, maintain, and enhance software solutions.

In this role, you will be responsible for designing the new generation of Machine Learning (ML) accelerator runtimes and how they interact with ML frameworks and compilers to deliver highly performant execution across accelerator hardware for the cutting-edge Large Language Models (LLMs), Large Embedding Models (LEMs), Language Models (LMs), and non-LLM models used by all of Google.

Google Cloud accelerates every organization’s ability to digitally transform its business and industry. We deliver enterprise-grade solutions that leverage Google’s cutting-edge technology, and tools that help developers build more sustainably. Customers in more than 200 countries and territories turn to Google Cloud as their trusted partner to enable growth and solve their most critical business problems.

The US base salary range for this full-time position is $197,000-$291,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.

Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits.

Responsibilities

  • Manage the ML runtime engines strategy towards solving the greater ML infrastructure problems and addressing users' needs.
  • Support Google’s ML ecosystem needs across firmware, hardware, and tools. This will minimize use-case-specific code paths and turn down investments in maintenance of such fragmented efforts in the current stack.
  • Migrate existing framework (TensorFlow, JAX, PyTorch) runtimes (TFExecutor, TensorFlow Runtime (TFRT), PJRT, JetStream) and custom workflows from Tensor Processing Unit (TPU) to Graphics Processing Unit (GPU) while minimizing any user disruption.
  • Manage open-source components to enable integration with frameworks and compilers in open-source software (OSS).
  • Mentor and grow junior engineers, and provide career development guidance and advice.

About The Company

A problem isn't truly solved until it's solved for all. Googlers build products that help create opportunities for everyone, whether down the street or across the globe. Bring your insight, imagination and a healthy disregard for the impossible. Bring everything that makes you unique. Together, we can build for everyone.
