Software Engineer, Machine Learning Infrastructure

2 Weeks ago • 4 Years + • DevOps • Artificial Intelligence

Job Summary

Job Description

Character.AI seeks a seasoned Software Engineer specializing in Machine Learning Infrastructure. Responsibilities include providing infrastructure support for ML research and product development, building diagnostic tools for cluster issues and hardware failures, monitoring deployments and experiments, and maximizing GPU utilization for training and serving. The ideal candidate possesses 4+ years of experience supporting ML infrastructure, developing diagnostic tools, and working with cloud platforms like Compute Engine, Kubernetes, and Cloud Storage. Experience with GPUs is a must.
Must have:
  • 4+ years supporting ML infrastructure
  • Experience developing diagnostic tools for ML infrastructure
  • Experience with cloud platforms (Compute Engine, Kubernetes, Cloud Storage)
  • GPU experience
Good to have:
  • Large GPU clusters and HPC/networking
  • LLM training support
  • ML frameworks (PyTorch/TensorFlow/JAX)
  • GPU kernel development

Job Details

About the role

We’re looking for seasoned ML Infrastructure engineers with experience designing, building and maintaining training and serving infrastructure for ML research.

Responsibilities:

  • Provide infrastructure support for our ML research and product development

  • Build tooling to diagnose cluster issues and hardware failures

  • Monitor deployments, manage experiments, and generally support our research

  • Maximize GPU allocation and utilization for both serving and training

Requirements:

  • 4+ years of experience supporting infrastructure in an ML environment

  • Experience in developing tools used to diagnose ML infrastructure problems and failures

  • Experience with cloud platforms (e.g., Compute Engine, Kubernetes, Cloud Storage)

  • Experience working with GPUs

Nice to have:

  • Experience with large GPU clusters and high-performance computing/networking

  • Experience supporting large language model training

  • Experience with ML frameworks like PyTorch/TensorFlow/JAX

  • Experience with GPU kernel development

About Character.AI

Character.AI empowers people to connect, learn and tell stories through interactive entertainment. Over 20 million people visit Character.AI every month, using our technology to supercharge their creativity and imagination. Our platform lets users engage with tens of millions of characters, enjoy unlimited conversations, and embark on infinite adventures.


In just two years, we achieved unicorn status and were honored as Google Play's AI App of the Year—a testament to our innovative technology and visionary approach.


Join us and be a part of establishing this new entertainment paradigm while shaping the future of Consumer AI!

At Character, we value diversity and welcome applicants from all backgrounds. As an equal opportunity employer, we do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, or disability. Your unique perspectives are vital to our success.

Compensation Range: $150K - $350K

About The Company

Character is one of the world's leading personal AI platforms. Founded in 2021 by AI pioneers Noam Shazeer and Daniel De Freitas, Character is a full-stack AI company with a globally scaled direct-to-consumer platform. 
