Software Engineer, Machine Learning Infrastructure

1 Hour ago • 4 Years + • DevOps • Artificial Intelligence

Job Summary

Job Description

Character.AI seeks a seasoned Software Engineer specializing in Machine Learning Infrastructure. Responsibilities include providing infrastructure support for ML research and product development, building diagnostic tools for cluster issues and hardware failures, monitoring deployments and experiments, and maximizing GPU utilization for training and serving. The ideal candidate possesses 4+ years of experience supporting ML infrastructure, developing diagnostic tools, and working with cloud platforms like Compute Engine, Kubernetes, and Cloud Storage. Experience with GPUs is a must.
Must have:
  • 4+ years supporting ML infrastructure
  • Experience developing diagnostic tools for ML infrastructure
  • Experience with cloud platforms (Compute Engine, Kubernetes, Cloud Storage)
  • GPU experience
Good to have:
  • Large GPU clusters and HPC/networking
  • LLM training support
  • ML frameworks (PyTorch/TensorFlow/JAX)
  • GPU kernel development

Job Details

About the role

We’re looking for seasoned ML Infrastructure engineers with experience designing, building and maintaining training and serving infrastructure for ML research.

Responsibilities:

  • Provide infrastructure support for our ML research and product development

  • Build tooling to diagnose cluster issues and hardware failures

  • Monitor deployments, manage experiments, and generally support our research

  • Maximize GPU allocation and utilization for both serving and training

Requirements:

  • 4+ years of experience supporting infrastructure in an ML environment

  • Experience in developing tools used to diagnose ML infrastructure problems and failures

  • Experience with cloud platforms (e.g., Compute Engine, Kubernetes, Cloud Storage)

  • Experience working with GPUs

Nice to have:

  • Experience with large GPU clusters and high-performance computing/networking

  • Experience with supporting large language model training

  • Experience with ML frameworks like PyTorch/TensorFlow/JAX

  • Experience with GPU kernel development

About Character.AI

Character.AI empowers people to connect, learn and tell stories through interactive entertainment. Over 20 million people visit Character.AI every month, using our technology to supercharge their creativity and imagination. Our platform lets users engage with tens of millions of characters, enjoy unlimited conversations, and embark on infinite adventures.


In just two years, we achieved unicorn status and were honored as Google Play's AI App of the Year—a testament to our innovative technology and visionary approach.


Join us and be a part of establishing this new entertainment paradigm while shaping the future of Consumer AI!

At Character, we value diversity and welcome applicants from all backgrounds. As an equal opportunity employer, we firmly uphold a non-discrimination policy based on race, religion, national origin, gender, sexual orientation, age, veteran status, or disability. Your unique perspectives are vital to our success.

Compensation Range: $150K - $350K

Similar Jobs

The Walt Disney Company - Sr Machine Learning Engineer

The Walt Disney Company

Los Angeles, California, United States (On-Site)
4 Months ago
ByteDance - Engineering Manager Machine Learning Infrastructure

ByteDance

Seattle, Washington, United States (On-Site)
5 Months ago
ByteDance - Video Analysis and Quality Algorithm Intern 2023 Summer/Fall (PHD)

ByteDance

San Diego, California, United States (On-Site)
5 Months ago
ByteDance - Machine Learning Engineer - Model Serving Infrastructure

ByteDance

Seattle, Washington, United States (On-Site)
3 Weeks ago
Virtuos - Machine Learning Engineer

Virtuos

Singapore (On-Site)
3 Weeks ago
Nintendo - Senior Manager, Engineering Infrastructure and IT

Nintendo

Redmond, Washington, United States (On-Site)
4 Months ago
Google - Cloud Platform Manager, Professional Services

Google

Mexico City, Mexico City, Mexico (On-Site)
11 Hours ago
Revolgy - L2 Cloud Operations Engineer

Revolgy

Georgia, United States (Remote)
3 Weeks ago
Kefir Games - Build Engineer

Kefir Games

Cyprus (On-Site)
5 Months ago
Revolgy - L1 Cloud Associate

Revolgy

United Kingdom (Remote)
1 Week ago

Similar Skill Jobs

ByteDance - Research Scientist - AI Security

ByteDance

San Jose, California, United States (On-Site)
2 Days ago
ByteDance - Tech Lead Manager - Code AI

ByteDance

San Jose, California, United States (On-Site)
2 Months ago
ByteDance - Software Engineer in Large Model System Graduate (Machine Learning Sys-US) - 2024 Start (BS/MS)

ByteDance

San Jose, California, United States (On-Site)
5 Months ago
Canva - Senior Computer Vision Engineer - Photo AI

Canva

Vienna, Vienna, Austria (Remote)
2 Months ago
NVIDIA - Senior Software Engineer - Distributed Inference

NVIDIA

California, United States (Remote)
1 Month ago
Hedra - Machine Learning Engineer (CUDA)

Hedra

New York, New York, United States (On-Site)
3 Weeks ago
ByteDance - Research Scientist in Foundation Model (Speech & Audio Generation) - 2025 Start (PhD)

ByteDance

Seattle, Washington, United States (On-Site)
5 Months ago
SmileGate - Game Data Engineer [LOST ARK]

SmileGate

Seongnam-si, Gyeonggi-do, South Korea (On-Site)
3 Months ago
Applike Group - Senior Data Scientist (Recommendation Systems Expert) (f/m/d)

Applike Group

Hamburg, Hamburg, Germany (Hybrid)
6 Months ago
Canva - Machine Learning Engineer Lead - User Voice

Canva

Auckland, Auckland, New Zealand (Remote)
1 Week ago

Jobs in San Francisco, California, United States

Rivos - Silicon Logic Formal Verification - Full Time

Rivos

Portland, Oregon, United States (Hybrid)
6 Months ago
Bad Robot Games - Online Engineer

Bad Robot Games

California, United States (Remote)
2 Days ago
PlayStation Global - Senior Application Security Engineer

PlayStation Global

United States (Remote)
2 Months ago
Axon - Senior Privacy Engineer

Axon

Scottsdale, Arizona, United States (Hybrid)
4 Months ago
Maximum Games - Head of Publishing

Maximum Games

Walnut Creek, California, United States (On-Site)
3 Months ago
Next Level Business Services - Java/C++ Developer

Next Level Business Services

Sunnyvale, California, United States (On-Site)
5 Months ago
prizepicks - Director, Gaming Regulatory Compliance

prizepicks

Atlanta, Georgia, United States (Remote)
4 Weeks ago
Samsung Semiconductor - Senior Engineer, Modeling (Optical Proximity Correction) Software Engineer

Samsung Semiconductor

San Jose, California, United States (Hybrid)
4 Weeks ago
Google - ASIC Design Verification Engineer

Google

Madison, Wisconsin, United States (On-Site)
14 Hours ago
ION - Sales and Account Manager - 7992

ION

Chicago, Illinois, United States (On-Site)
6 Months ago

DevOps Jobs

Info Stretch - Lead Data Engineer

Info Stretch

Pune, Maharashtra, India (On-Site)
5 Months ago
N-iX - Solution Architect (Spanish Speaking)

N-iX

Poland (Remote)
3 Weeks ago
Escape Velocity Entertainment - Site Reliability Engineer

Escape Velocity Entertainment

(Remote)
3 Weeks ago
ByteDance - Cloud Site Reliability Engineer

ByteDance

Seattle, Washington, United States (On-Site)
2 Days ago
Google - Staff Software Engineer, Site Reliability Engineering

Google

Pittsburgh, Pennsylvania, United States (On-Site)
11 Hours ago
Quizizz - Software Engineer - Infrastructure

Quizizz

Bengaluru, Karnataka, India (On-Site)
3 Weeks ago
Omnissa - Staff Engineer (C++ Linux)

Omnissa

Bengaluru, Karnataka, India (Hybrid)
6 Months ago
Egnyte - Sr Software Engineer - Java

Egnyte

Poznań, Greater Poland Voivodeship, Poland (On-Site)
4 Months ago
WorldWinner - Senior DevOps Engineer

WorldWinner

(Remote)
2 Months ago
Ubisoft - Application Specialist

Ubisoft

Bucharest, Bucharest, Romania (On-Site)
4 Weeks ago

About The Company

Character is one of the world's leading personal AI platforms. Founded in 2021 by AI pioneers Noam Shazeer and Daniel De Freitas, Character is a full-stack AI company with a globally scaled direct-to-consumer platform. 
