Software Engineer, Machine Learning Infrastructure

1 Month ago • 4+ Years • DevOps • Artificial Intelligence

Job Summary

Job Description

Character.AI seeks a seasoned Software Engineer for Machine Learning Infrastructure. Responsibilities include providing infrastructure support for ML research and product development; building tools to diagnose cluster issues and hardware failures; monitoring deployments and managing experiments; and maximizing GPU allocation and utilization for training and serving. The ideal candidate has 4+ years of experience supporting ML infrastructure, developing diagnostic tools, and working with cloud platforms (Compute Engine, Kubernetes, Cloud Storage) and GPUs. Experience with large GPU clusters, high-performance computing and networking, large language model training, ML frameworks (PyTorch/TensorFlow/JAX), and GPU kernel development is a plus.
Must have:
  • 4+ years of ML infrastructure support experience
  • Experience developing ML infrastructure diagnostic tools
  • Experience with cloud platforms (Compute Engine, Kubernetes, Cloud Storage)
  • GPU experience
Good to have:
  • Large GPU cluster and high-performance computing experience
  • Large language model training experience
  • Experience with PyTorch/TensorFlow/JAX
  • GPU kernel development experience

Job Details

About the role

We’re looking for seasoned ML Infrastructure engineers with experience designing, building and maintaining training and serving infrastructure for ML research.

Responsibilities:

  • Provide infrastructure support for our ML research and product development

  • Build tooling to diagnose cluster issues and hardware failures

  • Monitor deployments, manage experiments, and generally support our research

  • Maximize GPU allocation and utilization for both serving and training

Requirements:

  • 4+ years of experience supporting infrastructure in an ML environment

  • Experience developing tools to diagnose ML infrastructure problems and failures

  • Experience with cloud platforms (e.g., Compute Engine, Kubernetes, Cloud Storage)

  • Experience working with GPUs

Nice to have:

  • Experience with large GPU clusters and high-performance computing/networking

  • Experience supporting large language model training

  • Experience with ML frameworks like PyTorch, TensorFlow, or JAX

  • Experience with GPU kernel development

About Character.AI

Character.AI empowers people to connect, learn and tell stories through interactive entertainment. Over 20 million people visit Character.AI every month, using our technology to supercharge their creativity and imagination. Our platform lets users engage with tens of millions of characters, enjoy unlimited conversations, and embark on infinite adventures.


In just two years, we achieved unicorn status and were honored as Google Play's AI App of the Year, a testament to our innovative technology and visionary approach.


Join us and be a part of establishing this new entertainment paradigm while shaping the future of Consumer AI!

At Character, we value diversity and welcome applicants from all backgrounds. As an equal opportunity employer, we firmly uphold a non-discrimination policy based on race, religion, national origin, gender, sexual orientation, age, veteran status, or disability. Your unique perspectives are vital to our success.

Compensation Range: $150K - $350K

Similar Jobs

ByteDance - Software Engineer Intern (Doubao (Seed) - Machine Learning System) - 2025 Summer (MS)

ByteDance

Seattle, Washington, United States (On-Site)
5 Months ago
ByteDance - Algorithm Engineer - Audio Understanding

ByteDance

Singapore (On-Site)
5 Months ago
SiftHub - Senior Software Engineer (Backend)

SiftHub

Mumbai, Maharashtra, India (On-Site)
6 Months ago
Meta - Research Scientist, Computer Vision for Generative AI (PhD)

Meta

New York, New York, United States (On-Site)
5 Months ago
ByteDance - Research Scientist in Large Model System

ByteDance

Seattle, Washington, United States (On-Site)
5 Months ago
ASSIST Software - Azure DevOps Engineer

ASSIST Software

Suceava, Suceava County, Romania (Remote)
5 Months ago
Ajmera Infotech - Senior ASP.NET Developer with Azure Expertise

Ajmera Infotech

Hyderabad, Telangana, India (On-Site)
4 Months ago
Garena - Senior/Expert Site Reliability Engineer (SRE)

Garena

Singapore (On-Site)
3 Months ago
Warner Bros Games - Staff Software Engineer - AWS Architecture (Observability Team)

Warner Bros Games

Bengaluru, Karnataka, India (Hybrid)
4 Months ago
Balbix - Staff/Sr Staff/Principal Engineer - Lakehouse

Balbix

Gurugram, Haryana, India (On-Site)
6 Months ago

Similar Skill Jobs

ByteDance - AI Security Researcher - Security Flow

ByteDance

San Jose, California, United States (On-Site)
5 Months ago
NVIDIA - Senior AI-HPC Storage Engineer

NVIDIA

Santa Clara, California, United States (On-Site)
3 Months ago
Every matrix - Experienced CRM Data Scientist

Every matrix

United Kingdom (Hybrid)
6 Months ago
NVIDIA - Senior Software Engineer - Robot Learning Platform

NVIDIA

Toronto, Ontario, Canada (On-Site)
1 Month ago
Krafton - [Deep Learning Div.] Deep Learning Engineer - ML (1–3 years)

Krafton

Seoul, South Korea (On-Site)
4 Months ago
Altagram Group - Data Science Internship/Work Student

Altagram Group

Germany (On-Site)
2 Months ago
The Walt Disney Company - Principal Machine Learning Engineer

The Walt Disney Company

San Francisco, California, United States (On-Site)
4 Months ago
ByteDance - Research Scientist - Applied Machine Learning Graduates (AML) - 2024 Start (PhD)

ByteDance

San Jose, California, United States (On-Site)
5 Months ago

Jobs in New York, New York, United States

Global Step - Senior Director of Digital Marketing and Operations

Global Step

Texas, United States (Hybrid)
2 Months ago
Nintendo - Bilingual Artwork Project Specialist (Japanese)

Nintendo

Redmond, Washington, United States (Hybrid)
7 Months ago
Activate Games - Game Facilitator (Store Associate)

Activate Games

Aurora, Colorado, United States (On-Site)
1 Month ago
Scopely - Lead Game Designer - Monopoly GO!

Scopely

California, United States (Remote)
1 Month ago
Skillz - Senior Software Engineer (Mobile SDK)

Skillz

San Mateo, California, United States (On-Site)
3 Months ago
Onward Search - Sales Development Representative

Onward Search

Dallas, Texas, United States (On-Site)
3 Months ago
Rockstar Games - Director, Security Operations

Rockstar Games

New York, New York, United States (On-Site)
6 Months ago
The Walt Disney Company - Animal Keeper - Small Mammal / Ectotherm (Seasonal)

The Walt Disney Company

Lake Buena Vista, Florida, United States (On-Site)
2 Months ago
Company3 Method Studios - Senior VFX Coordinator (Temporary)

Company3 Method Studios

Santa Monica, California, United States (On-Site)
1 Month ago
Rockstar Games - Senior Data Engineer

Rockstar Games

New York, New York, United States (On-Site)
1 Month ago

DevOps Jobs

Applike Group - Senior DevOps Engineer  (f/m/d) 🚀

Applike Group

Hamburg, Hamburg, Germany (Hybrid)
6 Months ago
Hedra - Machine Learning Engineer

Hedra

San Francisco, California, United States (On-Site)
1 Month ago
CData Software - Platform Engineer

CData Software

Bengaluru, Karnataka, India (On-Site)
7 Months ago
Hitachi - Cloud Solutions Architect

Hitachi

San José, San José Province, Costa Rica (Remote)
6 Months ago
SparkCognition - DevOps Engineer

SparkCognition

Bengaluru, Karnataka, India (On-Site)
7 Months ago
Ubisoft - Linux DevOps System Administrator

Ubisoft

Montreal, Quebec, Canada (On-Site)
2 Months ago
Dream Sports - Architect - Cloud Security

Dream Sports

Mumbai, Maharashtra, India (On-Site)
8 Months ago
VGW - Infrastructure Engineer

VGW

Perth, Western Australia, Australia (On-Site)
1 Month ago
Skan AI - Release Manager

Skan AI

Bengaluru, Karnataka, India (Hybrid)
5 Months ago
IGT - Systems Engineer

IGT

Alaska, United States (Remote)
5 Months ago

About The Company

Character is one of the world's leading personal AI platforms. Founded in 2021 by AI pioneers Noam Shazeer and Daniel De Freitas, Character is a full-stack AI company with a globally scaled direct-to-consumer platform. 

