
Software Engineer, Machine Learning Infrastructure

11 Hours ago • 4 Years + • DevOps • Artificial Intelligence

Job Summary

Job Description

Character.AI seeks a seasoned Software Engineer for Machine Learning Infrastructure. Responsibilities include providing infrastructure support for ML research and product development, building tooling to diagnose cluster issues and hardware failures, monitoring deployments, managing experiments, and maximizing GPU allocation and utilization. The ideal candidate has 4+ years of experience supporting ML infrastructure, developing diagnostic tools, and working with cloud platforms (Compute Engine, Kubernetes, Cloud Storage) and GPUs. Experience with large GPU clusters, high-performance computing, large language model training, ML frameworks (PyTorch/TensorFlow/JAX), and GPU kernel development is highly desirable.
Must have:
  • 4+ years of ML infrastructure support experience
  • Experience developing ML infrastructure diagnostic tools
  • Cloud platform experience (Compute Engine, Kubernetes, Cloud Storage)
  • GPU experience
Good to have:
  • Large GPU cluster & high-performance computing experience
  • Large language model training experience
  • ML framework experience (PyTorch/TensorFlow/JAX)
  • GPU kernel development experience

Job Details

About the role

We’re looking for seasoned ML Infrastructure engineers with experience designing, building and maintaining training and serving infrastructure for ML research.

Responsibilities:

  • Provide infrastructure support for our ML research and product development

  • Build tooling to diagnose cluster issues and hardware failures

  • Monitor deployments, manage experiments, and generally support our research

  • Maximize GPU allocation and utilization for both serving and training

Requirements:

  • 4+ years of experience supporting infrastructure in an ML environment

  • Experience developing tools to diagnose ML infrastructure problems and failures

  • Experience with cloud platforms (e.g., Compute Engine, Kubernetes, Cloud Storage)

  • Experience working with GPUs

Nice to have:

  • Experience with large GPU clusters and high-performance computing/networking

  • Experience with supporting large language model training

  • Experience with ML frameworks like PyTorch/TensorFlow/JAX

  • Experience with GPU kernel development

About Character.AI

Character.AI empowers people to connect, learn and tell stories through interactive entertainment. Over 20 million people visit Character.AI every month, using our technology to supercharge their creativity and imagination. Our platform lets users engage with tens of millions of characters, enjoy unlimited conversations, and embark on infinite adventures.


In just two years, we achieved unicorn status and were honored as Google Play's AI App of the Year—a testament to our innovative technology and visionary approach.


Join us and be a part of establishing this new entertainment paradigm while shaping the future of Consumer AI!

At Character, we value diversity and welcome applicants from all backgrounds. As an equal opportunity employer, we firmly uphold a policy of non-discrimination on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, or disability. Your unique perspectives are vital to our success.

Compensation Range: $150K - $350K

Similar Jobs

ByteDance - Research Scientist, Foundation Model, Vision

ByteDance

Singapore (On-Site)
4 Months ago
ByteDance - Tech Expert - Machine Learning Infrastructure

ByteDance

Singapore (On-Site)
4 Months ago
NVIDIA - Solutions Architect, Financial Services

NVIDIA

New York, New York, United States (Remote)
2 Months ago
NVIDIA - Senior System Software Engineer - AI Performance and Efficiency Tools

NVIDIA

Santa Clara, California, United States (Hybrid)
1 Week ago
Netflix - Research Scientist (L6) - Identity Algorithms

Netflix

Los Gatos, California, United States (On-Site)
4 Months ago
Xsolla - Director of Development (Dev Director)

Xsolla

Kuala Lumpur, Federal Territory of Kuala Lumpur, Malaysia (On-Site)
2 Months ago
ByteDance - Software Engineer, SRE - Platform Services

ByteDance

Seattle, Washington, United States (On-Site)
21 Hours ago
Britive - ENGINEERING MANAGER

Britive

Bengaluru, Karnataka, India (Remote)
4 Months ago
Fortis Games - Senior Cloud Security Engineer

Fortis Games

Romania (On-Site)
2 Months ago
IO Interactive - Lead Online Programmer

IO Interactive

Malmö, Skåne County, Sweden (Hybrid)
11 Hours ago

Similar Skill Jobs

NVIDIA - AI Algorithms Software Engineer (RDSS Intern)

NVIDIA

Hsinchu, Hsinchu City, Taiwan (On-Site)
2 Months ago
Avathon - Data Scientist

Avathon

Bengaluru, Karnataka, India (On-Site)
5 Months ago
ByteDance - Backend Engineer (Model Inference), Machine Learning Systems

ByteDance

Singapore (On-Site)
4 Months ago
Vigaet - Internship - AI Agents

Vigaet

(Remote)
4 Months ago
Amazon Games - Senior Software Engineer, Amazon Games AI Research

Amazon Games

San Diego, California, United States (On-Site)
3 Months ago
NVIDIA - Senior AI-HPC Storage Engineer

NVIDIA

Santa Clara, California, United States (On-Site)
2 Months ago
ByteDance - Machine Learning Engineer Intern (Applied Machine Learning-Algorithm) - 2025 Summer/Fall (PhD)

ByteDance

San Jose, California, United States (On-Site)
4 Months ago
Zazz - Artificial Intelligence Engineer

Zazz

(Remote)
1 Month ago
Rackspace Technology - Principal MLOps Engineer

Rackspace Technology

(Remote)
6 Days ago
ByteDance - Software Engineer Large Model System Graduate (Machine Learning Sys-US) - 2024 Start (BS/MS)

ByteDance

Seattle, Washington, United States (On-Site)
4 Months ago

Jobs in New York, New York, United States

ByteDance - Software Development Engineer (SDN Traffic Intelligence & Control)

ByteDance

Seattle, Washington, United States (On-Site)
1 Day ago
Onward Search - Business Development Specialist (Real Estate)

Onward Search

Norfolk, Virginia, United States (On-Site)
4 Months ago
The Walt Disney Company - Disney Culinary Program Alumni 2025

The Walt Disney Company

Florida, United States (On-Site)
1 Month ago
Google - Open Career Opportunities, Verily Life Sciences

Google

South San Francisco, California, United States (On-Site)
4 Months ago
ByteDance - Procurement Manager - Professional Services, AMS

ByteDance

Los Angeles, California, United States (On-Site)
1 Month ago
Inkitt - Product Analyst

Inkitt

San Francisco, California, United States (Hybrid)
1 Month ago
Probably Monsters - Systems Administrator

Probably Monsters

Texas, United States (On-Site)
1 Month ago
Twitch - Senior Product Manager - Ads

Twitch

Seattle, Washington, United States (Remote)
6 Months ago
ByteDance - Product Manager - Legal - Information Systems - San Jose

ByteDance

San Jose, California, United States (On-Site)
3 Months ago
WebFX - Jr. Inside Sales Strategist

WebFX

Harrisburg, Pennsylvania, United States (On-Site)
5 Months ago

DevOps Jobs

Warner Bros Games - Software Engineer II - Observability - AWS

Warner Bros Games

Bengaluru, Karnataka, India (Hybrid)
3 Weeks ago
NVIDIA - Senior AI-HPC Storage Engineer

NVIDIA

Santa Clara, California, United States (On-Site)
2 Months ago
Nagarro - Senior Engineer, DevOps

Nagarro

India (Remote)
5 Months ago
Patreon - Site Reliability Engineer

Patreon

United States (Remote)
11 Hours ago
The Walt Disney Company - Lead Software Engineer (Identity)

The Walt Disney Company

New York, New York, United States (On-Site)
4 Months ago
N-iX - Senior DevOps Engineer

N-iX

India (Remote)
3 Weeks ago
Codeway - DevOps Engineer (Mid/Sr)

Codeway

İstanbul, Türkiye (On-Site)
2 Months ago
ION - Cloud Engineer Kubernetes

ION

Milan, Lombardy, Italy (Hybrid)
5 Months ago
ByteDance - Production System Engineer, Infrastructure Engineering

ByteDance

Singapore (On-Site)
4 Months ago
Next Level Business Services - DevOps Engineer

Next Level Business Services

Redmond, Washington, United States (On-Site)
5 Months ago

About The Company

Character is one of the world's leading personal AI platforms. Founded in 2021 by AI pioneers Noam Shazeer and Daniel De Freitas, Character is a full-stack AI company with a globally scaled direct-to-consumer platform. 

Locations: New York, New York, United States (On-Site); Menlo Park, California, United States (On-Site)