Software Engineer, Machine Learning Infrastructure

3 Months ago • 4 Years + • Devops

Job Summary

Job Description

Character.AI seeks a seasoned Software Engineer specializing in Machine Learning Infrastructure. Responsibilities include providing infrastructure support for ML research and product development, building diagnostic tools for cluster issues and hardware failures, monitoring deployments and experiments, and maximizing GPU utilization for training and serving. The ideal candidate possesses 4+ years of experience supporting ML infrastructure, developing diagnostic tools, and working with cloud platforms like Compute Engine, Kubernetes, and Cloud Storage. Experience with GPUs is a must.
Must have:
  • 4+ years supporting ML infrastructure
  • Experience developing diagnostic tools for ML infrastructure
  • Experience with cloud platforms (Compute Engine, Kubernetes, Cloud Storage)
  • GPU experience
Good to have:
  • Large GPU clusters and HPC/networking
  • LLM training support
  • ML frameworks (PyTorch/TensorFlow/JAX)
  • GPU kernel development

Job Details

About the role

We’re looking for seasoned ML Infrastructure engineers with experience designing, building and maintaining training and serving infrastructure for ML research.

Responsibilities:

  • Provide infrastructure support for our ML research and product development

  • Build tooling to diagnose cluster issues and hardware failures

  • Monitor deployments, manage experiments, and generally support our research

  • Maximize GPU allocation and utilization for both serving and training

Requirements:

  • 4+ years of experience supporting infrastructure in an ML environment

  • Experience in developing tools used to diagnose ML infrastructure problems and failures

  • Experience with cloud platforms (e.g., Compute Engine, Kubernetes, Cloud Storage)

  • Experience working with GPUs

Nice to have:

  • Experience with large GPU clusters and high-performance computing/networking

  • Experience with supporting large language model training

  • Experience with ML frameworks like PyTorch/TensorFlow/JAX

  • Experience with GPU kernel development

About Character.AI

Character.AI empowers people to connect, learn and tell stories through interactive entertainment. Over 20 million people visit Character.AI every month, using our technology to supercharge their creativity and imagination. Our platform lets users engage with tens of millions of characters, enjoy unlimited conversations, and embark on infinite adventures.


In just two years, we achieved unicorn status and were honored as Google Play's AI App of the Year—a testament to our innovative technology and visionary approach.


Join us and be a part of establishing this new entertainment paradigm while shaping the future of Consumer AI!

At Character, we value diversity and welcome applicants from all backgrounds. As an equal opportunity employer, we do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, or disability. Your unique perspectives are vital to our success.

Compensation Range: $150K - $350K


About The Company

Character is one of the world's leading personal AI platforms. Founded in 2021 by AI pioneers Noam Shazeer and Daniel De Freitas, Character is a full-stack AI company with a globally scaled direct-to-consumer platform. 
