Software Engineer, Machine Learning Infrastructure

4 Months ago • 4 Years + • Artificial Intelligence

Job Summary

Job Description

Character.AI seeks a seasoned ML Infrastructure engineer to design, build, and maintain training and serving infrastructure for ML research and product development. Responsibilities include providing infrastructure support for ML research, building tooling for diagnosing cluster issues and hardware failures, monitoring deployments, managing experiments, and maximizing GPU allocation and utilization. The ideal candidate possesses 4+ years of experience supporting ML infrastructure, developing diagnostic tools, and working with cloud platforms like Compute Engine, Kubernetes, and Cloud Storage. Experience with GPUs is essential.
Must have:
  • 4+ years supporting ML infrastructure
  • Experience developing diagnostic tools for ML infrastructure
  • Experience with cloud platforms (Compute Engine, Kubernetes, Cloud Storage)
  • GPU experience
Good to have:
  • Large GPU clusters and high-performance computing/networking
  • Large language model training support
  • ML frameworks (PyTorch/TensorFlow/JAX)
  • GPU kernel development

Job Details

About the role

We’re looking for seasoned ML Infrastructure engineers with experience designing, building and maintaining training and serving infrastructure for ML research.

Responsibilities:

  • Provide infrastructure support for our ML research and product development

  • Build tooling to diagnose cluster issues and hardware failures

  • Monitor deployments, manage experiments, and generally support our research

  • Maximize GPU allocation and utilization for both serving and training

Requirements:

  • 4+ years of experience supporting the infrastructure within an ML environment

  • Experience in developing tools used to diagnose ML infrastructure problems and failures

  • Experience with cloud platforms (e.g., Compute Engine, Kubernetes, Cloud Storage)

  • Experience working with GPUs

Nice to have:

  • Experience with large GPU clusters and high-performance computing/networking

  • Experience supporting large language model training

  • Experience with ML frameworks like PyTorch/TensorFlow/JAX

  • Experience with GPU kernel development

About Character.AI

Founded in 2021, Character is a leading AI company offering personalized experiences through customizable AI 'Characters.' As one of the most widely used AI platforms worldwide, Character enables users to interact with AI tailored to their unique needs and preferences.

In just two years, we achieved unicorn status and were named Google Play's AI App of the Year – a testament to our groundbreaking technology and vision.

Ready to shape the future of Consumer AI? 🚀

At Character, we value diversity and welcome applicants from all backgrounds. As an equal opportunity employer, we firmly uphold a non-discrimination policy based on race, religion, national origin, gender, sexual orientation, age, veteran status, or disability. Your unique perspectives are vital to our success.

Compensation Range: $150K - $350K

About The Company

Character is one of the world's leading personal AI platforms. Founded in 2021 by AI pioneers Noam Shazeer and Daniel De Freitas, Character is a full-stack AI company with a globally scaled direct-to-consumer platform. 
