Software Engineer, Machine Learning Infrastructure

2 Months ago • 4 Years + • DevOps • Artificial Intelligence

Job Summary

Job Description

Character.AI seeks a seasoned Software Engineer for Machine Learning Infrastructure. Responsibilities include providing infrastructure support for ML research and product development, building tooling for diagnosing cluster issues and hardware failures, monitoring deployments, managing experiments, and maximizing GPU allocation. The ideal candidate has 4+ years of experience supporting ML infrastructure, developing diagnostic tools, and working with cloud platforms (Compute Engine, Kubernetes, Cloud Storage) and GPUs. Experience with large GPU clusters, high-performance computing, large language model training, ML frameworks (PyTorch/TensorFlow/JAX), and GPU kernel development is highly desirable.
Must have:
  • 4+ years ML infrastructure support experience
  • Experience developing ML infrastructure diagnostic tools
  • Cloud platform experience (Compute Engine, Kubernetes, Cloud Storage)
  • GPU experience
Good to have:
  • Large GPU cluster & high-performance computing experience
  • Large language model training experience
  • ML framework experience (PyTorch/TensorFlow/JAX)
  • GPU kernel development experience

Job Details

About the role

We’re looking for seasoned ML Infrastructure engineers with experience designing, building and maintaining training and serving infrastructure for ML research.

Responsibilities:

  • Provide infrastructure support to our ML research and product

  • Build tooling to diagnose cluster issues and hardware failures

  • Monitor deployments, manage experiments, and generally support our research

  • Maximize GPU allocation and utilization for both serving and training

Requirements:

  • 4+ years of experience supporting the infrastructure within an ML environment

  • Experience in developing tools used to diagnose ML infrastructure problems and failures

  • Experience with cloud platforms (e.g., Compute Engine, Kubernetes, Cloud Storage)

  • Experience working with GPUs

Nice to have:

  • Experience with large GPU clusters and high-performance computing/networking

  • Experience with supporting large language model training

  • Experience with ML frameworks like PyTorch/TensorFlow/JAX

  • Experience with GPU kernel development

About Character.AI

Character.AI empowers people to connect, learn and tell stories through interactive entertainment. Over 20 million people visit Character.AI every month, using our technology to supercharge their creativity and imagination. Our platform lets users engage with tens of millions of characters, enjoy unlimited conversations, and embark on infinite adventures.


In just two years, we achieved unicorn status and were honored as Google Play's AI App of the Year—a testament to our innovative technology and visionary approach.


Join us and be a part of establishing this new entertainment paradigm while shaping the future of Consumer AI!

At Character, we value diversity and welcome applicants from all backgrounds. As an equal opportunity employer, we do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, or disability. Your unique perspectives are vital to our success.

Compensation Range: $150K - $350K

About The Company

Character is one of the world's leading personal AI platforms. Founded in 2021 by AI pioneers Noam Shazeer and Daniel De Freitas, Character is a full-stack AI company with a globally scaled direct-to-consumer platform. 
