Software Engineer, Machine Learning Infrastructure

2 Months ago • 4 Years + • Devops

Job Summary

Job Description

Character.AI seeks a seasoned Software Engineer specializing in Machine Learning Infrastructure. Responsibilities include providing infrastructure support for ML research and product development, building diagnostic tools for cluster issues and hardware failures, monitoring deployments and experiments, and maximizing GPU utilization for training and serving. The ideal candidate possesses 4+ years of experience supporting ML infrastructure, developing diagnostic tools, and working with cloud platforms like Compute Engine, Kubernetes, and Cloud Storage. Experience with GPUs is a must.
Must have:
  • 4+ years supporting ML infrastructure
  • Experience developing diagnostic tools for ML infrastructure
  • Experience with cloud platforms (Compute Engine, Kubernetes, Cloud Storage)
  • GPU experience
Good to have:
  • Large GPU clusters and HPC/networking
  • LLM training support
  • ML frameworks (PyTorch/TensorFlow/JAX)
  • GPU kernel development

Job Details

About the role

We’re looking for seasoned ML Infrastructure engineers with experience designing, building and maintaining training and serving infrastructure for ML research.

Responsibilities:

  • Provide infrastructure support for our ML research and product development

  • Build tooling to diagnose cluster issues and hardware failures

  • Monitor deployments, manage experiments, and generally support our research

  • Maximize GPU allocation and utilization for both serving and training

Requirements:

  • 4+ years of experience supporting infrastructure in an ML environment

  • Experience in developing tools used to diagnose ML infrastructure problems and failures

  • Experience with cloud platforms (e.g., Compute Engine, Kubernetes, Cloud Storage)

  • Experience working with GPUs

Nice to have:

  • Experience with large GPU clusters and high-performance computing/networking

  • Experience supporting large language model training

  • Experience with ML frameworks like PyTorch/TensorFlow/JAX

  • Experience with GPU kernel development

About Character.AI

Character.AI empowers people to connect, learn and tell stories through interactive entertainment. Over 20 million people visit Character.AI every month, using our technology to supercharge their creativity and imagination. Our platform lets users engage with tens of millions of characters, enjoy unlimited conversations, and embark on infinite adventures.


In just two years, we achieved unicorn status and were honored as Google Play's AI App of the Year—a testament to our innovative technology and visionary approach.


Join us and be a part of establishing this new entertainment paradigm while shaping the future of Consumer AI!

At Character, we value diversity and welcome applicants from all backgrounds. As an equal opportunity employer, we do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, or disability. Your unique perspectives are vital to our success.

Compensation Range: $150K - $350K

About The Company

Character is one of the world's leading personal AI platforms. Founded in 2021 by AI pioneers Noam Shazeer and Daniel De Freitas, Character is a full-stack AI company with a globally scaled direct-to-consumer platform. 
