Software Engineer, Machine Learning Infrastructure

1 month ago • 4+ years • Artificial Intelligence

About the job

Job Description

Character.AI seeks a seasoned ML Infrastructure engineer to design, build, and maintain training and serving infrastructure for ML research and product development. Responsibilities include providing infrastructure support for ML research, building tooling for diagnosing cluster issues and hardware failures, monitoring deployments, managing experiments, and maximizing GPU allocation and utilization. The ideal candidate possesses 4+ years of experience supporting ML infrastructure, developing diagnostic tools, and working with cloud platforms like Compute Engine, Kubernetes, and Cloud Storage. Experience with GPUs is essential.
Must have:
  • 4+ years supporting ML infrastructure
  • Experience developing diagnostic tools for ML infrastructure
  • Experience with cloud platforms (Compute Engine, Kubernetes, Cloud Storage)
  • GPU experience
Good to have:
  • Large GPU clusters and high-performance computing/networking
  • Large language model training support
  • ML frameworks (PyTorch/TensorFlow/JAX)
  • GPU kernel development

About the role

We’re looking for seasoned ML Infrastructure engineers with experience designing, building and maintaining training and serving infrastructure for ML research.

Responsibilities:

  • Provide infrastructure support for our ML research and product development

  • Build tooling to diagnose cluster issues and hardware failures

  • Monitor deployments, manage experiments, and generally support our research

  • Maximize GPU allocation and utilization for both serving and training

Requirements:

  • 4+ years of experience supporting infrastructure in an ML environment

  • Experience in developing tools used to diagnose ML infrastructure problems and failures

  • Experience with cloud platforms (e.g., Compute Engine, Kubernetes, Cloud Storage)

  • Experience working with GPUs

Nice to have:

  • Experience with large GPU clusters and high-performance computing/networking

  • Experience supporting large language model training

  • Experience with ML frameworks like PyTorch/TensorFlow/JAX

  • Experience with GPU kernel development

About Character.AI

Founded in 2021, Character is a leading AI company offering personalized experiences through customizable AI 'Characters.' As one of the most widely used AI platforms worldwide, Character enables users to interact with AI tailored to their unique needs and preferences.

In just two years, we achieved unicorn status and were named Google Play's AI App of the Year – a testament to our groundbreaking technology and vision.

Ready to shape the future of Consumer AI? 🚀

At Character, we value diversity and welcome applicants from all backgrounds. As an equal opportunity employer, we firmly uphold a non-discrimination policy based on race, religion, national origin, gender, sexual orientation, age, veteran status, or disability. Your unique perspectives are vital to our success.

Compensation Range: $150K - $350K

About The Company

Character is one of the world's leading personal AI platforms. Founded in 2021 by AI pioneers Noam Shazeer and Daniel De Freitas, Character is a full-stack AI company with a globally scaled direct-to-consumer platform. 

Menlo Park, California, United States (On-Site)

New York, New York, United States (On-Site)

Similar Jobs

Vigaet - Internship -AI Agents

Vigaet, (Remote)

ByteDance - Research Engineer - Multimodal Model

ByteDance, Singapore (On-Site)

Simplify Hire - GenAI Engineer

Simplify Hire, India (Remote)

Stupa Sports Analytics - Computer Vision Engineer

Stupa Sports Analytics, India (On-Site)

ThreeV Technologies, Inc. - Data Scientist, Computer Vision

ThreeV Technologies, Inc., India (Remote)

Kokotree - Artificial Intelligence Developers

Kokotree, United States (On-Site)

CharacterAI - Research Engineer - Multimodal

CharacterAI, United States (On-Site)

Similar Skill Jobs

Paypal - Machine Learning Engineer

Paypal, United States (Hybrid)

Match Group - Machine Learning Engineer (MG AI)

Match Group, South Korea (On-Site)

Google - Software Engineer, Machine Learning, Gemini

Google, Switzerland (On-Site)

Rackspace Technology - Data Solutions Director

Rackspace Technology, United States (Hybrid)

Intel Corporation - AI Frameworks Engineer

Intel Corporation, Costa Rica (On-Site)

Rackspace Technology - Principal MLOps Engineer

Rackspace Technology, United States (Remote)

Jobs in New York, New York, United States

Overwolf - Business Development Manager - Nitro

Overwolf, United States (Remote)

The Walt Disney Company - Pixar Undergraduate Program (PUP) - Summer 2025

The Walt Disney Company, United States (On-Site)

Payactiv - Sales Specialist-Mid Market

Payactiv, United States (Remote)

Axon - Stock Plan Administrator

Axon, United States (On-Site)

CD PROJEKT RED - Senior Narrative (Quest) Designer

CD PROJEKT RED, United States (Hybrid)

Meta - Software Engineering Manager, Product

Meta, United States (On-Site)

Barbaricum - Systems Administrator

Barbaricum, United States (Hybrid)

Funko - Trade Customer & Operations Coordinator

Funko, United States (On-Site)

Hasbro - Claims Analyst

Hasbro, United States (On-Site)