Staff Software Engineer, Site Reliability (SRE)

11 Hours ago • 5 Years + • DevOps

Job Summary

Job Description

As a founding member of the Site Reliability Engineering (SRE) function at Character.AI, you'll support a massive infrastructure (thousands of nodes, terabytes of data, millions of daily active users) with a goal of reaching 3 billion users. Responsibilities include maintaining production services, developing monitoring and automation tools (Python, Golang), implementing CI/CD processes, collaborating with development teams on scalable systems, establishing SLAs/SLOs, providing system monitoring and incident alerts, participating in on-call rotations, and developing disaster recovery plans. You'll work with Kubernetes, Terraform, and multiple cloud platforms (GCP is a must). The role requires troubleshooting across various platforms and handling incident management and postmortems.
Must have:
  • 5+ years DevOps/SRE experience in a large-scale organization
  • Software tool and automation development (Python, Golang)
  • Expertise with SQL, Linux, CI/CD, Kubernetes, Terraform
  • Experience with GCP and troubleshooting across platforms
  • Incident management and postmortems
Good to have:
  • Familiarity with GPU clusters/HPC environments
  • Experience with Prometheus and Grafana

Job Details

About the role

As one of the founding members of our Site Reliability Engineering function here at Character, you’ll have the opportunity to support our infrastructure with thousands of nodes, terabytes of data and millions of daily active users on our site. You’ll be responsible for ensuring our product’s reliability, scalability, and performance as we aggressively grow our user base, with a goal of growing to 3 billion users. You’ll work closely with our development team to design and implement processes and systems that ensure the stability and availability of our service.

What you’ll do

  • Maintain production services and keep them operational.

  • Develop tools, instrumentation and automation to monitor and optimize the performance and reliability of our service.

  • Develop, implement and maintain automation tools and processes to prevent and mitigate service disruptions.

  • Collaborate with development teams to design and implement scalable, reliable systems and CI/CD processes for deployment.

  • Establish and support SLAs and SLOs for our site.

  • Provide system monitoring and incident alerts.

  • Participate in on-call rotations to provide support for critical incidents and outages.

  • Develop plans for site reliability and disaster recovery.

Who you are

Competitive candidates will have:

  • 5+ years of experience in a development-focused DevOps/SRE role within a technology organization that has significant scale

  • Deep experience with and proven success in developing software tools and automation wherever needed using Python and Golang

  • Expertise with SQL, Linux, CI/CD, Kubernetes, and Terraform to support a site/application within a large multi-node infrastructure and a growing user base

  • Experience working with multiple cloud computing platforms; GCP is a must

  • Demonstrated ability to successfully and reliably troubleshoot technical issues and challenges across a range of platforms and systems

  • Experience with incident management and event postmortems

Outstanding candidates will have one or more of the following:

  • Familiarity with GPU clusters and/or HPC environments

  • Experience with monitoring and logging tools such as Prometheus and Grafana

  • Hands-on experience scaling a consumer product from early days into hypergrowth

About Character.AI

Character.AI empowers people to connect, learn and tell stories through interactive entertainment. Over 20 million people visit Character.AI every month, using our technology to supercharge their creativity and imagination. Our platform lets users engage with tens of millions of characters, enjoy unlimited conversations, and embark on infinite adventures.


In just two years, we achieved unicorn status and were honored as Google Play's AI App of the Year—a testament to our innovative technology and visionary approach.


Join us and be a part of establishing this new entertainment paradigm while shaping the future of Consumer AI!

At Character, we value diversity and welcome applicants from all backgrounds. As an equal opportunity employer, we firmly uphold a policy of non-discrimination on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, or disability. Your unique perspectives are vital to our success.

Compensation Range: $150K - $350K

About The Company

Character is one of the world's leading personal AI platforms. Founded in 2021 by AI pioneers Noam Shazeer and Daniel De Freitas, Character is a full-stack AI company with a globally scaled direct-to-consumer platform.
