Staff Software Engineer, Site Reliability (SRE)

1 Month ago • 5 Years + • DevOps

Job Summary

Job Description

As a founding member of the Site Reliability Engineering (SRE) function at Character.AI, you'll maintain and operate production services supporting thousands of nodes, terabytes of data, and millions of daily active users. You'll develop tools and automation for monitoring and optimizing performance and reliability, implement CI/CD processes, collaborate with development teams on scalable systems, establish SLAs/SLOs, provide system monitoring and incident alerts, participate in on-call rotations, and develop site reliability and disaster recovery plans. The goal is to support aggressive user base growth to 3 billion users. This role requires deep experience in a development-focused DevOps/SRE role within a large-scale technology organization.
Must have:
  • 5+ years DevOps/SRE experience
  • Python and Golang expertise
  • SQL, Linux, Kubernetes, Terraform
  • GCP experience
  • Troubleshooting skills
  • Incident management
Good to have:
  • GPU clusters/HPC experience
  • Prometheus and Grafana experience

Job Details

About the role

As one of the founding members of our Site Reliability Engineering function here at Character, you’ll have the opportunity to support infrastructure spanning thousands of nodes, terabytes of data, and millions of daily active users. You’ll be responsible for ensuring our product's reliability, scalability, and performance as we aggressively grow our user base, with a goal of reaching 3 billion users. You’ll work closely with our development team to design and implement processes and systems that ensure the stability and availability of our service.

What you’ll do

  • Maintain production services and keep them operational.

  • Develop tools, instrumentation, and automation to monitor and optimize the performance and reliability of our service.

  • Develop, implement and maintain automation tools and processes to prevent and mitigate service disruptions.

  • Collaborate with development teams to design and implement scalable, reliable systems and CI/CD processes for deployment.

  • Establish and support SLAs and SLOs for our site.

  • Provide system monitoring and incident alerts.

  • Participate in on-call rotations to provide support for critical incidents and outages.

  • Develop plans for site reliability and disaster recovery.

Who you are

Competitive candidates will have:

  • 5+ years of experience in a development-focused DevOps/SRE role within a technology organization that has significant scale

  • Deep experience with and proven success in developing software tools and automation wherever needed using Python and Golang

  • Expertise with SQL, Linux, CI/CD, Kubernetes, and Terraform to support a site/application within a large multi-node infrastructure and a growing user base

  • Experience working with multiple cloud computing platforms, including GCP, is also a must

  • Demonstrated ability to troubleshoot technical issues and challenges successfully and reliably across a range of platforms and systems

  • Experience with incident management and event postmortems

Outstanding candidates will have one or more of the following:

  • Familiarity with GPU clusters and/or HPC environments

  • Experience with monitoring and logging tools such as Prometheus and Grafana

  • Hands-on experience scaling a consumer product from early days into hypergrowth

About Character.AI

Character.AI empowers people to connect, learn and tell stories through interactive entertainment. Over 20 million people visit Character.AI every month, using our technology to supercharge their creativity and imagination. Our platform lets users engage with tens of millions of characters, enjoy unlimited conversations, and embark on infinite adventures.


In just two years, we achieved unicorn status and were honored as Google Play's AI App of the Year—a testament to our innovative technology and visionary approach.


Join us and be a part of establishing this new entertainment paradigm while shaping the future of Consumer AI!

At Character, we value diversity and welcome applicants from all backgrounds. As an equal opportunity employer, we firmly uphold a non-discrimination policy based on race, religion, national origin, gender, sexual orientation, age, veteran status, or disability. Your unique perspectives are vital to our success.

Compensation Range: $150K - $350K

About The Company

Character is one of the world's leading personal AI platforms. Founded in 2021 by AI pioneers Noam Shazeer and Daniel De Freitas, Character is a full-stack AI company with a globally scaled direct-to-consumer platform. 
