Staff Software Eng - Site Reliability

4 Weeks ago • 5 Years +

About the job

The Role:

As the founding member of our Site Reliability Engineering function here at Character, you’ll have the opportunity to support infrastructure spanning thousands of nodes, terabytes of data, and millions of daily active users. You’ll be responsible for ensuring our product's reliability, scalability, and performance as we aggressively grow our user base, with a goal of reaching 3 billion users. You’ll work closely with our development team to design and implement processes and systems that ensure the stability and availability of our service.

Specific Responsibilities:

  • Maintain production services and keep them operational.

  • Develop tools, instrumentation, and automation to monitor and optimize the performance and reliability of our service.

  • Develop, implement and maintain automation tools and processes to prevent and mitigate service disruptions.

  • Collaborate with development teams to design and implement scalable, reliable systems and CI/CD processes for deployment.

  • Establish and support SLAs and SLOs for our site.

  • Provide system monitoring and incident alerts.

  • Participate in on-call rotations to provide support for critical incidents and outages.

  • Develop plans for site reliability and disaster recovery.

Job Requirements:

  • 5+ years of experience in a development-focused DevOps/SRE role within a technology organization operating at significant scale

  • Deep experience with and proven success in developing software tools and automation wherever needed using Python and Golang

  • Expertise with SQL, Linux, CI/CD, Kubernetes, and Terraform to support a site or application running on a large multi-node infrastructure with a growing user base

  • Experience working with multiple cloud computing platforms, such as GCP, is also a must

  • Demonstrated ability to successfully and reliably troubleshoot technical issues and challenges across a range of platforms and systems

  • Experience with incident management and event postmortems

Desired Experience:

  • Familiarity with GPU clusters and/or HPC environments is preferred

  • Experience with monitoring and logging tools such as Prometheus and Grafana

  • Hands-on experience scaling a consumer product from early days into hypergrowth

About Character.AI

Founded in 2021, Character is a leading AI company offering personalized experiences through customizable AI 'Characters.' As one of the most widely used AI platforms worldwide, Character enables users to interact with AI tailored to their unique needs and preferences.

In just two years, we achieved unicorn status and were named Google Play's AI App of the Year – a testament to our groundbreaking technology and vision.

Ready to shape the future of Consumer AI? 🚀

At Character, we value diversity and welcome applicants from all backgrounds. As an equal opportunity employer, we firmly uphold a policy of non-discrimination on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, or disability. Your unique perspectives are vital to our success.

Compensation Range: $150K - $350K

About The Company

Character is one of the world's leading personal AI platforms. Founded in 2021 by AI pioneers Noam Shazeer and Daniel De Freitas, Character is a full-stack AI company with a globally scaled direct-to-consumer platform. 

California, United States (On-Site)
