DevOps/Site Reliability Engineer
Chief Rebel
Job Summary
As a DevOps/Site Reliability Engineer, you will join Chief Rebel's systems team to design, deploy, and maintain the robust platform and infrastructure supporting their game, Fellowship. The role focuses on automation and reliability, ensuring backend services never go down. You will implement infrastructure-as-code, enhance continuous delivery pipelines, refine observability systems, optimize game server orchestration, and manage incidents, taking ownership of the platform's operational stability and scalability.
Must Have
- Design, develop, and maintain scalable, reliable infrastructure and continuous delivery systems.
- Implement and manage observability, monitoring, and alerting systems.
- Set up on-call schedules and response plans, and manage incidents.
- Identify and execute cost optimizations for cloud and platform services.
- Collaborate with game developers to streamline deployment and system integration.
- Ensure security and privacy of the platform and user data.
- Automate operational tasks and maintain robust CI/CD pipelines.
- Extensive experience with modern DevOps and Site Reliability Engineering principles.
- Comprehensive knowledge of what is needed to keep services running uninterrupted 24/7.
- Strong experience building robust, secure, scalable distributed applications in cloud environments (AWS, Google Cloud, Azure).
- Deep expertise in Infrastructure-as-Code (IaC) tools and practices (Terraform, Flux, Helm, GitOps).
- Proficiency with containerization and orchestration technologies (Docker, Kubernetes).
- Knowledge of different types of databases.
- Strong programming skills in at least one language (Go, Python, Rust).
Good to Have
- Go or C++ experience for understanding the systems.
Job Description
Are you passionate about building and maintaining highly available, scalable infrastructure for games? Do you have deep expertise in cloud environments, automation, and DevOps/SRE principles? Are you ready to take ownership and responsibility for ensuring that our backend services never go down? If so, we have an opportunity for you!
As a DevOps/Site Reliability Engineer, you will join our systems team to design, deploy, and maintain the robust platform and infrastructure that supports our game, Fellowship. We focus on automation and reliability to empower our development teams. While some Go or C++ experience is a plus for understanding the systems, your primary focus will be on the operational stability and scalability of the platform.
The features and tasks you work on will vary, as Fellowship is a wide-featured game developed by a small team. Examples of tasks in our backlog include infrastructure-as-code implementation, continuous delivery pipeline enhancements, observability system refinement, game server orchestration, and backend infrastructure optimization.
Expertise in Site Reliability Engineering principles and experience with different types of databases will be invaluable in maintaining the reliability and stability of our services.
Our current technology stack is built mainly with Go, Docker, PostgreSQL, and Redis, and runs on Kubernetes both in the cloud and on-premises. If you're excited about building and operating game infrastructure, taking ownership of its reliability, and want to be part of an innovative project, join us now!
Responsibilities:
- Design, develop, and maintain scalable and reliable infrastructure and continuous delivery systems.
- Implement and manage observability, monitoring, and alerting systems to ensure high performance and service health.
- Set up on-call schedules and response plans, write runbooks, and manage incidents and incident reporting.
- Cost optimization: Identify potential cost savings on cloud and platform services and execute on them, including negotiating with platform providers.
- Collaborate with game developers and engineers to streamline deployment processes and system integration.
- Ensure the security and privacy of the platform and user data.
- Automate operational tasks and maintain robust CI/CD pipelines.
What You Will Bring:
- Extensive experience with modern DevOps and Site Reliability Engineering principles.
- Comprehensive knowledge of what's needed to ensure uninterrupted service operation 24/7.
- Strong experience building robust, secure, and scalable distributed applications in cloud environments like AWS, Google Cloud, Azure, or similar.
- Deep expertise in Infrastructure-as-Code (IaC) tools and practices, including technologies like Terraform, Flux, GitOps, and Helm.
- Proficiency with containerization and orchestration technologies (Docker, Kubernetes).
- Knowledge of different types of databases and their strengths and weaknesses in different contexts.
- Strong programming skills in at least one language (e.g., Go, Python, Rust) for automation and tooling.
- A desire to work with others and share your knowledge, and an eagerness to learn new things.
About Chief Rebel
Chief Rebel is a game development studio located in Stockholm, Sweden. We make stylized games with deeply involving mechanics. www.chiefrebel.com
Our studio exists to craft incredible, stylized games with deeply involving mechanics, providing thousands of hours of entertainment for our players. That's all.
Founded: 2018
Co-workers: 30-40