Senior Site Reliability Engineer

Playson

Remote | Full Time | 1 day ago


Job Summary

Playson is seeking an experienced Senior Site Reliability Engineer/DevOps Engineer to join its dynamic Platform Tribe. The role involves managing day-to-day alerts, system checks, and issue escalation, as well as providing 24x7 on-call support for critical SaaS events. Key responsibilities include proactively creating monitors within the EKS/K8s ecosystem, deploying to EKS/K8s clusters using Terraform and Helm/Flux, and improving infrastructure health. The engineer will also maintain deployment code, integrate new technologies into the cloud infrastructure, and collaborate with other teams to deliver top-notch support and minimize impact during deployments.

Must Have

- Manage day-to-day alerts, system checks, and issue escalation.
- Provide 24x7 on-call support for critical SaaS events.
- Document issues and remediation steps.
- Proactively create monitors within the EKS/K8s ecosystem.
- Deploy to EKS/K8s clusters using Terraform and Helm/Flux.
- Improve infrastructure health with checks and scripts.
- Maintain and develop deployment code.
- Implement/integrate new technologies into the cloud infrastructure.
- Collaborate with other teams on support and assistance.
- Keep the customer in focus when planning deployments/updates.
- Conduct root cause analysis (RCA) and take corrective actions to prevent recurrence.
- Assign alert-related actions to the appropriate teams after investigation.
- Handle support requests for environment-specific actions.
- Strong experience with incident handling (RCA, postmortems).
- Proficiency in Kubernetes (deployment, scaling, troubleshooting).
- Familiarity with AWS, Terraform, Docker, and CI/CD.
- Experience with monitoring tools such as Datadog, Prometheus, and Grafana.
- Experience with logging solutions such as the ELK Stack or AWS CloudWatch.
- Strong understanding of networking concepts and protocols.
- Proficiency in at least one scripting or programming language (e.g., Python, Node.js, Go).
- Experience with GitOps tools such as FluxCD/ArgoCD.
- Proficiency in Git or another version control system.
- Familiarity with incident response and management tools.

Perks & Benefits

- Professional development
- Flexible schedule
- Full medical insurance for you and your +1
- Financial support for special life events
- Unlimited paid vacation leave
- Bonus system
- Unlimited sick leave
- Remote work
- Reimbursement for courses and training

Job Description

We are currently seeking an experienced Senior Site Reliability Engineer/DevOps to join our dynamic Platform Tribe.

###### What you will be doing:

- Manage day-to-day alerts, system checks, and issue escalation as necessary.
- Provide 24x7 on-call support for critical SaaS events.
- Document issues and remediation steps.
- Proactively create monitors within the EKS/K8s ecosystem.
- Deploy to EKS/K8s clusters using Terraform and Helm/Flux.
- Improve infrastructure health by implementing checks and scripts that address known issues.
- Maintain and develop deployment code.
- Implement/integrate new technologies into our cloud infrastructure.
- Collaborate with other teams to provide top-notch support and assistance.
- Keep the customer in focus when planning deployments/updates, ensuring minimal impact.
- Conduct root cause analysis (RCA) and take the necessary corrective actions to prevent recurrence.
- Assign alert-related actions to the appropriate team after investigation.
- Handle support requests for environment-specific actions.

###### To succeed in this role, you will need:

- Strong experience with incident handling (RCA, postmortems).
- Proficiency in Kubernetes (deployment, scaling, troubleshooting).
- Familiarity with AWS, Terraform, Docker, and CI/CD.
- Experience with monitoring tools such as Datadog, Prometheus, and Grafana, and with logging solutions such as Elasticsearch, Logstash, and Kibana (ELK Stack) or AWS CloudWatch.
- Strong understanding of networking concepts and protocols.
- Proficiency in at least one scripting or programming language (e.g., Python, Node.js, Go).
- Experience with GitOps tools such as FluxCD/ArgoCD.
- Proficiency in Git or another version control system.
- Familiarity with incident response and management tools such as PagerDuty, Opsgenie, or VictorOps.
- Ownership, proactiveness, persistence, and a passion for maintaining a high-traffic online platform.

###### Recruitment Process:

1. HR Interview

2. Hiring Manager Interview

3. Technical Interview

4. Final Interview with Head of Platform & CTO

If you're ready to embrace ambitious goals and thrive in a dynamic environment, apply now and become part of Playson's exciting journey in the iGaming world!

21 Skills Required for This Role

SaaS Business Models, Problem Solving, GitHub, Talent Acquisition, Game Texts, Networking, Incident Response, AWS, Logstash, Kibana, Prometheus, Grafana, Terraform, Elasticsearch, ELK, Helm, CI/CD, Docker, Kubernetes, Git, Python
