Joining Razer will place you on a global mission to revolutionize the way the world games. Razer is a place to do great work, offering you the opportunity to make an impact globally while working with a team spread across 5 continents. Razer is also a great place to work, providing you the unique, gamer-centric #LifeAtRazer experience that will put you on a path of accelerated growth, both personally and professionally.
In this role, your responsibilities include:
• Lead efforts to monitor, maintain, and enhance the reliability and availability of our production systems.
• Design and implement robust monitoring, alerting, and incident response processes.
• Collaborate with development teams to ensure seamless deployment and operation of applications.
• Manage our cloud-based infrastructure (AWS) and systems.
• Automate routine tasks to improve efficiency and reduce manual intervention.
• Scale infrastructure to meet growing demands.
• Participate in on-call rotations and respond promptly to critical incidents.
• Optimize resource utilization, including CPU, memory, and storage.
• Drive continuous improvement in system reliability and operational excellence.
• Champion reliability best practices across the organization.
In this role, the preferred skills and qualifications are:
• Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
• Proven track record as a Site Reliability Engineer or in a similar role.
• Experience with containerization and orchestration (Docker, Kubernetes).
• Familiarity with cloud platforms (AWS, Azure, GCP) and infrastructure-as-code (Terraform, Ansible).
• Excellent problem-solving abilities and a passion for automating repetitive tasks.
• Certifications in cloud technologies (AWS Certified DevOps Engineer, Google Professional Cloud DevOps Engineer, etc.).
• Experience with observability tools (Prometheus, Grafana, ELK stack).
• Knowledge of CI/CD pipelines and GitOps practices.
Are you game?