Senior Site Reliability Engineer

9 Hours ago • 3 Years +

Job Summary

Job Description

As a Senior Site Reliability Engineer (SRE) at Razer, you will join a team focused on building scalable, observable, and resilient systems within an AWS environment. Your responsibilities will include designing and maintaining Infrastructure as Code (IaC) solutions using tools like Terraform and CloudFormation. You will collaborate with various teams to ensure secure and reliable cloud infrastructure. You will also implement monitoring and alerting systems, drive incident response, and provide mentorship to junior engineers. The role involves automating tasks, troubleshooting complex issues, and participating in on-call support, with a shift from 5:00 PM to 2:00 AM (UTC+8).
Must have:
  • 3+ years experience in SRE or related roles.
  • Expertise with AWS Cloud Services.
  • Deep understanding of IaC using Terraform/CloudFormation.
  • Proficiency in a scripting language (Python, etc.).
  • Experience operating and troubleshooting Linux/Windows environments.
  • Experience with monitoring, alerting, and incident management.
  • Experience with Zero Downtime Deployments.
  • Strong understanding of DR.

Job Details

Joining Razer will place you on a global mission to revolutionize the way the world games. Razer is a place to do great work, offering you the opportunity to make an impact globally while working across a global team located across 5 continents. Razer is also a great place to work, providing you the unique, gamer-centric #LifeAtRazer experience that will put you in an accelerated growth, both personally and professionally.

Job Responsibilities :

We are seeking a skilled and driven Senior Site Reliability Engineer (SRE) to join our growing infrastructure and platform engineering team. The ideal candidate will have hands-on experience in Amazon Web Services (AWS), strong troubleshooting capabilities, and a passion for building scalable, observable, and resilient systems using modern Infrastructure as Code (IaC) and automation tools.

REQUIREMENTS:

  • Bachelor’s degree in Computer Science, Software Engineering, Information Technology, or a related field.
  • Minimum 3 years of experience in SRE, DevOps, cloud infrastructure, or system administration roles.
  • Hands-on expertise with AWS Cloud Services, including:
  • Compute & Containerization: EC2, Lambda, ECS, EKS, Auto Scaling
  • Networking: Load Balancers, VPC, Route 53, Security Groups, Firewalls
  • Storage & Databases: RDS, ElastiCache, Athena, S3
  • Messaging: SQS, SES
  • Deep understanding of Infrastructure as Code (IaC) tools such as Terraform and CloudFormation.
  • Proficiency in at least one programming/scripting language: Python, Node.js, Bash, Ruby, or related.
  • Experience operating and troubleshooting across Linux, Windows, and container-based environments.
  • Strong understanding of distributed systems, cloud networking (routers, switches), firewalls, DNS, and HTTP/TLS.
  • Experience implementing monitoring and alerting systems and working with incident management processes.
  • Experience with Zero Downtime Deployments, blue/green or canary deployments.
  • Familiarity with cost optimization and right-sizing AWS resources.
  • Exposure to multi-region, multi-account AWS architecture.
  • Understanding of API gateway, or edge networking (e.g., Akamai, CloudFront).

JOB DESCRIPTION:

  • Design, implement, and maintain Infrastructure as Code (IaC) solutions using Terraform and/or CloudFormation across multi-account AWS environments.
  • Collaborate with developers, architects, and DevOps teams to build scalable, secure, and observable cloud infrastructure.
  • Lead and participate in architecture design sessions, focusing on system reliability, scalability, security, and performance.
  • Implement and manage robust monitoring, alerting, and observability solutions (e.g., CloudWatch, Prometheus, ELK, Datadog).
  • Set and monitor Key Performance Indicators (KPIs) for system uptime, latency, throughput, and overall reliability.
  • Drive incident response processes, including coordination, triaging, resolution, documentation, and post-incident reviews (PIRs).
  • Supervise and mentor junior SREs and infrastructure engineers, fostering knowledge-sharing and team growth.
  • Collaborate across development, operations, and security teams to ensure secure and compliant deployments.
  • Automate manual tasks and workflows through scripting and tooling (Python, Node.js, Bash, Ruby, JSON/YAML).
  • Troubleshoot complex infrastructure issues across Linux, Windows, Docker, and cloud-native environments.
  • Provide IaC and CI/CD best practices to ensure repeatability, scalability, and compliance across all environments.
  • Provide on-call support, participate in incident rotations, and lead technical investigations during outages or degradations.
  • Strong understanding and experience for Disaster Recovery (DR).
  • Support from 5:00PM to 2:00AM (UTC+8) shift to ensure continuous of SRE coverage.
  • Undergo initial familiarization period during regular working hours before transitioning to the designated shift.
  • Provide support and solution handling to incident and tickets assigned.


 

Pre-Requisites :

Are you game?

Similar Jobs

Looks like we're out of matches

Set up an alert and we'll send you similar jobs the moment they appear!

Similar Skill Jobs

Looks like we're out of matches

Set up an alert and we'll send you similar jobs the moment they appear!

Jobs in Kuala Lumpur, Federal Territory of Kuala Lumpur, Malaysia

Looks like we're out of matches

Set up an alert and we'll send you similar jobs the moment they appear!

Similar Category Jobs

Looks like we're out of matches

Set up an alert and we'll send you similar jobs the moment they appear!

About The Company

At Razer, you'll be at the forefront of the most exciting industry in the world — gaming. Evolving forms of gaming require evolving forms of hardware, software and services. That’s where Razer comes in, offering innovative top-of-the-line products and services to allow gamers to fully immerse in the ultimate gaming experience.Getting onboard Razer will place you on a global mission to bring gamers closer to the games they love. Razer is a place to do great work, offering you the opportunity to be a part of a global team across 11 countries. Whether you are a hardcore evangelist who breathe life to the latest and greatest gaming gear or a behind-the-scene hero who runs our global operations, you are assured of a career-changing quest that transcends time zones and culture with one single spell: For Gamers. By Gamers.The journey towards phenomenal-ness won’t come easy. However, we will excel because gamers rely on teamwork. We achieve greatness because we are wicked problem-solvers and tenacious in clinching victories in all that we do. It is the team that makes Razer where it is today and will continue to bring Razer to even greater heights.

Kuala Lumpur, Federal Territory Of Kuala Lumpur, Malaysia (On-Site)

Kuala Lumpur, Federal Territory Of Kuala Lumpur, Malaysia (On-Site)

Singapore (On-Site)

Kuala Lumpur, Federal Territory Of Kuala Lumpur, Malaysia (On-Site)

Kuala Lumpur, Federal Territory Of Kuala Lumpur, Malaysia (On-Site)

Singapore (On-Site)

Singapore (On-Site)

Singapore (On-Site)

Singapore (On-Site)

View All Jobs

Get notified when new jobs are added by Razer

Level Up Your Career in Game Development!

Transform Your Passion into Profession with Our Comprehensive Courses for Aspiring Game Developers.

Job Common Plug