Site Reliability Engineer

11 Minutes ago • 2 Years +
DevOps

Job Description

Razer is seeking a skilled and driven Site Reliability Engineer (SRE) to join its growing infrastructure and platform engineering team. This role offers the opportunity to make a global impact within a gamer-centric environment, fostering accelerated personal and professional growth. The ideal candidate will have hands-on experience with Amazon Web Services (AWS), strong troubleshooting capabilities, and a passion for building scalable, observable, and resilient systems using modern Infrastructure as Code (IaC) and automation tools. Key responsibilities include designing and maintaining IaC, implementing cloud infrastructure on AWS, leading architecture reviews, developing monitoring solutions, performing incident management, and automating operations.
Good To Have:
  • Experience with containerization and orchestration (Docker, ECS, or Kubernetes)
Must Have:
  • Bachelor’s degree in Computer Science or related field
  • Minimum 2 years of experience in SRE, DevOps, Cloud Infrastructure, or Systems Administration
  • Solid hands-on experience with AWS Cloud services (EC2, Lambda, ECS, VPC, SQS, S3, RDS, CloudWatch, etc.)
  • Proficiency in Infrastructure as Code using Terraform and/or CloudFormation
  • Experience with CI/CD tools (e.g., GitLab CI, Jenkins, CodePipeline)
  • Strong understanding of Linux and Windows system administration and troubleshooting
  • Comfortable with scripting/programming languages such as Python, Node.js, Bash, or Ruby, and with data formats such as JSON/YAML
  • Strong grasp of network fundamentals (DNS, HTTP(S), TLS/SSL, firewalls, TCP/IP)
  • Familiarity with observability tools and incident management best practices
Key Responsibilities:
  • Design, develop, and maintain IaC
  • Implement and operate reliable, scalable cloud infrastructure on AWS
  • Develop and manage robust monitoring, alerting, and logging solutions
  • Perform incident management, postmortems, and root cause analysis
  • Automate infrastructure operations and improve reliability
  • Ensure systems are compliant with security standards
  • Provide on-call support and participate in incident rotations
Perks:
  • Opportunity to make a global impact
  • Work with a global team spanning 5 continents
  • Unique, gamer-centric #LifeAtRazer experience
  • Accelerated personal and professional growth
  • Certified as a Great Place to Work® in the United States and Singapore

Job Responsibilities:

Joining Razer will place you on a global mission to revolutionize the way the world games. Razer is a place to do great work, offering you the opportunity to make an impact globally while working with a global team spanning 5 continents. Razer is also a great place to work, providing you the unique, gamer-centric #LifeAtRazer experience that will put you on a path of accelerated growth, both personally and professionally.

We are seeking a skilled and driven Site Reliability Engineer (SRE) to join Razer Gold's growing infrastructure and platform engineering team. The ideal candidate will have hands-on experience with Amazon Web Services (AWS), strong troubleshooting capabilities, and a passion for building scalable, observable, and resilient systems using modern Infrastructure as Code (IaC) and automation tools.

REQUIREMENTS:

  • Bachelor’s degree in Computer Science, Software Engineering, Information Technology, or a related field.
  • Minimum 2 years of experience in SRE, DevOps, Cloud Infrastructure, or Systems Administration roles.
  • Solid hands-on experience with AWS Cloud services including (but not limited to):
      • Compute: EC2, Lambda, ECS, Auto Scaling
      • Networking: VPC, Load Balancers, Route 53
      • Messaging & Storage: SQS, S3, RDS, ElastiCache, SES
      • Monitoring: CloudWatch, X-Ray
  • Proficient in Infrastructure as Code using Terraform and/or CloudFormation.
  • Experience with CI/CD tools (e.g., GitLab CI, Jenkins, CodePipeline, ArgoCD).
  • Strong understanding of Linux and Windows system administration and troubleshooting.
  • Comfortable with one or more scripting/programming languages such as Python, Node.js, Bash, or Ruby, and with data formats such as JSON/YAML, for automation.
  • Strong grasp of network fundamentals, including DNS, HTTP(S), TLS/SSL, firewalls, and TCP/IP.
  • Experience with containerization and orchestration (Docker, ECS, or Kubernetes) is a plus.
  • Familiarity with observability tools and incident management best practices.

JOB DESCRIPTION:

  • Design, develop, and maintain Infrastructure as Code (IaC) using tools like Terraform or AWS CloudFormation.
  • Implement and operate reliable, scalable cloud infrastructure primarily on AWS (e.g., EC2, ECS, RDS, S3, Lambda, ElastiCache, SQS, SES, Auto Scaling, Load Balancers).
  • Lead and participate in architecture reviews focusing on reliability, scalability, security, and performance.
  • Develop and manage robust monitoring, alerting, and logging solutions (e.g., CloudWatch, Prometheus, Grafana, ELK) to detect and resolve issues proactively.
  • Perform incident management, postmortems, and root cause analysis, and implement continuous improvement strategies.
  • Collaborate with software engineering teams to improve CI/CD pipelines, deployment automation, and release management.
  • Automate infrastructure operations, reduce manual toil, and improve reliability using scripting (Python, Bash, Node.js, or Ruby).
  • Maintain and troubleshoot environments involving web servers, databases, firewalls, DNS, load balancers, and networking.
  • Ensure systems are compliant with security standards, including patching, hardening, and secure access policies.
  • Provide on-call support and participate in incident rotations.
  • Monitor and maintain service-level objectives (SLOs), SLAs, and error budgets to ensure reliability targets are met.
  • Provide support and resolution for assigned incidents and tickets.
