Site Reliability Engineer (Mid/Senior)

Razer

4+ Years | Singapore, Singapore (On Site) | Full Time | 2 months ago

Job Summary

Razer is seeking a Site Reliability Engineer (SRE) to join their AI Software team. This role focuses on ensuring the reliability, performance, scalability, and operational excellence of AI products, model-serving infrastructure, and backend API systems. The SRE will collaborate with software and AI teams to automate operations, enhance observability, and streamline deployments in a cloud-scale environment, building resilient systems and supporting AI workloads in production. Razer offers a global mission to revolutionize gaming and a unique, gamer-centric work experience for accelerated personal and professional growth.

Must Have

4+ years of experience in SRE, DevOps, infrastructure engineering, or cloud operations
Experience operating production services with significant availability or scaling demands
Strong knowledge of web technologies such as HTTP, REST, SSL, load balancers, and web proxies (NGINX)
Comfortable with Linux and Docker administration
Basic knowledge of AWS, CI/CD (Jenkins), IaC (Terraform), container orchestration (AWS ECS or K8s), version control (Git), and databases (MySQL, NoSQL)
Strong ability to code and script (preferably Bash scripting and Python)
Ability to use or quickly pick up a wide variety of open source technologies and automation tools
Understanding of GPU-based workloads and resource scheduling
Familiarity with vector databases, embeddings, and inference pipelines
Comfort with frequent, incremental code testing and deployment
Good analytical skills to debug deployment problems without developer help
Deep hands-on technical expertise and problem-solving skills
Ability to work in a collaborative, technically challenging environment with rapidly changing requirements
Bachelor’s or Master’s degree in computer science, AI, or a similar discipline

Perks & Benefits

Opportunity to make a global impact
Work with a global team spread across 5 continents
Unique, gamer-centric #LifeAtRazer experience
Accelerated personal and professional growth
Certified as a Great Place to Work® in the United States and Singapore

Job Description

Job Responsibilities:

We are looking for Site Reliability Engineers (SRE) to join our AI Software team. In this role, you will ensure the reliability, performance, scalability, and operational excellence of AI products, model-serving infrastructure, and backend API systems. You’ll work closely with software engineers, AI teams and release teams to automate operations, enhance observability, and streamline deployments in a cloud-scale environment. This role is ideal for someone who enjoys building resilient systems, solving complex infrastructure problems, and supporting AI workloads in production.
