Senior/Expert Engineer, Site Reliability Engineering (SRE)

Garena

Job Summary

The Senior/Expert Engineer, Site Reliability Engineering (SRE) is responsible for product scalability, stability, and performance, which requires diving deep into the development lines and understanding how each application component works. Key duties include setting up, managing, and maintaining applications, middleware, and big-data services, as well as performing deployments, performance fine-tuning, and troubleshooting. The role also involves designing automation, managing capacity, running full-chain stress tests, and preparing operation documentation. Candidates should have a strong background in Linux, Kubernetes, and networking, and programming experience in Bash, Python, or Go.

Job Description

  • Dive deep into the development lines to learn and understand the mechanism of every application component, and promote product scalability, stability, and performance.
  • Set up, manage, and maintain product, middleware, and big-data applications and services.
  • Perform regular and ad-hoc server-side deployments, performance fine-tuning, and troubleshooting.
  • Design and develop automation for our workflows.
  • Manage capacity and resources.
  • Run full-chain stress tests to improve application performance and remove redundancy.
  • Prepare routine operation documentation.

Job Requirements

  • Bachelor’s or higher degree in Computer Science, Engineering, Information Systems or related fields.
  • Minimum 3 years of relevant full-time working experience in Site Reliability Engineering roles.
  • Extensive, hands-on knowledge of Linux operating systems (Ubuntu, CentOS, etc.).
  • Extensive, hands-on knowledge of Kubernetes and its ecosystem.
  • Knowledge of computer networking (TCP/IP, DNS, etc.) and operating systems.
  • Hands-on experience with at least one of the following languages: Bash, Python, Go.
  • Strong analytical and problem-solving skills with the ability to thrive under high-pressure situations.
  • Fast learner and a good team player.
  • Detail-oriented, cautious, and prudent.
