Senior Site Reliability Engineer

7 Months ago • 5 Years + • DevOps

Job Summary

Job Description

Senior Site Reliability Engineer with 5+ years of experience in cloud and on-prem SRE design and implementation. Must have expertise in infrastructure automation, distributed systems, and cloud platforms such as AWS, Azure, and GCP. Strong knowledge of monitoring, logging, and configuration management is essential.
Must have:
  • Infrastructure Automation
  • Distributed Systems
  • Cloud Platforms
  • Monitoring Concepts
Good to have:
  • Containerization Tech
  • Network Experience
  • Elasticsearch
  • Prometheus
Perks:
  • Global IT Team
  • Fast-Paced Environment

Job Details

Responsibilities:

About Tencent Overseas IT:
Tencent Overseas IT has the mission to empower Tencent’s rapid global growth with future-ready, global IT platforms, applications, and services. We are chartered to lead the Overseas IT strategy, architecture, roadmap, and execution. Satisfying our internal/external customers and becoming a world-class global IT team are our top aspirations.


We are seeking a Sr. Site Reliability Engineer with extensive cloud and on-prem SRE design and implementation experience.

Duties and Responsibilities:
This senior role will work closely with our internal IT teams and cloud providers to design the best global SRE architecture and solutions in the cloud. The role will also support studio infrastructure and game publishing infrastructure, and their evolution to the cloud. Our customers include internal and acquired gaming studios, game publishing services, innovative offices/workplaces, various business groups, and external customers. The scope of work includes understanding internal customers’ business requirements, collecting technical requirements, developing reference architectures and prototypes based on industry best practices, leading implementation and deployment across global locations, and troubleshooting issues when necessary.

For this SRE job, you will:
• Design, implement, and support the operations and reliability of large-scale cloud-enabled studios, with a focus on performance at scale and real-time monitoring, logging, analysis, and alerting.
• Maintain services once they go live by measuring and monitoring availability, latency, and overall system health.
• Design and develop robust and scalable products and tools to enhance operational efficiency.
• Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.
• Participate in incident response and troubleshooting efforts to minimize downtime and ensure system reliability.
• Maintain project and product documentation and knowledge.
• Be part of an on-call rotation to support production systems (if needed).


Based in Shanghai, China, this person will work closely with the global IT team and HQ teams.

Whom we are looking for:

  • A quick learner
  • A positive, self-motivated, and passionate person
  • Independent, insistent, and open-minded
  • A great team player who is both dependable and autonomous
  • Customer-oriented and able to work at a very fast pace

Requirements:

  • 5+ years of experience with infrastructure automation and distributed systems design, including designing and developing tools for running large-scale private or public cloud systems in production
  • In-depth knowledge and understanding of monitoring concepts, alerting mechanisms, log monitoring, anomaly detection, and dashboard creation and setup
  • In-depth knowledge of and experience with Elasticsearch and Prometheus
  • Expertise in configuration management with a framework such as Ansible, Terraform, or Helm
  • Proficiency in programming languages such as Python and Golang, and in shell scripting, to automate tasks
  • Passion for infrastructure and monitoring as code
  • Bachelor’s degree (or higher) in Computer Science, Mathematics, or a related science or engineering major
  • Solid understanding of cloud platforms (e.g., AWS, Azure, GCP) and containerization technologies (e.g., Docker, Kubernetes)
  • A good understanding of and hands-on experience with networking is a plus
  • Bilingual preferred (English, Chinese)

About The Company

Tencent is a world-leading internet and technology company that develops innovative products and services to improve the quality of life of people around the world.


Founded in 1998 with its headquarters in Shenzhen, China, Tencent's guiding principle is to use technology for good. Our communication and social services connect more than one billion people around the world, helping them to keep in touch with friends and family, access transportation, pay for daily necessities, and even be entertained.


Tencent also publishes some of the world's most popular video games and other high-quality digital content, enriching interactive entertainment experiences for people around the globe.


Tencent also offers a range of services such as cloud computing, advertising, FinTech, and other enterprise services to support our clients' digital transformation and business growth.


Tencent has been listed on the Stock Exchange of Hong Kong since 2004.
