Platform Reliability Operations

1 Month ago • 5 Years +

Job Summary

Job Description

This role involves analyzing and improving system design to reduce failures and create self-healing systems. Responsibilities include establishing and maintaining robust systems for observability (logging, monitoring, tracing, alerting, testing), collaborating with development partners on architecture and implementation, and working with service engineers to establish SLAs and SLOs. The role requires identifying performance indicators, suggesting solutions in uncertain situations, and managing individual tasks while working in a team. The individual will respond to incidents and act as an SME. This requires strong engineering and coding skills, experience with service-oriented APIs, cloud services (AWS preferred), microservices, and hands-on server software experience.
Must have:
  • 5+ years of experience in software development
  • Solid engineering and coding skills
  • Experience building service-oriented APIs and cloud services
  • Proficient in Golang and JavaScript
  • Experience in the Linux environment
  • Understanding of distributed systems
Good to have:
  • Experience with Golang
  • Experience with TypeScript
  • Experience with Kubernetes
  • Experience with Terraform
  • Experience with OpenTelemetry
  • Experience with Istio
  • Experience with Datadog
  • Experience with Helm Charts
  • Experience with HLS video transcoding

Job Details

This is a critical role with a wide range of responsibilities, including:
  • Analyze and improve system design to reduce failure modes and promote self-healing systems.
  • Establish and maintain robust systems that facilitate observability, encompassing logging, monitoring, distributed tracing, alerting, and offline test tools.
  • Work with development partners to shape the architecture, design, and implementation of new and existing systems to enhance their reliability, performance, efficiency, and scalability.
  • Work both independently and as part of a geographically dispersed yet integrated team.
  • Collaborate with service engineers to establish Service Level Agreements (SLAs) and Service Level Objectives (SLOs) for backend services.
  • Identify the indicators that demonstrate the effectiveness of an application, and know how to improve or repair its performance.
  • Assess options and suggest solutions when information is limited or unclear. This position requires comfort and assurance in dealing with uncertain situations.
  • Work seamlessly within a team while managing individual tasks.
  • Respond to emerging incidents, solve critical issues, and follow through with a plan for resolution or future mitigation.
  • Act as an SME on the Engineering Operations team, partnering with backend services teams and application teams to overcome challenges across all the platforms where we stream our service.

Qualities / Experience We’re Seeking

We believe the right individual will have the following skills and experience to be successful in the role:
  • 5+ years of experience in software development
  • Degree in Computer Science or a related field, or equivalent work experience
  • Solid engineering and coding skills, data structure knowledge, and the ability to write high-performance, production-quality code
  • Experience building service-oriented APIs and cloud services (preferably on AWS)
  • Experience designing, implementing, and deploying microservices
  • Extremely technical, hands-on server software experience
  • Proficient in Golang and JavaScript, and quick to learn new languages
  • Experience in the Linux environment and a good understanding of its fundamentals and internals: filesystems, modern memory management, threads and processes, the user/kernel-space divide, etc.
  • A good understanding of large-scale distributed systems in practice, including multi-tier architectures, application security, monitoring, and storage systems
  • Working knowledge of the TCP/IP stack, internet routing, and load balancing
  • Grit, drive, and a deep feeling of ownership

Bonus points for experience with the following:
  • Golang
  • TypeScript
  • Kubernetes
  • Terraform
  • OpenTelemetry
  • Istio
  • Datadog
  • Helm Charts
  • HLS video transcoding, distribution, and playback
  • Designing, implementing, and running services in high-demand, high-traffic environments
  • High-availability services
