Platform Reliability Operations

2 Days ago • 5 Years +

Job Summary

Job Description

This role involves analyzing and improving system design to reduce failures and create self-healing systems. Responsibilities include establishing and maintaining robust systems for observability (logging, monitoring, tracing, alerting, testing), collaborating with development partners on architecture and implementation, and working with service engineers to establish SLAs and SLOs. The role requires identifying performance indicators, suggesting solutions in uncertain situations, and managing individual tasks while working in a team. The individual will respond to incidents and act as an SME. This requires strong engineering and coding skills, experience with service-oriented APIs, cloud services (AWS preferred), microservices, and hands-on server software experience.
Must have:
  • 5+ years experience in software development
  • Solid engineering and coding skills
  • Experience building service-oriented APIs and cloud services
  • Proficient in Golang and JavaScript
  • Experience in the Linux environment
  • Understanding of distributed systems
Good to have:
  • Experience with Golang
  • Experience with TypeScript
  • Experience with Kubernetes
  • Experience with Terraform
  • Experience with OpenTelemetry
  • Experience with Istio
  • Experience with Datadog
  • Experience with Helm Charts
  • Experience with HLS video transcoding

Job Details

This is a critical role with a wide range of responsibilities, including:
● Analyze and improve system design to reduce failure modes and promote self-healing systems.
● Establish and maintain robust systems that facilitate observability, encompassing logging, monitoring, distributed tracing, alerting, and offline test tools.
● Work with development partners to shape the architecture, design, and implementation of new and existing systems to enhance their reliability, performance, efficiency, and scalability.
● Work both independently and as part of a geographically dispersed yet integrated team.
● Collaborate with service engineers to establish Service Level Agreements (SLAs) and Service Level Objectives (SLOs) for backend services.
● Identify the indicators that show how well an application is performing, and know how to improve or repair its performance.
● Assess options and suggest solutions when information is limited or unclear. This position requires comfort and assurance in dealing with uncertain situations.
● Work seamlessly within a team as well as manage individual tasks.
● Respond to emerging incidents, solve critical issues, and follow through with a plan for resolution or future mitigation.
● Act as an SME on the Engineering Operations team, partnering with backend services teams and application teams to overcome challenges across all the platforms where we stream our service.

Qualities / Experience We’re Seeking

We believe the right individual will have the following skills and experience to be successful in the role:
● 5+ years experience in software development
● Degree in Computer Science or a related field, or equivalent work experience
● Solid engineering and coding skills, data structure knowledge, and the ability to write high-performance, production-quality code
● Experience building service-oriented APIs and cloud services (preferably on AWS)
● Experience designing, implementing, and deploying microservices
● Extremely technical, hands-on server software experience
● Proficient in Golang and JavaScript, and quick to learn new languages
● Experience in the Linux environment and a good understanding of its fundamentals and internals: filesystems, modern memory management, threads and processes, the user/kernel-space divide, etc.
● A good understanding of large-scale distributed systems in practice, including multi-tier architectures, application security, monitoring, and storage systems
● Working knowledge of the TCP/IP stack, internet routing, and load balancing
● Grit, drive, and a deep feeling of ownership

Bonus Points for Experience with the following:
● Golang
● TypeScript
● Kubernetes
● Terraform
● OpenTelemetry
● Istio
● Datadog
● Helm Charts
● HLS video transcoding, distribution & playback
● Designing, implementing, and running services in high-demand, high-traffic environments
● High-availability services

About The Company

LTIMindtree is a global technology consulting and digital solutions company that enables enterprises across industries to reimagine business models, accelerate innovation, and maximize growth by harnessing digital technologies. As a digital transformation partner to more than 700 clients, LTIMindtree brings extensive domain and technology expertise to help drive superior competitive differentiation, customer experiences, and business outcomes in a converging world. Powered by nearly 90,000 talented and entrepreneurial professionals across 30+ countries, LTIMindtree, a Larsen & Toubro Group company, combines the industry-acclaimed strengths of erstwhile Larsen and Toubro Infotech and Mindtree in solving the most complex business challenges and delivering transformation at scale. For more info, please visit www.ltimindtree.com