Platform Reliability Operations

18 Hours ago • 5 Years +

Job Summary

Job Description

This role involves analyzing and improving system design to reduce failures and create self-healing systems. Responsibilities include establishing and maintaining robust systems for observability (logging, monitoring, tracing, alerting, testing), collaborating with development partners on architecture and implementation, and working with service engineers to establish SLAs and SLOs. The role requires identifying performance indicators, suggesting solutions in uncertain situations, and managing individual tasks while working in a team. The individual will respond to incidents and act as an SME. This requires strong engineering and coding skills, experience with service-oriented APIs, cloud services (AWS preferred), microservices, and hands-on server software experience.
Must have:
  • 5+ years experience in software development
  • Solid engineering and coding skills
  • Experience building service-oriented APIs and cloud services
  • Proficient in Golang and JavaScript
  • Experience in the Linux environment
  • Understanding of distributed systems
Good to have:
  • Experience with Golang
  • Experience with TypeScript
  • Experience with Kubernetes
  • Experience with Terraform
  • Experience with OpenTelemetry
  • Experience with Istio
  • Experience with Datadog
  • Experience with Helm Charts
  • Experience with HLS video transcoding

Job Details

This is a critical role with a wide range of responsibilities, including:
  • Analyze and improve system design to reduce failure modes and promote self-healing systems
  • Establish and maintain robust systems that facilitate observability, encompassing logging, monitoring, distributed tracing, alerting, and offline test tools
  • Work with development partners to shape the architecture, design, and implementation of new and existing systems to enhance their reliability, performance, efficiency, and scalability
  • Work both independently and as part of a geographically dispersed yet integrated team
  • Collaborate with service engineers to establish Service Level Agreements (SLAs) and Service Level Objectives (SLOs) for backend services
  • Identify the indicators that demonstrate an application's effectiveness, and know how to improve or repair its performance
  • Assess options and suggest solutions when information is limited or unclear. This position requires comfort and assurance in dealing with uncertain situations.
  • Work seamlessly within a team while managing individual tasks
  • Respond to emerging incidents, solve critical issues, and follow through with a plan for resolution or future mitigation
  • Act as an SME on the Engineering Operations team, partnering with backend services teams and application teams to overcome challenges across all the platforms where we stream our service

Qualities / Experience We’re Seeking

We believe the right individual will have the following skills and experience to be successful in the role:
  • 5+ years of experience in software development
  • Degree in Computer Science or a related field, or equivalent work experience
  • Solid engineering and coding skills, data structure knowledge, and the ability to write high-performance, production-quality code
  • Experience building service-oriented APIs and cloud services (preferably on AWS)
  • Experience designing, implementing, and deploying microservices
  • Extremely technical, hands-on server software experience
  • Proficient in Golang and JavaScript, and quick to learn new languages
  • Experience in the Linux environment and a good understanding of its fundamentals and internals: filesystems, modern memory management, threads and processes, the user/kernel-space divide, etc.
  • A good understanding of large-scale distributed systems in practice, including multi-tier architectures, application security, monitoring, and storage systems
  • Working knowledge of the TCP/IP stack, internet routing, and load balancing
  • Grit, drive, and a deep feeling of ownership

Bonus points for experience with the following:
  • Golang
  • TypeScript
  • Kubernetes
  • Terraform
  • OpenTelemetry
  • Istio
  • Datadog
  • Helm Charts
  • HLS video transcoding, distribution, and playback
  • Designing, implementing, and running services in high-demand, high-traffic environments
  • High-availability services


About The Company

LTIMindtree is a global technology consulting and digital solutions company that enables enterprises across industries to reimagine business models, accelerate innovation, and maximize growth by harnessing digital technologies. As a digital transformation partner to more than 700 clients, LTIMindtree brings extensive domain and technology expertise to help drive superior competitive differentiation, customer experiences, and business outcomes in a converging world. Powered by nearly 90,000 talented and entrepreneurial professionals across more than 30 countries, LTIMindtree, a Larsen & Toubro Group company, combines the industry-acclaimed strengths of the erstwhile Larsen and Toubro Infotech and Mindtree in solving the most complex business challenges and delivering transformation at scale. For more information, visit www.ltimindtree.com

