Staff Engineer - DevOps Site Reliability

1 Day ago • All levels • DevOps

Job Summary

Job Description

Experienced L3 SRE engineer needed for a business-critical SaaS application. Responsibilities include L3 support across the full stack (infra, backend, frontend), automating SRE tools, proactive monitoring, handling business pressure, communicating effectively with various teams and end-users, incident/problem management, and working with multitenant applications. Requires strong understanding of networking, CI/CD, Python, and AWS services (especially EKS, serverless technologies, and databases). Experience with Kubernetes, Prometheus, and monitoring/logging tools is essential.
Must have:
  • EKS
  • GitHub Actions
  • Python (Strong)
  • Kubernetes (Expert)
  • Prometheus
  • L3 support across full-stack
  • Automation of SRE tools
  • Incident/Problem Management
Good to have:
  • GenAI/LLM application experience
  • AWS Managed Services
  • FastAPI and NextJS
  • Websockets
  • Cloud security concepts
  • Terraform

Job Details

Company Description

We are a Digital Product Engineering company that is scaling in a big way! We build products, services, and experiences that inspire, excite, and delight. We work at scale — across all devices and digital mediums, and our people exist everywhere in the world (19000+ experts across 33 countries, to be exact). Our work culture is dynamic and non-hierarchical. We are looking for great new colleagues. That is where you come in!

Job Description

  • Experienced L3 SRE engineer for a business-critical SaaS application.
  • Capacity to provide L3 support across the full stack, including infra, backend, and frontend, before escalation to the engineering business unit.
  • Capacity to automate SRE tools to provide proactive L3 support, close to our tech monitoring strategy.
  • Capacity to work under business pressure for business-critical applications.
  • Capacity to communicate appropriately with L1, L2, Engineering, Product Managers, leadership, and end-users during troubleshooting.
  • Experience with incident and problem management.
  • Experience with multitenant applications.
  • Solid understanding of networking concepts (TCP/IP, DNS, routing, etc.), such as VPCs, subnets, firewalls, load balancing, and TLS/SSL.
  • Experience with CI/CD pipelines (e.g., Jenkins, GitHub Actions) and version control.
  • Python and React/Next.js.
  • Monitoring and logging to analyze and track resource utilization and application performance and to identify potential issues, using Grafana, Prometheus, and Loki or ELK.
  • Experience with AWS, particularly EKS, serverless, queues, and various databases.
  • Solid knowledge of Kubernetes.

Qualifications

Must have Skills: EKS, GitHub Actions, Python (Strong), Kubernetes (Expert), Prometheus.

Good to Have Skills: 

  • Previous experience building a user-facing GenAI/LLM software application.
  • Security best practices in cloud environments.
  • AWS Managed Services (RDS, Batch, Lambda, Fargate, Step Functions, SQS/SNS, etc.).
  • FastAPI and NextJS experience (if we're still using the latter).
  • WebSockets, Server-Sent Events, Pub/Sub (RabbitMQ, Kafka, etc.).
  • Cloud security concepts (IAM, access control).
  • Terraform experience. 
