Senior Site Reliability Engineer

5 hours ago • 8+ years • DevOps

Job Description

The Senior Site Reliability Engineer is a strategic and hands-on leader responsible for designing, building, and owning complex infrastructure and deployment systems for live environments. This role involves mentoring junior engineers, collaborating with development teams to architect reliable, scalable, and automated systems, and driving the adoption of robust solutions. Key responsibilities include infrastructure design and maintenance using IaC tools, enhancing CI/CD pipelines, managing observability stacks, and leading reliability reviews and incident responses. The role champions continuous improvement and knowledge sharing across engineering.
Must have:
  • Lead the design, build, and maintenance of core infrastructure using IaC tools.
  • Own provisioning and lifecycle management of production and critical environments.
  • Architect and implement shared infrastructure components.
  • Drive continuous improvements to infrastructure scalability, availability, and performance.
  • Design, own, and enhance CI/CD pipelines to maximize reliability and automation.
  • Establish and enforce best practices for deployment, rollback, and observability.
  • Architect and manage monitoring, alerting, and logging infrastructure.
  • Define, implement, and track SLOs/SLIs for core services.
  • Proactively identify and eliminate single points of failure and performance bottlenecks.
  • Lead reliability reviews, blameless post-incident analyses, and capacity planning.
  • Ensure all systems and processes are accompanied by thorough documentation.
  • Mentor other engineers and contribute to shared knowledge bases.
  • Work closely with the SRE Lead to define team strategy and prioritize work.
  • Participate in on-call rotations, acting as an escalation point.
Good to have:
  • Java experience for basic debugging of applications.

Job Details

The Senior Site Reliability Engineer is a leader within the team, responsible for designing, building, and owning the complex infrastructure and deployment systems that underpin our live environments. This role is both hands-on and strategic, requiring deep technical expertise and strong collaboration skills. You will mentor junior engineers and work closely with development teams to architect and implement systems that are reliable, scalable, and highly automated. Senior SREs are expected to drive the adoption of robust, automated solutions and ensure those solutions are well-documented and understood across engineering.

Core Responsibilities

Infrastructure Design & Maintenance
  • Lead the design, build, and maintenance of our core infrastructure using infrastructure-as-code (IaC) tools (e.g., Terraform, CloudFormation).
  • Own the provisioning and lifecycle management of production, staging, and other critical environments.
  • Architect and implement shared infrastructure components (e.g., logging, metrics, service mesh, load balancing).
  • Drive continuous improvements to infrastructure scalability, availability, and performance.
  • Act as a key partner to development teams, providing infrastructure primitives and strategic guidance on deployment needs.

Deployment Systems & CI/CD
  • Design, own, and enhance our CI/CD pipelines (GitHub Actions, Argo CD) to maximize reliability, velocity, and automation.
  • Establish and enforce best practices across all environments for deployment, rollback, and observability.
  • Partner with developers to architect and streamline the testing and delivery of code to production.
  • Champion the elimination of manual steps in deployment and operations workflows.

Reliability, Observability & Tooling
  • Architect and manage our monitoring, alerting, and logging infrastructure (Kube-Prometheus-Grafana stack).
  • Define, implement, and track SLOs/SLIs for core services, holding service owners accountable.
  • Proactively identify and eliminate single points of failure, performance bottlenecks, and sources of instability.
  • Lead reliability reviews, blameless post-incident analyses, and capacity planning initiatives.
  • Perform basic debugging of Java applications to assist development teams in troubleshooting.

Documentation & Knowledge Sharing
  • Ensure all systems and processes built or maintained by the SRE team are accompanied by thorough, up-to-date documentation.
  • Mentor other engineers and contribute to shared knowledge bases, runbooks, and developer-facing materials.
  • Lead internal training sessions, walkthroughs, and pairings to cross-train teammates and reduce knowledge silos.

Collaboration & Culture
  • Work closely with the SRE Lead to define team strategy, prioritize work, and execute on team goals.
  • Mentor junior team members and act as a technical leader across engineering.
  • Participate in on-call rotations, acting as an escalation point for complex issues.
  • Champion a culture of blameless learning, transparency, and continuous improvement.

Qualifications & Skills

  • Experience: 8+ years in a senior SRE, DevOps, or related infrastructure role.
  • Cloud: Deep, hands-on expertise with AWS, including services like ECS, EKS, Aurora (Postgres), EC2, S3, and VPC.
  • Containers & Orchestration: Strong, production-level proficiency with Kubernetes and Helm. Deep understanding of container runtimes and networking.
  • CI/CD: Extensive experience designing, building, and managing complex CI/CD pipelines using tools like GitHub Actions and Argo CD. Experience with container registries like GHCR.
  • IaC: Expertise in Infrastructure as Code, with strong proficiency in Terraform or CloudFormation.
  • Observability: Proven experience with observability stacks, particularly the Kube-Prometheus-Grafana stack, including custom metric instrumentation and advanced dashboarding.
  • Debugging: Ability to perform basic performance analysis and debugging of applications (Java experience is a strong plus).
  • Leadership: Demonstrated ability to mentor junior engineers, lead technical projects, and drive architectural decisions.
  • Incident Management: Experience leading incident response, conducting blameless post-mortems, and driving resulting action items to completion.

About The Company

Cubic creates and delivers technology solutions in transportation that make people’s lives easier by simplifying their daily journeys, and defense capabilities that help promote mission success and safety for those who serve their nation. Led by our talented teams around the world, Cubic is committed to solving global challenges through innovation and service to our customers and partners. We have a top-tier portfolio of businesses, including Cubic Transportation Systems (CTS) and Cubic Defense (CD). CTS is an industry-leading integrator of payment and information solutions and related services for intelligent travel applications. CTS delivers integrated systems for transportation and traffic management, delivering tools for travelers to choose the smartest and easiest way to travel and pay for their journeys, and enabling transportation authorities and agencies to manage demand across the entire transportation network. Cubic Defense provides networked Command, Control, Communications, Computers, Cyber, Intelligence, Surveillance and Reconnaissance (C5ISR) solutions, and live, virtual, constructive and game-based training solutions for both U.S. and Allied Forces. These mission-inspired capabilities enable assured multi-domain access; converged digital intelligence; and superior readiness for defense, intelligence, security and commercial missions.
