Site Reliability Engineer

Cubic Corporation

3+ Years | Hyderabad, Telangana, India (On Site) | Full Time | 2 months ago

Job Summary

The Junior Site Reliability Engineer assists in designing, building, and maintaining infrastructure and deployment systems for live environments. This hands-on role involves collaborating with development teams and senior SREs to ensure reliable, scalable, and well-instrumented systems. Responsibilities include applying best practices for robust, automated solutions, ensuring repeatability, and documenting all contributions for knowledge-sharing within the engineering team. The role focuses on improving infrastructure, CI/CD pipelines, observability, and participating in on-call rotations.

Must Have

Assist in building and maintaining infrastructure using infrastructure-as-code (IaC) tools.
Support the provisioning and lifecycle management of production, staging, and other critical environments.
Help implement shared infrastructure components (e.g., logging, metrics, service mesh, load balancing).
Support and help extend CI/CD pipelines (GitHub Actions, Argo CD).
Assist in the implementation and maintenance of our monitoring, alerting, and logging infrastructure (Kube-Prometheus-Grafana stack).
Ensure that all systems and processes you work on are accompanied by thorough, up-to-date documentation.
3+ years of experience in a DevOps, SRE, or related role.
Basic understanding of cloud computing concepts, with some hands-on experience in AWS.
Familiarity with Docker and a foundational understanding of Kubernetes concepts.
Proficiency in at least one scripting language (e.g., Bash, Python).

Good to Have

Experience with AWS ECS.
Familiarity with Argo CD.
Exposure to Prometheus and Grafana.

Job Description

Business Unit:

Cubic Transportation Systems

Company Details:

When you join Cubic, you become part of a company that creates and delivers technology solutions in transportation, making people’s lives easier by simplifying their daily journeys, and in defense, helping promote mission success and safety for those who serve their nation. Led by our talented teams around the world, Cubic is committed to solving global issues through innovation and service to our customers and partners.

We have a top-tier portfolio of businesses, including Cubic Transportation Systems (CTS) and Cubic Defense (CD). Explore more on Cubic.com.

Job Details:

The Junior Site Reliability Engineer is responsible for assisting in the design, build, and maintenance of the infrastructure and deployment systems that underpin our live environments. This role is hands-on and highly collaborative, working closely with development teams and senior SREs to ensure our systems are reliable, scalable, and well-instrumented. Junior SREs are expected to learn and apply best practices in building robust, automated solutions, and to ensure their work is repeatable and understandable by others. Every contribution should be accompanied by documentation to support knowledge-sharing within the team and across engineering.

Core Responsibilities

Infrastructure Design & Maintenance:
Assist in building and maintaining infrastructure using infrastructure-as-code (IaC) tools (e.g., Terraform, CloudFormation).
Support the provisioning and lifecycle management of production, staging, and other critical environments.
Help implement shared infrastructure components (e.g., logging, metrics, service mesh, load balancing).
Contribute to improving infrastructure scalability, availability, and performance under the guidance of senior engineers.
Collaborate with development teams to provide infrastructure support for their deployment needs.

Deployment Systems & CI/CD:
Support and help extend CI/CD pipelines (GitHub Actions, Argo CD) to improve reliability and automation of deployments.
Help promote consistency and best practices across environments for deployment, rollback, and observability.
Work with developers to streamline testing and delivery of code to production.
Assist in reducing manual steps in the deployment and operations workflows.

Reliability, Observability & Tooling:
Assist in the implementation and maintenance of our monitoring, alerting, and logging infrastructure (Kube-Prometheus-Grafana stack).
Help track SLOs/SLIs for core services in partnership with service owners.
Learn to identify and help eliminate single points of failure, performance bottlenecks, and sources of instability.
Participate in reliability reviews and post-incident analysis.

Documentation & Knowledge Sharing:
Ensure that all systems and processes you work on are accompanied by thorough, up-to-date documentation.
Contribute to shared knowledge bases, runbooks, and developer-facing onboarding materials.
Participate in internal training sessions and pairings to learn from teammates.

Collaboration & Culture:
Work closely with the SRE Lead and other team members to execute work aligned with team goals.
Engage constructively with other teams across engineering.
Participate in on-call rotations with strong support from senior members.
Embrace a culture of blameless learning, transparency, and continuous improvement.

Qualifications & Skills

Experience: 3+ years in a DevOps, SRE, or related role.
Cloud: Basic understanding of cloud computing concepts, with some hands-on experience in AWS.
Containers & Orchestration: Familiarity with Docker and a foundational understanding of Kubernetes concepts. Experience with AWS ECS is a plus.
CI/CD: Exposure to CI/CD principles and tools like GitHub Actions. Familiarity with Argo CD is a bonus.
IaC: Some experience with or exposure to Infrastructure as Code tools like Terraform or CloudFormation.
Scripting: Proficiency in at least one scripting language (e.g., Bash, Python).
Observability: A basic understanding of monitoring and logging. Exposure to Prometheus and Grafana is desirable.
Collaboration: Strong communication skills and a desire to learn and work within a team.
Problem Solving: An enthusiastic and curious approach to solving technical challenges.

Worker Type:

Employee

17 Skills Required For This Role

Communication, GitHub, Talent Acquisition, Game Texts, AWS, Service Mesh, Load Balancing, Argo CD, Prometheus, Terraform, Grafana, CI/CD, Docker, Kubernetes, Python, GitHub Actions, Bash