Site Reliability Engineer (SRE) III

2 Months ago • 5-5 Years • DevOps

About the job

Job Description

We're seeking an experienced SRE with a passion for automation and tooling to build and maintain reliable, scalable cloud infrastructure. You'll work closely with cross-functional teams to ensure high availability of critical services, integrate automation solutions, and champion best practices.
Must have:
  • VMware experience
  • Linux experience
  • Cloud platforms
  • Automation tools
Good to have:
  • Incident management
  • Container orchestration
  • Monitoring tools
  • Scripting languages

Site Reliability Engineer (SRE) - Automation & Tooling

Position Overview

We are looking for a talented and motivated Site Reliability Engineer (SRE) with a strong focus on system administration, automation, and tooling. As part of our engineering team, you will play a crucial role in building and maintaining reliable, scalable, and efficient cloud infrastructure. You will work closely with development, operations, and product teams to enhance our systems and services while championing SRE best practices.

Key Responsibilities

  • Design, develop, and implement automated systems to improve the reliability, performance, and scalability of our services.
  • Create and maintain tooling that facilitates rapid deployment, monitoring, and management of our infrastructure.
  • Collaborate with cross-functional teams to integrate automation solutions with existing workflows and pipelines.
  • Identify and resolve performance bottlenecks and ensure high availability of critical services.
  • Develop and follow SRE best practices to enhance system reliability and operational efficiency.
  • Contribute to incident response and postmortem analysis to continuously improve our systems.
  • Participate in on-call rotations to support continuous 24/7 operations.
  • Foster a culture of continuous improvement through proactive monitoring, performance tuning, and capacity planning.
  • Advocate for cloud-agnostic architecture principles and assist in the integration and management of multi-cloud environments.

Qualifications

  • B.S./M.S. in Computer Science, Engineering, or a related field, or equivalent industry experience.
  • 5+ years of experience as a Site Reliability Engineer, Systems Engineer, or in a similar role.
  • Familiarity with CI/CD pipelines and relevant tools (e.g., Jenkins, Bitbucket).
  • 5+ years of hands-on experience with VMware (or a similar virtualization solution) and Linux (Red Hat).
  • Solid understanding of, and experience with, cloud platforms (e.g., AWS, Azure) and cloud-agnostic architectural principles.
  • Strong proficiency in configuration automation tools and frameworks (e.g., Terraform, Puppet).

Nice To Have

  • Demonstrated knowledge of incident management and post-incident analysis processes (e.g., SLIs, SLOs, SLAs).
  • Solid understanding of, and experience with, RabbitMQ (queuing), Redis Cluster (caching), and NGINX, Apache, and Gunicorn (web layer).
  • Extensive experience with scripting and programming languages (e.g., Python, PowerShell, Bash, Cloud CLIs).
  • Solid understanding and experience with container orchestration tools (e.g., Kubernetes, Docker, EKS).
  • Expertise in monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack).


About The Company

Karnataka, India (On-Site)

