Site Reliability Engineer

Job Description

The Site Reliability Engineer (SRE) will ensure the reliability, availability, and performance of software systems. This involves designing, building, and maintaining scalable infrastructure, implementing monitoring and alerting, and automating processes. The role requires collaboration with developers, root cause analysis, and participation in on-call rotations to continuously improve system reliability and performance.

We are Progress (Nasdaq: PRGS), the trusted provider of software that enables our customers to develop, deploy and manage responsible, AI-powered applications and experiences with agility and ease.

We’re proud to have a diverse, global team where we value the individual and enrich our culture by considering varied perspectives, because we believe people power progress. Join us as a Site Reliability Engineer (SRE) and help us do what we do best: propelling business forward.

The Site Reliability Engineer (SRE) will be responsible for ensuring the reliability, availability, and performance of our company's software systems. This role will involve designing, building, and maintaining scalable infrastructure, implementing monitoring and alerting systems, and automating processes to streamline operations.

In this role, you will:

  • Design, build, and maintain scalable and reliable infrastructure using modern cloud technologies.
  • Develop and implement monitoring and alerting systems to proactively identify and address issues.
  • Collaborate with software developers to improve system architecture and performance.
  • Automate manual processes to increase efficiency and reduce downtime.
  • Conduct root cause analysis for incidents and implement preventive measures.
  • Participate in on-call rotation to respond to critical incidents outside of business hours.
  • Continuously evaluate and improve the reliability and performance of our systems.
  • Document system configurations, processes, and best practices.

Your background:

  • Bachelor’s degree in Computer Science, Engineering, or a related field.
  • Proven experience as a Site Reliability Engineer or similar role.
  • Strong understanding of cloud computing platforms (e.g., AWS, Google Cloud, Azure).
  • Proficiency in scripting languages such as Python, Bash, or PowerShell.
  • Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes).
  • Solid understanding of networking principles and protocols.
  • Familiarity with infrastructure as code tools such as Terraform or Ansible.
  • Excellent problem-solving and troubleshooting skills.
  • Ability to work independently and collaboratively in a fast-paced environment.
  • Strong communication skills and ability to work effectively with cross-functional teams.

Preferred Qualifications:

  • Certifications in cloud computing (e.g., AWS Certified DevOps Engineer, Google Professional Cloud Architect).
  • Experience with CI/CD pipelines and automation tools (e.g., Jenkins, GitLab CI).
  • Knowledge of database technologies (e.g., MySQL, PostgreSQL, MongoDB).
  • Familiarity with Agile methodologies and DevOps practices.
  • Experience with performance tuning and optimization of distributed systems.

If this sounds like you and fits your experience and career goals, we’d be happy to chat. What we offer in return is the opportunity to experience a great company culture with wonderful colleagues to learn from and collaborate with, and also to enjoy:

Compensation

  • Competitive remuneration package
  • Employee Stock Purchase Plan Enrolment

Vacation, Family, and Health

  • 30 days of earned leave
  • An extra day off for your birthday
  • Various other leaves like marriage leave, casual leave, maternity leave, and paternity leave
  • Premium Group Medical Insurance for employees and five dependents, personal accident insurance coverage, and life insurance coverage
  • Professional development reimbursement
  • Interest subsidy on vehicle or personal loans

Apply now!

#LI-SR1

#LI-Hybrid

Together, We Make Progress

Progress is an inclusive workplace where opportunities to succeed are available to everyone. As a multicultural company serving a global community, we encourage a wide range of points of view and celebrate our diverse backgrounds. Our unique combination of perspectives inspires innovation, connects us to our customers and positively affects our communities. It is only by working together and learning from each other that we make Progress. Join us!
