Site Reliability Engineer

12 Months ago • 8-10 Years
DevOps

Job Description

Seeking a talented SRE with 8-10 years of experience in building scalable and reliable systems. Must have strong programming skills in Python, Go, Java, or Ruby, and experience with monitoring tools such as ELK, Dynatrace, and CloudWatch. Strong problem-solving and communication skills required.
Good To Have:
  • Cloud experience
  • On-call management
  • Release management
  • Agile development
Must Have:
  • SRE experience
  • Programming skills
  • Monitoring tools
  • Problem-solving skills

About the job

About the role

We are seeking a talented Site Reliability Engineer (SRE) to join our team. The ideal candidate will have a strong background in software engineering and systems administration, with a passion for building scalable and reliable systems. As an SRE, you will collaborate with development and operations teams to ensure our services are reliable, performant, and highly available.


Key Responsibilities

  • Experience maintaining and supporting solutions in a cloud-based environment (GCP or AWS)
  • Experience working with various monitoring tools (e.g., ELK, Dynatrace, CloudWatch, Cloud Logging, Cloud Monitoring, BMC Surveyor, BMC Patrol, Grafana, Prometheus)
  • Ensure monitoring and self-healing strategies are implemented and maintained to proactively prevent production incidents.
  • Perform root cause analysis of production issues
  • Design and manage on-call and escalation processes (nice to have)
  • Participate in design reviews and production reviews for new features, products, or pieces of infrastructure
  • Design and implement ELK (Elasticsearch, Logstash, and Kibana) stack, Prometheus, and Grafana solutions for monitoring and alerting.
  • Debug production issues across services and levels of the stack.
  • Establish KPIs to demonstrate maturity, efficiency, and value to our business partners
  • Work as an integral part of the DevOps team with complementary skills and common goals
  • L3 Support experience is an asset.
  • Help create a release management process and support out-of-business-hours deployments (on rotation with team members)
  • Be familiar and comfortable with agile development techniques.


Technology skills (Mandatory)

ELK, Dynatrace, CloudWatch, Cloud Logging, Cloud Monitoring, BMC Surveyor, BMC Patrol, Grafana, Prometheus


Required qualifications to be successful in this role:

  • Bachelor’s degree in computer science engineering or a related field.
  • 8-10 years of experience as an SRE.
  • Proven experience as an SRE, DevOps engineer, or similar role.
  • Strong programming skills in languages such as Python, Go, Java, or Ruby.
  • Strong problem-solving skills and ability to work under pressure.
  • Excellent communication and collaboration skills.
  • Flexible to work in the EST time zone (9-5 EST).

Set alerts for new jobs by BCE Global Tech