Site Reliability Engineer (PA2025Q3JB086)

14 Minutes ago • All levels
DevOps

Job Description

As a Site Reliability Engineer (SRE) at SS&C, you will play a key role in ensuring the scalability, reliability, and performance of our systems, infrastructure, and applications. You will collaborate with engineering, system administration, and DevOps teams to design and implement solutions that improve uptime and overall service health. Key responsibilities include maintaining infrastructure, designing monitoring and incident response systems, developing automation tools, and participating in on-call rotations to minimize downtime.
Good To Have:
  • Experience with large-scale distributed systems.
  • Familiarity with SLAs, SLOs, and SLIs.
  • Previous experience in a DevOps or SRE role in a production environment.
Must Have:
  • Maintain scalable and reliable infrastructure to support mission-critical systems.
  • Design and implement monitoring, alerting, and incident response systems.
  • Develop tools and automation to eliminate manual operations and improve system efficiency.
  • Work with development teams to ensure reliability and performance are considered from the start.
  • Conduct root cause analysis and postmortems to learn from system failures and prevent recurrence.
  • Participate in on-call rotations and respond to incidents, minimizing downtime and customer impact.
  • Continuously improve deployment, configuration, and observability processes.
  • Strong experience with Linux/Unix systems administration.
  • Proficient in scripting and programming languages such as Python, Go, or Bash.
  • Hands-on experience with cloud platforms such as AWS and infrastructure-as-code tools (Terraform, Ansible).
  • Experience with containerization and orchestration tools (Docker, Kubernetes) and with Linux, Windows, and Citrix environments.
  • Deep understanding of CI/CD pipelines, networking, and security best practices.
  • Excellent troubleshooting and problem-solving skills.
  • Strong communication and collaboration abilities.

Job Title: Site Reliability Engineer (SRE)

Job Description:

We are looking for a highly skilled Site Reliability Engineer (SRE) to join our engineering team. As an SRE, you will be responsible for ensuring the scalability, reliability, and performance of our systems, infrastructure, and applications. You will collaborate closely with software engineers, system administrators, and DevOps professionals to design and implement solutions that improve uptime, system availability, and overall service health.

Key Responsibilities:

  • Maintain scalable and reliable infrastructure to support mission-critical systems.
  • Design and implement monitoring, alerting, and incident response systems to ensure high availability and performance.
  • Develop tools and automation to eliminate manual operations and improve system efficiency.
  • Work with development teams to ensure reliability and performance are considered from the start.
  • Conduct root cause analysis and postmortems to learn from system failures and prevent recurrence.
  • Participate in on-call rotations and respond to incidents, minimizing downtime and customer impact.
  • Continuously improve deployment, configuration, and observability processes.

Qualifications:

  • Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
  • Strong experience with Linux/Unix systems administration.
  • Proficient in scripting and programming languages such as Python, Go, or Bash.
  • Hands-on experience with cloud platforms such as AWS and infrastructure-as-code tools (Terraform, Ansible).
  • Experience with containerization and orchestration tools (Docker, Kubernetes) and with Linux, Windows, and Citrix environments.
  • Deep understanding of CI/CD pipelines, networking, and security best practices.
  • Excellent troubleshooting and problem-solving skills.
  • Strong communication and collaboration abilities.

Preferred Qualifications:

  • Experience with large-scale distributed systems.
  • Familiarity with SLAs, SLOs, and SLIs.
  • Previous experience in a DevOps or SRE role in a production environment.

Unless explicitly requested or approached by SS&C Technologies, Inc. or any of its affiliated companies, the company will not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services.

SS&C Technologies is an Equal Employment Opportunity employer and does not discriminate against any applicant for employment or employee on the basis of race, color, religious creed, gender, age, marital status, sexual orientation, national origin, disability, veteran status or any other classification protected by applicable discrimination laws.
