Site Reliability Engineer

3 Minutes ago • All levels • Devops

Job Summary

Job Description

As a Site Reliability Engineer at Progress, you will be responsible for ensuring data security and compliance with standards like PCI-DSS, HIPAA, and SOC2. You will build and maintain reliable infrastructure and security services using Azure/AWS/GCP, automate system administration tasks, and optimize performance. The role involves designing monitoring solutions, participating in incident management, capacity planning, and providing on-call support. You will also collaborate with agile development teams and provision customer accounts, ensuring high-availability deployments and understanding end-to-end solutions.
Must have:
  • Protect systems from data breaches and prioritize data security.
  • Ensure compliance with PCI-DSS, HIPAA, SOC2, and other compliance policies.
  • Build and maintain reliable infrastructure and security services for highly available and scalable services.
  • Perform basic system administration tasks such as configuring servers, setting up HA/DR, and automating routine tasks.
  • Develop and maintain automation frameworks, tools, and processes to streamline operations.
  • Analyze system performance and identify opportunities for optimization and efficiency improvements.
  • Design and implement comprehensive monitoring and observability solutions.
  • Participate in incident management processes and conduct postmortem reviews.
  • Perform capacity planning and forecasting to anticipate resource requirements.
  • Serve on the on-call team and troubleshoot issues related to application development, deployment, and operations.
  • Work collaboratively with agile software development teams, providing support to developers, QA, and technical support.
  • Provision new customer accounts, including handling complex orders.
  • Implement automated high-availability deployments, ensuring system reliability and uptime.
  • Become proficient in understanding end-to-end software solutions.
  • Proven experience as a Site Reliability Engineer (or similar position) in a production capacity.
  • Understand what it means to operate infrastructure as code and have experience developing services and automation.
  • Ability to debug and optimize code and automate routine tasks to eliminate toil.
  • Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership, initiative, grit, and drive.
  • Designed and implemented applications and systems that scale, are resilient to failure, and are observable.
  • Strong understanding of Windows, Linux, and automation tools (Terraform, Ansible, Chef, or Puppet).
  • Strong understanding of Azure/AWS services (ECS, EKS, S3, and more), and scripting languages (Shell, Python, PowerShell, or others).
  • Knowledge of databases (Azure SQL, Postgres/RDS, Graph databases), Service Mesh (Linkerd or Envoy), API gateways, authentication services, and third-party integrations.
  • Proficient in managing containerized environments using Kubernetes, Docker, and Rancher.
  • Familiarity with security concepts, including cloud authentication, authorization, web attacks, and environment security.
  • Experience with network concepts, including TCP/IP, HTTP, and TLS.
  • Experience with cloud-hosted apps/services (Azure/AWS preferred) and translating business requirements into securely implemented capabilities in the cloud.
  • Bachelor’s degree in Computer Science, Information Systems, or a related field.
  • Proven ability to adhere to policies, standards, and procedures related to change control and operational best practices.
  • Strong written and verbal communication skills for both technical and non-technical audiences.
  • Willingness to be flexible in responding to customer issues and ability to identify product/deployment improvements for future mitigation.
  • Interested in designing, analyzing, and troubleshooting large-scale distributed systems.
  • Experience with PCI, HIPAA, and SOC2 compliance.
  • Willingness to work in US time zone (4:30 PM to 1:30 AM IST).
Good to have:
  • Chef knowledge
Perks:
  • Competitive remuneration package
  • Employee Stock Purchase Plan Enrolment
  • 30 days of earned leave
  • An extra day off for your birthday
  • Various other leaves like marriage leave, casual leave, maternity leave, and paternity leave
  • Premium Group Medical Insurance for employees and five dependents
  • Personal accident insurance coverage
  • Life insurance coverage
  • Professional development reimbursement
  • Interest subsidy on vehicle or personal loans

Job Details

In this role, you will work on:

  • Data Security and Compliance:
    • Protect systems from data breaches, prioritizing data security.
    • Ensure compliance with PCI-DSS, HIPAA, SOC2, and other compliance policies, standards, and procedures.
    • Participate in the quarterly, half-yearly, and yearly audit compliance activities.
  • Infrastructure and Security Services:
    • Build and maintain reliable infrastructure and security services for highly available and scalable services, using native Azure/AWS/GCP infrastructure services and other industry-leading tools.
  • System Administration and Automation:
    • Perform basic system administration tasks such as configuring servers, setting up HA/DR, automating routine tasks, and running backup/restore procedures.
    • Implement automation to minimize manual work and achieve security and compliance objectives.
  • Automation and Tooling:
    • Develop and maintain automation frameworks, tools, and processes to streamline operations and improve efficiency.
    • Champion the adoption of infrastructure as code (IaC) principles for configuration management and deployment automation.
  • Performance Optimization:
    • Analyze system performance and identify opportunities for optimization and efficiency improvements.
    • Implement performance tuning strategies to enhance system reliability and scalability.
  • Monitoring and Observability:
    • Design and implement comprehensive monitoring and observability solutions to proactively identify and address system issues.
    • Utilize advanced monitoring tools and techniques to gain insights into system behavior and performance.
  • Incident Management and Postmortems:
    • Participate in incident management processes, ensuring timely resolution of incidents and minimizing impact on users.
    • Conduct postmortem reviews to identify root causes and implement preventive measures to mitigate future incidents.
  • Capacity Planning and Forecasting:
    • Perform capacity planning and forecasting to anticipate resource requirements and ensure adequate scalability.
    • Develop strategies for optimizing resource utilization and cost-effectiveness.
  • On-call Support and Troubleshooting:
    • Serve on the on-call team, acting as an escalation contact for service incidents.
    • Troubleshoot and resolve issues related to application development, deployment, and operations.
    • Work with Technical Support to troubleshoot customer issues.
  • Collaboration and Agile Support:
    • Work collaboratively with agile software development teams, providing support to developers, QA, and technical support.
    • Collaborate with other team members during our scheduled maintenance windows.
  • Customer Account Provisioning:
    • Provision new customer accounts, including handling complex orders in coordination with Progress Sales/Professional Services.
    • Collaborate with other Engineering teams to support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews.
  • High-Availability Deployments:
    • Implement automated high-availability deployments, ensuring system reliability and uptime.
  • End-to-End Solution Understanding:
    • Become proficient in understanding how each software component, system design, and configuration are linked to form an end-to-end solution.

Your background:

  • Experience:
    • Proven experience as a Site Reliability Engineer (or similar position) in a production capacity.
    • You understand what it means to operate infrastructure as code and have experience developing services and automation to do so. Chef knowledge would be a plus.
    • You have a great ability to debug and optimize code and automate routine tasks to eliminate toil.
    • You have a systematic problem-solving approach, coupled with strong communication skills and a sense of ownership, initiative, grit, and drive.
    • You have designed and implemented applications and systems that scale, are resilient to failure, and are observable.
  • Technical Expertise:
    • Strong understanding of Windows, Linux, automation tools (Terraform, Ansible, Chef, or Puppet), Azure/AWS services (ECS, EKS, S3, and more), and scripting languages (Shell, Python, PowerShell, or others).
    • Knowledge of databases (Azure SQL, Postgres/RDS, Graph databases), Service Mesh (Linkerd or Envoy), API gateways, authentication services, third-party integrations, and more.
    • Proficient in managing containerized environments using Kubernetes, Docker, and Rancher, along with other related tools and technologies.
  • Security Knowledge:
    • Familiarity with security concepts, including cloud authentication, authorization, web attacks, and environment security.
    • Experience with network concepts, including TCP/IP, HTTP, and TLS.
  • Cloud Experience:
    • Experience with cloud-hosted apps/services (Azure/AWS preferred) and translating business requirements into securely implemented capabilities in the cloud.
  • Education:
    • Bachelor’s degree in Computer Science, Information Systems, or a related field.
  • Compliance and Communication:
    • Proven ability to adhere to policies, standards, and procedures related to change control and operational best practices.
    • Strong written and verbal communication skills for both technical and non-technical audiences.
  • Flexible and Proactive:
    • Willingness to be flexible in responding to customer issues and ability to identify product/deployment improvements for future mitigation.
    • You are interested in designing, analyzing, and troubleshooting large-scale distributed systems.
  • Regulatory Compliance:
    • Experience with PCI, HIPAA, and SOC2 compliance.

Must be willing to work in US time zone (4:30 PM to 1:30 AM IST).

If this sounds like you and fits your experience and career goals, we’d be happy to chat. What we offer in return is the opportunity to experience a great company culture with wonderful colleagues to learn from and collaborate with, and also to enjoy:

Compensation

  • Competitive remuneration package
  • Employee Stock Purchase Plan Enrolment

Vacation, Family, and Health

  • 30 days of earned leave
  • An extra day off for your birthday
  • Various other leaves like marriage leave, casual leave, maternity leave, and paternity leave
  • Premium Group Medical Insurance for employees and five dependents, personal accident insurance coverage, and life insurance coverage
  • Professional development reimbursement
  • Interest subsidy on vehicle or personal loans

Apply now!

Together, We Make Progress

Progress is an inclusive workplace where opportunities to succeed are available to everyone. As a multicultural company serving a global community, we encourage a wide range of points of view and celebrate our diverse backgrounds. Our unique combination of perspectives inspires innovation, connects us to our customers and positively affects our communities. It is only by working together and learning from each other that we make Progress. Join us!

About The Company

Progress (Nasdaq: PRGS) empowers organizations to achieve transformational success in the face of disruptive change. Our software enables our customers to develop, deploy and manage responsible AI-powered applications and experiences with agility and ease. Customers get a trusted provider in Progress, with the products, expertise and vision they need to succeed. Over 4 million developers and technologists at hundreds of thousands of enterprises depend on Progress. Learn more at www.progress.com.

Hyderabad, Telangana, India (Hybrid)
