Staff Site Reliability Engineer

CyberArk

Job Summary

CyberArk is seeking a Staff Site Reliability Engineer to join their R&D team. The role involves designing and implementing AWS infrastructure, leading architecture for cloud-based automation, and providing guidance on SaaS environment reliability. The ideal candidate will have experience in cloud-scale problems, CI/CD pipelines, and a deep understanding of Site Reliability, infrastructure, and Cloud Platforms. Responsibilities include working with configuration management tools, ensuring architecture meets availability requirements, and implementing monitoring solutions.

Must Have

  • Design and implement AWS infrastructure components (VPCs, EC2, EKS, S3, CloudFormation).
  • Lead architecture and designs for deployment and management automation of cloud infrastructure and software.
  • Provide guidance to Site Reliability and DevOps Engineers on managing SaaS environment reliability and performance.
  • Architect and guide the team with configuration management tools (CloudFormation, Helm, Terraform, Salt, Ansible).
  • Ensure cloud-based architectures meet availability and recoverability requirements.
  • Architect and implement cloud-based monitoring, alerting, and reporting (Datadog, CloudWatch, ELK).
  • Provide support and guidance on tooling that enables greater output and reliability.
  • Deep understanding of latest tech solutions, trends, and architecture details.
  • Work with Team Leads to identify improvements, prepare architecture road maps, and advocate to Product Management.
  • Minimum 4 years of experience managing AWS infrastructure.
  • Minimum 7 years in a senior, architect, or technical lead role (site reliability, systems engineering, or software development).
  • Deep understanding of Site Reliability, infrastructure, and Cloud Platform.
  • Expert understanding/experience of containerization services (Docker/Kubernetes).
  • Expert in observability tooling (Datadog, Grafana, Elasticsearch).
  • Solid understanding/experience of web services, databases, and related infrastructure/architectures.
  • Solid understanding of backup/restore best practices.
  • Strong expertise in writing code for configuration management tools.
  • Strong programming expertise in Python, Java, or an equivalent language.
  • Excellent troubleshooting skills.
  • Experience supporting an enterprise-level SaaS environment.

Good to Have

  • Security Experience.
  • Experience applying AI/ML-driven approaches to observability, anomaly detection, capacity planning, or reliability improvements.

Perks & Benefits

  • Commissions or discretionary bonus based on employee’s performance.
  • Wide range of medical, dental, vision, financial, and other benefits.

Job Description

Company Description

About CyberArk:

CyberArk (NASDAQ: CYBR) is the global leader in Identity Security. Centered on privileged access management, CyberArk provides the most comprehensive security offering for any identity – human or machine – across business applications, distributed workforces, hybrid cloud workloads, and throughout the DevOps lifecycle. The world’s leading organizations trust CyberArk to help secure their most critical assets. To learn more about CyberArk, visit our CyberArk blogs or follow us on X, LinkedIn, or Facebook.

Job Description

CyberArk is the global leader in privileged access security, a critical layer of IT security to protect data, infrastructure and assets across the enterprise, in the cloud and throughout the DevOps pipeline. CyberArk delivers the industry’s most complete solution to reduce risk created by privileged credentials and secrets. The company is trusted by the world’s leading organizations, including more than 50 percent of the Fortune 100, to protect against external attackers and malicious insiders.

CyberArk is seeking a Staff Site Reliability Engineer who will bring knowledge, excitement, and energy to the team. If you have worked in the cloud solving scale problems, bringing visibility into your platform, and building true CI/CD pipelines, we want you on the team! We need people who are driven and excited to innovate, and in return we offer room to grow professionally and to build strong relationships that will last a lifetime.

Responsibilities:

  • Design and implement AWS infrastructure components such as VPCs, EC2, EKS, S3, tagging schemes, and CloudFormation.
  • Lead architecture, design, and feature analysis for deployment and management automation of cloud-based infrastructure and software.
  • Provide guidance to Site Reliability and DevOps Engineers on managing the reliability and performance of SaaS environments, as well as on building automation to prevent problems from recurring.
  • Architect and guide the team in the use of configuration management tools on both Windows and Linux (CloudFormation, Helm, Terraform, Salt, Ansible).
  • Ensure cloud-based architectures meet availability and recoverability requirements.
  • Architect and implement cloud-based monitoring, alerting, and reporting (Datadog, CloudWatch, ELK).
  • Provide support and guidance on tooling that enables teams to achieve greater output and reliability.
  • Maintain a deep understanding of the latest technology solutions and trends, with the ability to dive into the details of the architecture as needed.
  • Work with the Team Leads within the group to identify areas of improvement, prepare architecture road maps, and advocate to the Product Management group.

Qualifications

  • B.S. in Computer Science or equivalent experience
  • Minimum 4 years of experience managing AWS infrastructure
  • Minimum of 7 years in a senior, architect, or technical lead role in site reliability, systems engineering, or software development
  • A deep understanding of Site Reliability, infrastructure and Cloud Platform
  • Expert understanding/experience of containerization services such as Docker/Kubernetes
  • Expert in observability tooling such as Datadog, Grafana, Elasticsearch
  • Solid understanding/experience of web services, databases, and related infrastructure/architectures
  • Solid understanding of backup/restore best practices
  • Strong level of expertise in writing code for configuration management tools
  • Strong level of programming expertise in Python, Java, or an equivalent language
  • Excellent troubleshooting skills
  • Experience supporting an enterprise-level SaaS environment
  • Security experience is a plus
  • Experience applying AI/ML-driven approaches to observability, anomaly detection, capacity planning, or reliability improvements is a plus

Additional Information

CyberArk is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, creed, sex, sexual orientation, gender identity, national origin, disability, or protected Veteran status.

We are unable to sponsor or take over sponsorship of employment visas at this time.

The salary range for this position is $126,000 – $185,000/year, plus commissions or discretionary bonus, which will be based on the employee’s performance. Base pay may also vary considerably depending on job-related knowledge, skills, and experience. The compensation package includes a wide range of medical, dental, vision, financial, and other benefits.
