Company Description
About CyberArk:
CyberArk (NASDAQ: CYBR) is the global leader in Identity Security. Centered on privileged access management, CyberArk provides the most comprehensive security offering for any identity – human or machine – across business applications, distributed workforces, hybrid cloud workloads and throughout the DevOps lifecycle. The world’s leading organizations trust CyberArk to help secure their most critical assets. To learn more about CyberArk, visit our CyberArk blogs or follow us on Twitter, LinkedIn or Facebook.
Job Description
We are seeking a highly skilled Senior Site Reliability Engineer (SRE) to join our team. As a Senior SRE, you will play a pivotal role in ensuring the reliability, scalability, and performance of our cloud-based infrastructure.
You will collaborate closely with development, operations, and other teams to implement and maintain efficient and resilient systems.
Responsibilities:
- Infrastructure Automation: Develop, deploy, and oversee Infrastructure as Code (IaC) solutions using tools such as Terraform and Ansible to automate provisioning, configuration, and deployment.
- Cloud Platform Expertise: Apply a deep understanding of AWS cloud services, including EC2, S3, VPC, RDS, EKS, ECS, CF and more. Experience with serverless architecture and AWS Lambda functions is a plus.
- Containerization and Orchestration: Work with containerization technologies (Docker) and orchestration platforms (Kubernetes), deploying applications using tools like Helm.
- CI/CD Pipelines: Build and maintain robust CI/CD pipelines using tools like Jenkins.
- Monitoring and Alerting: Implement comprehensive monitoring and alerting solutions using tools like ELK, Datadog, CloudWatch, and Grafana to proactively identify and resolve issues.
- Incident Management: Drive incident response processes, troubleshoot complex issues, and perform root cause analysis (RCA), followed by corrective and preventive actions (CAPA), to prevent recurrence.
- Performance Tuning: Continuously optimize system performance, identify bottlenecks, and implement strategies to improve scalability and efficiency.
- Cost Optimization: Identify and implement strategies to reduce cloud costs while maintaining performance and reliability.
- Security Best Practices: Adhere to security best practices and implement measures to protect infrastructure and data from vulnerabilities and threats.
- Collaboration and Communication: Work effectively with cross-functional teams to understand business requirements and provide technical guidance.
- SOP Documentation: Create and maintain documentation for infrastructure, processes, and incident management protocols.
Qualifications
- 7+ years of experience as a DevOps engineer or Site Reliability Engineer
- B.Tech in a computer-related field
- Strong proficiency in AWS cloud services like EC2, S3, VPC, RDS, EKS, ECS, CF and more. AWS certification is a plus.
- 3+ years of experience with serverless architectures using AWS Lambda.
- Strong scripting skills (Python, PowerShell, shell scripting).
- Knowledge of the AWS CDK (Cloud Development Kit) for infrastructure as code.
- Experience with infrastructure as code tools (Terraform, Ansible) and AWX/Ansible Tower for Ansible automation.
- Knowledge of containerization (Docker) and orchestration platforms (Kubernetes).
- Expertise in CI/CD pipelines and automation tools (Jenkins, GitHub).
- Exposure to monitoring and alerting tools (CloudWatch, Datadog, ELK, Grafana, New Relic).
- Experience documenting SOPs and RCAs.
- Understanding of security best practices and compliance standards. A security certification is a plus.