Senior Cloud Site Reliability Engineer

Barracuda

5+ Years | Ottawa, ON, Canada (Remote) | Full Time | 2 months ago

Job Summary

Barracuda is seeking a passionate and experienced Senior Cloud Site Reliability Engineer for its Email Protection business unit. This role focuses on ensuring the availability and seamless scaling of high-volume, critical SaaS applications. Responsibilities include supporting application infrastructure, implementing automation for deployments, maintaining self-service platforms, managing service levels, participating in incident response, and contributing to disaster recovery plans. The ideal candidate will have strong technical acumen in operations, automation, and development, working with AWS, Kubernetes, and CI/CD tools to strengthen cyber resilience.

Must Have

Work with internal customers to understand application design and cloud infrastructure requirements, focusing on scalability and reliability.
Implement templates, tools, and scripts for infrastructure deployment to support development teams.
Help develop and maintain self-service platforms for the Product Engineering team.
Implement and monitor SLIs, SLOs, and SLAs across services.
Participate in incident response processes and contribute to post-incident reviews.
Help maintain disaster recovery and business continuity plans.
Implement non-functional requirements including security, performance, and monitoring.
Assist with architecture implementation, solution design, and code reviews.
Implement solutions using AWS, Kubernetes, GitHub Actions, Jenkins, Terraform, and other current technologies.
Support initiatives to convert manual deployments to automated processes.
Maintain and enhance monitoring and reliability systems.
Participate in on-call rotation to ensure 24/7 system reliability.
5+ years of hands-on infrastructure experience, including 3+ years in cloud development and SRE/DevOps roles.
Strong knowledge of AWS cloud infrastructure, security, and operations in production environments.
Experience with Terraform, CloudFormation, or Pulumi for cloud infrastructure automation.
Experience with GitHub, GitHub Actions, Jenkins, and configuration management tools.
Knowledge of blue/green, canary, and rolling deployment strategies.
Experience with Docker, Kubernetes, and EKS in AWS environments.
Solid coding abilities in Python, Go, or similar languages.
Strong Linux knowledge including system administration.
Experience with monitoring tools like New Relic, CloudWatch, Prometheus, and Grafana.
Good debugging and troubleshooting capabilities.

Good to Have

AWS certifications (Solutions Architect, SysOps).
Kubernetes certifications (CKA, CKAD).

Perks & Benefits

A team where you can voice your opinion, make an impact, and where you and your experience are valued.
Internal mobility – there are opportunities for cross training and the ability to attain your next career step within Barracuda.
Equity, in the form of non-qualifying options.

Job Description

Req ID: 26-321

Come join our passionate team! Barracuda is a leading cybersecurity company providing complete protection against complex threats. Our platform protects email, data, applications, and networks with innovative solutions, and a managed XDR service, to strengthen cyber resilience. Hundreds of thousands of IT professionals and managed service providers worldwide trust us to protect and support them with solutions that are easy to buy, deploy, and use.

We are committed to a candidate selection process and work environment that is inclusive and barrier free. To ensure candidates are assessed in a fair and equitable manner, accommodations will be provided to prospective employees in accordance with the Accessibility for Ontarians with Disabilities Act (AODA) and the Ontario Human Rights Code.

Envision yourself at Barracuda

We seek a passionate and experienced Senior Cloud Site Reliability Engineer (SRE) for the Email Protection business unit with great technical acumen and a strong background in operations, automation, implementation, and development. You will be responsible for ensuring the availability and seamless scaling of high-volume, critical SaaS applications. The application portfolio spans a broad spectrum of Email Protection products.

What will you be working on:

Application Infrastructure Support: Work with internal customers to understand application design and cloud infrastructure requirements, focusing on scalability and reliability
Infrastructure Automation: Implement templates, tools, and scripts for infrastructure deployment to support development teams
Platform Support: Help develop and maintain self-service platforms for the Product Engineering team
Service Level Management: Implement and monitor SLIs, SLOs, and SLAs across services
Incident Management: Participate in incident response processes and contribute to post-incident reviews
Disaster Recovery: Help maintain disaster recovery and business continuity plans
Technical Implementation: Implement non-functional requirements including security, performance, and monitoring
Solution Implementation: Assist with architecture implementation, solution design, and code reviews
Technology Stack Implementation: Implement solutions using AWS, Kubernetes, GitHub Actions, Jenkins, Terraform, and other current technologies
Deployment Automation: Support initiatives to convert manual deployments to automated processes
Observability Systems: Maintain and enhance monitoring and reliability systems
On-Call Duties: Participate in on-call rotation to ensure 24/7 system reliability

What you bring to the role:

Technical Expertise: 5+ years of hands-on infrastructure experience, including 3+ years in cloud development and SRE/DevOps roles
Cloud Infrastructure: Strong knowledge of AWS cloud infrastructure, security, and operations in production environments
Infrastructure as Code: Experience with Terraform, CloudFormation, or Pulumi for cloud infrastructure automation
CI/CD & Automation: Experience with GitHub, GitHub Actions, Jenkins, and configuration management tools
Deployment Patterns: Knowledge of blue/green, canary, and rolling deployment strategies
Container Orchestration: Experience with Docker, Kubernetes, and EKS in AWS environments
Programming: Solid coding abilities in Python, Go, or similar languages
Operating Systems: Strong Linux knowledge including system administration
Observability: Experience with monitoring tools like New Relic, CloudWatch, Prometheus, and Grafana
Problem Solving: Good debugging and troubleshooting capabilities
Certifications: AWS certifications (Solutions Architect, SysOps) or Kubernetes certifications (CKA, CKAD) a plus

What you’ll get from us:

A team where you can voice your opinion, make an impact, and where you and your experience are valued. Internal mobility – there are opportunities for cross training and the ability to attain your next career step within Barracuda. In addition, you will receive equity, in the form of non-qualifying options.
