Senior Site Reliability Engineer

1 Year ago • 5 Years +

DevOps

Job Description

Barracuda is seeking a Senior Site Reliability Engineer to join their Data Protection organization. The role involves building automation and platform designs, managing the day-to-day operations of the Data Protection system, resolving application and infrastructural issues, and deploying solutions to eliminate downtime. The ideal candidate will be responsible for ensuring high levels of uptime for the organization. They will be working with modern technologies and languages, with a focus on performance, monitoring, and observability within the Azure cloud environment. This role also includes collaborating with internal groups to design, develop, and deploy manageable, scalable, and robust services.

Must Have:

Experience in cloud infrastructure in Azure
Experience in automating and maintaining production environments
Experience with cloud infrastructure automation tools
Experience with CI/CD and automation tools
Working knowledge of deployment patterns and strategies
Experience with container and container orchestration tools
Ability to design and release code in languages like Python, Go, Ruby
Advanced Operating System skills with knowledge of Linux internals
Experience with observability and reliability tools
Experience with data pipeline engineering and tools
Strong debugging skills with a systematic problem-solving approach
Ability to communicate effectively both verbally and in writing
Self-awareness and a true teamwork spirit
Bachelor's degree in a technology field or equivalent work experience

Perks:

A team where you can voice your opinion, make an impact, and where you and your experience are valued.
Internal mobility – there are opportunities for cross training and the ability to attain your next career step within Barracuda, in addition to equity, in the form of non-qualifying options.

Job ID 25-281

Come join our passionate team! Barracuda is a leading cybersecurity company providing complete protection against complex threats. Our platform protects email, data, applications, and networks with innovative solutions, and a managed XDR service, to strengthen cyber resilience. Hundreds of thousands of IT professionals and managed service providers worldwide trust us to protect and support them with solutions that are easy to buy, deploy, and use.

We know a diverse workforce adds to our collective value and strength as an organization. Barracuda Networks is proud to be an employer that complies with all applicable national, state and local laws pertaining to nondiscrimination and equal opportunity regardless of race, gender, religion, sex, sexual orientation, national origin, or disability.

Envision yourself at Barracuda

We are seeking a passionate, experienced Senior Site Reliability Engineer to join our Data Protection organization. We hire strong, collaborative leaders who inspire and enable teams to succeed in delivering quality software.

The right candidate will have extensive experience in Site Reliability Engineering and will ensure the highest levels of uptime for the Data Protection organization. You will build automation and platform designs, manage the day-to-day operations of the Data Protection (DP) system in production, fix application and infrastructure issues, and deploy solutions to eliminate downtime.

Our products are built using modern technologies and languages and deployed to Azure via a mature CI/CD pipeline. Performance, monitoring, and observability are first-class citizens in our ecosystem. Some products are still on their journey ‘to the public cloud’, and running these applications successfully with the least possible downtime is key.

What you bring to the role:

Experience developing, building, securing, and operating sophisticated and highly automated cloud infrastructure in Azure (a must)
Prior success in automating and maintaining an efficient, large-scale, real-world production environment
Extensive experience orchestrating cloud infrastructure automation using tools like Terraform, CloudFormation, Azure Resource Manager (ARM), and Crossplane
Development experience with continuous integration and delivery (CI/CD) and automation tools such as GitHub, GitHub Actions, Jenkins, Packer, Ansible, Puppet, etc.
Working knowledge of deployment patterns and strategies, including blue/green, canary, rolling deployment, draining, etc.
Comprehensive experience with containers and container orchestration tools (Docker, Kubernetes) in a Cloud Environment (Azure AKS)
The ability to design, author, and release code in languages like Python, Go, Ruby
Advanced Operating System skills with knowledge of Linux internals
Extensive experience working with observability and reliability tools like New Relic, ELK, CloudWatch, Prometheus, and Grafana
Experience with data pipeline engineering and tools like Databricks, Apache Spark, Kafka, and DataStage
Strong debugging skills with a systematic problem-solving approach to identify complex problems
Ability to communicate effectively both verbally and in writing
Self-awareness and a true teamwork spirit
Bachelor's degree in a technology field or equivalent work experience
Minimum of 5 years of experience in a Site Reliability Engineer (SRE) or similar role

What you will be working on:

Write clean, high-performance, and well-tested infrastructure code with a focus on reusability (Puppet/Ansible/Terraform/Azure Resource Manager/CloudFormation/Crossplane/Packer)
Recommend and implement infrastructure best practices in alignment with standard SRE principles, and provide guidance on system performance and throughput expectations
Troubleshoot issues across the entire stack: hardware, software, application, and network
Establish, maintain, and adhere to Barracuda technical standards, policies, and procedures
Build and enhance our observability and reliability systems
Participate in an on-call rotation
Collaborate with internal groups to design, develop, and deploy manageable, scalable, and robust services
Perform root cause analysis (RCA) and partner with engineering and operations teams across the organization to roll out fixes
Provide technical guidance and mentorship to other engineers on reliability and scalability best practices, tools, and methodologies

What you will get from us:

A team where you can voice your opinion, make an impact, and where you and your experience are valued. Internal mobility – there are opportunities for cross training and the ability to attain your next career step within Barracuda, in addition to equity, in the form of non-qualifying options.

#LI-hybrid
