Site Reliability Engineer

1 Month ago • All levels • Devops

Job Summary

Job Description

Join Zuora’s Operations team as a Site Reliability Engineer to maintain the reliability, scalability, and performance of its SaaS platform. This role involves proactive service monitoring, incident response, infrastructure service management, and ownership of internal and external shared services. You will collaborate with various teams to ensure seamless service delivery. Responsibilities include architecting automation workflows, using AI/ML for predictive monitoring, leading incident response, identifying and eliminating reliability bottlenecks, and maintaining operational runbooks.
Must have:
  • Hands-on experience with Linux server administration and Python programming
  • Experience with containerization and orchestration using Docker and Kubernetes
  • Experience working with messaging systems like Kafka and ActiveMQ
  • Experience with databases like MySQL and Oracle, and caching solutions like Redis
  • Understands and applies AI/ML techniques in operations
  • Solid track record in incident management and root cause analysis
  • Proficient in developing and maintaining CI/CD pipelines with a strong emphasis on observability
  • Monitoring and observability using Prometheus, Grafana, and OpenTelemetry
  • Comfortable writing and maintaining runbooks
  • Keeps up-to-date with industry trends such as AIOps and SRE best practices
  • Collaborative mindset, working cross-functionally with engineering, product, and operations teams
  • 1+ years of experience working in a SaaS environment

Job Details

Company Overview

At Zuora, we do Modern Business. We’re helping people subscribe to new ways of doing business that are better for people, companies and ultimately the planet. It’s an approach resulting from the shift to the Subscription Economy that puts customers first by building recurring relationships instead of one-time product sales and focuses on sustainable growth. Through our leading expertise and multi-product suite, we are transforming all industries and working with the world’s most innovative companies to monetize new business models, nurture subscriber relationships and optimize their digital experiences.

 

The Team & Role

Join Zuora’s high-impact Operations team, where you’ll be instrumental in maintaining the reliability, scalability, and performance of our SaaS platform. This role involves proactive service monitoring, incident response, infrastructure service management, and ownership of internal and external shared services to ensure optimal system availability and performance.

You will work alongside a team of skilled engineers dedicated to operational excellence through automation, observability, and continuous improvement. In this cross-functional role, you’ll collaborate daily with Product Engineering & Management, Customer Support, Deal Desk, Global Services, and Sales teams to ensure a seamless and customer-centric service delivery model.

As a core member of the team, you’ll have the opportunity to design and implement operational best practices, contribute to service provisioning strategies, and drive innovations that enhance the overall platform experience. If you’re driven by solving complex problems in a fast-paced environment and are passionate about operational resilience and service reliability, we’d love to hear from you.

This is a hybrid position, so you’ll work both remotely and in the office.

 

Our Tech Stack: Linux Administration, Python, Docker, Kubernetes, MySQL, Kafka, ActiveMQ, Tomcat App & Web, Oracle, Load Balancers, Redis Cache, Debezium, AWS, WAF, Jenkins, GitOps, Terraform, Ansible, Puppet, Prometheus, Grafana, OpenTelemetry

 

What you’ll do

  • Architect and implement intelligent automation workflows for infrastructure lifecycle management, including self-healing systems, automated incident remediation, and configuration anomaly detection using Infrastructure as Code (IaC) and AI-driven tooling.
  • Leverage predictive monitoring and anomaly detection techniques powered by AI/ML to proactively assess system health, optimize performance, and preempt service degradation or outages.
  • Lead complex incident response efforts, applying deep root cause analysis (RCA) and postmortem practices to drive long-term stability, while integrating automated detection and remediation capabilities.
  • Identify and eliminate reliability bottlenecks through automated performance tuning, dynamic scaling policies, and advanced telemetry instrumentation.
  • Maintain and continuously evolve operational runbooks by incorporating machine learning insights, updating playbooks with AI-suggested resolutions, and identifying automation opportunities for manual steps.
  • Stay abreast of emerging trends in AI for IT operations (AIOps), distributed systems, and cloud-native technologies to influence strategic reliability engineering decisions and tool adoption.

 

Your experience

  • Hands-on experience with Linux server administration and Python programming.
  • Hands-on experience working with agentic AI.
  • Ability to collaborate on developing a multi-agent framework to amplify operations.
  • Deep experience with containerization and orchestration using Docker and Kubernetes, managing highly available services at scale.
  • Experience working with messaging systems like Kafka and ActiveMQ, databases like MySQL and Oracle, and caching solutions like Redis.
  • Understanding and application of AI/ML techniques in operations, including anomaly detection, predictive monitoring, and self-healing systems.
  • A solid track record in incident management, root cause analysis, and building systems that prevent recurrence through automation.
  • Proficiency in developing and maintaining CI/CD pipelines with a strong emphasis on observability, performance, and reliability.
  • Experience with monitoring and observability using Prometheus, Grafana, and OpenTelemetry, with a focus on real-time anomaly detection and proactive alerting.
  • Comfort writing and maintaining runbooks, and an interest in enhancing them with automation and machine learning insights.
  • Awareness of current industry trends such as AIOps, distributed systems, SRE best practices, and emerging cloud technologies.
  • A collaborative mindset, working cross-functionally with engineering, product, and operations teams to align system design with business objectives.
  • 1+ years of experience working in a SaaS environment.

 

Nice to haves:

  • Red Hat Certified System Administrator (RHCSA) – Red Hat
  • AWS / Azure / GCP Certifications
  • Certified Associate in Python Programming (PCAP) – Python Institute
  • Docker Certified Associate (DCA) or Certified Kubernetes Administrator (CKA)
  • Good knowledge of Jenkins
  • Advanced certifications in SRE or related fields

#ZEOLife at Zuora

As an industry pioneer, our work is constantly evolving and challenging us in new ways that require us to think differently, iterate often and learn constantly. It’s exciting. Our people, whom we refer to as “ZEOs,” are empowered to take on a mindset of ownership and make a bigger impact here. Our teams collaborate deeply, exchange different ideas openly and together we’re making what’s next possible for our customers, community and the world.

As part of our commitment to building an inclusive, high-performance culture where ZEOs feel inspired, connected and valued, we support ZEOs with:

  • Competitive compensation, variable bonus and performance reward opportunities, and retirement programs
  • Medical, dental and vision insurance
  • Generous, flexible time off
  • Paid holidays, “wellness” days and company-wide end-of-year break
  • 6 months fully paid parental leave
  • Learning & Development stipend
  • Opportunities to volunteer and give back, including charitable donation match
  • Free resources and support for your mental wellbeing

Specific benefits offerings may vary by country and can be viewed in more detail during your interview process.

Location & Work Arrangements

Organizations and teams at Zuora are empowered to design efficient and flexible ways of working, being intentional about scheduling, communication, and collaboration strategies that help us achieve our best results. In our dynamic, globally distributed company, this means balancing flexibility and responsibility — flexibility to live our lives to the fullest, and responsibility to each other, to our customers, and to our shareholders. For most roles, we offer the flexibility to work both remotely and at Zuora offices.

Our Commitment to an Inclusive Workplace

Think, be and do you! At Zuora, different perspectives, experiences and contributions matter. Everyone counts. Zuora is proud to be an Equal Opportunity Employer committed to creating an inclusive environment for all.

Zuora does not discriminate on the basis of, and considers individuals seeking employment with Zuora without regard to, race, religion, color, national origin, sex (including pregnancy, childbirth, reproductive health decisions, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, genetic information, political views or activity, or other applicable legally protected characteristics.

We encourage candidates from all backgrounds to apply. Applicants in need of special assistance or accommodation during the interview process or in accessing our website may contact us by sending an email to assistance@zuora.com.

About The Company

Fifteen years ago, Zuora was born out of a vision that we could evangelize a fundamentally new way of doing business by shifting the focus of companies to deliver recurring, people-centric services instead of a one-time sale of products. This is how we coined the term Subscription Economy®.

Today, we see others evangelizing this term and building entire communities around it. The Subscription Economy isn’t (and never was) just about subscription business models, but about direct, recurring relationships with customers through any business model. Subscriptions were only scratching the surface, and now the market recognizes the Subscription Economy for what it truly is: a relationship-centric economy. Companies have realized that the path to growth going forward is to establish direct, digital relationships with their customers, and to nurture and monetize these relationships through an ever-growing set of digital services.

Alongside this evolution, Zuora has been there every step of the way. We started with Zuora Billing, and have expanded our award-winning multi-product portfolio to include Zuora Revenue, Zuora Payments and Zuora Central Platform. More recently, we’ve added subscription experience platform Zephr to our family, further expanding our capabilities to serve as an intelligent hub that monetizes the complete quote-to-cash and revenue recognition process at scale. We call this Monetization.
