Site Reliability Engineering Manager

5+ Years • DevOps

Job Summary

Job Description

Aerospike is seeking a founding Site Reliability Engineering (SRE) Manager for a new AU-based team. This hands-on leadership role involves building and shaping a high-performing regional SRE group to ensure the uptime, reliability, and scalability of Aerospike Cloud deployments globally. The manager will set technical standards, establish team norms, act as a key escalation point, and collaborate with global teams to deliver high-quality operational outcomes, fostering a culture of ownership and technical excellence.
Must have:
  • Build and lead a new regional SRE team in Australia
  • Hire, onboard, and mentor top-tier engineers
  • Set technical direction for the AU team
  • Ensure strong execution across observability, reliability engineering, automation, and incident management
  • Participate in design reviews, architectural decisions, and complex troubleshooting efforts
  • Coordinate with other regional SRE leaders for consistent follow-the-sun coverage
  • Own and refine SRE processes for the region
  • Champion automation-first approaches and eliminate toil
  • Ensure on-call procedures follow industry-standard best practices
  • Manage on-call schedules and escalation policies in PagerDuty
  • Participate in manager on-call duties
  • Drive incident retrospectives, root cause analyses, and on-call remediation activities
  • 5+ years providing 24x7 production support for cloud-based, business-critical systems
  • Demonstrated leadership in managing operations for enterprise-class organizations
  • 2+ years of experience in technical leadership or management roles
  • 2+ years of demonstrable experience with AWS, Google Cloud, or Azure
  • Experience with continuous integration/continuous deployment (CI/CD) pipelines
  • Experience with automation pipelines for cloud infrastructure and software using Terraform, Packer, and Ansible
  • Experience supporting distributed, multitenant, auto-scalable backend services
  • Experience with NoSQL or relational databases and database fundamentals
  • Experience with maintaining distributed services on virtual machines and containers (Docker) with orchestration (Kubernetes, EKS, GKE)
  • Experience with documenting complex procedures and architectures, including diagramming
  • Experience with cryptographic fundamentals and best practices
  • Experience assessing security vulnerabilities in code and running systems
Good to have:
  • Experience building or leading new engineering teams, especially within a globally distributed organization
  • Experience with large-scale, distributed systems or databases in public cloud environments
  • Knowledge of secure multi-cloud operations (AWS, Azure, GCP) in compliance-conscious environments (SOC 2, ISO 27001, etc.)
  • Linux administration and troubleshooting
  • Administering operational infrastructure
  • Scripting or operations engineering, preferably with Bash, Python, and Go
  • Command-line utilities such as grep, ssh, etc.
  • Version control systems, preferably Git
  • Secrets management systems, preferably cloud-native or HashiCorp Vault
  • Vulnerability management systems, preferably GitHub Dependabot, Snyk, and Tenable
  • Expertise with modern observability platforms such as Grafana, Prometheus, Elasticsearch, Datadog, Honeycomb
  • Jira for issue tracking
  • Agile software methodologies such as Scrum and Kanban
  • Software development experience using Aerospike or similar distributed databases

Job Details

Aerospike is the real-time database for mission-critical use cases and workloads, including machine learning, generative AI, and agentic AI. Aerospike powers millions of transactions per second with millisecond latency, at a fraction of the total cost of ownership compared to other databases.

Global leaders, including Adobe, Airtel, Barclays, Criteo, DBS Bank, Experian, Grab, HDFC Bank, PayPal, Sony Interactive Entertainment, The Trade Desk, and Wayfair, rely on Aerospike for customer 360, fraud detection, real-time bidding, profile stores, recommendation engines, and other use cases.

At Aerospike, we dream big and deliver even bigger. Our mission is to unleash the power of the world’s real-time data with a database built for infinite scale, speed, and sustainability.

If you're ready to shape the future of data, join us.

As the founding Site Reliability Engineering (SRE) Manager for our new AU-based team, you will play a critical leadership role in building and shaping a high-performing regional SRE group from the ground up. This team will be instrumental in ensuring the uptime, reliability, and scalability of Aerospike Cloud deployments across multiple global cloud product offerings.

This is a hands-on leadership role: you’ll set the technical bar, establish team norms, and act as a key escalation point for your region. You’ll work closely with peers and partners in product, engineering, and customer success across the US, EMEA, and APAC to deliver high-quality operational outcomes, while fostering a culture of ownership, resilience, and technical excellence.

Key Responsibilities

  • Team Formation & Leadership: Build and lead a new regional SRE team in Australia. Hire, onboard, and mentor top-tier engineers, establishing a high-performance culture grounded in psychological safety, continuous learning, and a bias for automation. We want leaders who can attract, develop, and retain top talent.
  • Technical Oversight: Set the technical direction for the AU team, ensuring strong execution across observability, reliability engineering, automation, and incident management. Be a hands-on leader who can participate in design reviews, architectural decisions, and complex troubleshooting efforts while sharing your expertise across engineering.
  • Global Collaboration: Coordinate with other regional SRE leaders to drive consistent follow-the-sun coverage, enabling seamless handoffs and continuity of operations across time zones. Coordinate with Account Management and Professional Services to ensure successful onboarding of new cloud customers and maintenance activities for existing ones.
  • Operational Excellence: Own and refine SRE processes for the region, ensuring system health, performance, and availability. Champion automation-first approaches, eliminate toil, and scale operations through tools, not just headcount. Help set the maturity bar that products must meet before SREs help bring them to Aerospike Cloud customers.
  • On-Call Procedures: Ensure on-call procedures follow industry-standard best practices, manage schedules and escalation policies in PagerDuty, and participate in manager on-call duties. Drive incident retrospectives, root cause analyses, and on-call remediation activities.

Required Experience

  • 5+ years providing 24x7 production support for cloud-based, business-critical systems, with demonstrated leadership in managing operations for enterprise-class organizations during challenging situations (e.g., service incidents, degradations, and disaster recovery)
  • 2+ years of experience in technical leadership or management roles
  • 2+ years of demonstrable experience with at least one of the major public cloud providers: AWS, Google Cloud, or Azure
  • Experience with continuous integration/continuous deployment (CI/CD) pipelines
  • Experience with automation pipelines for cloud infrastructure and software using technologies such as Terraform, Packer, and Ansible
  • Experience supporting distributed, multitenant, auto-scalable backend services
  • Experience with NoSQL or relational databases, and database fundamentals, including data storage, data replication, data modeling, and data access patterns
  • Experience with maintaining distributed services on both virtual machines and containers (Docker) with orchestration (Kubernetes, EKS, GKE)
  • Experience with documenting complex procedures and architectures, including diagramming
  • Experience with cryptographic fundamentals and best practices
  • Experience assessing security vulnerabilities in code and running systems

Preferred Skills and Qualifications

  • Experience building or leading new engineering teams, especially within a globally distributed organization, is a strong plus.
  • Experience with large-scale, distributed systems or databases in public cloud environments
  • Knowledge of secure multi-cloud operations (AWS, Azure, GCP) in compliance-conscious environments (SOC 2, ISO 27001, etc.)
  • Linux administration and troubleshooting
  • Administering operational infrastructure
  • Scripting or operations engineering, preferably with Bash, Python, and Go
  • Command-line utilities such as grep, ssh, etc.
  • Version control systems, preferably Git
  • Secrets management systems, preferably cloud-native or HashiCorp Vault
  • Vulnerability management systems, preferably GitHub Dependabot, Snyk, and Tenable
  • Expertise with modern observability platforms such as Grafana, Prometheus, Elasticsearch, Datadog, Honeycomb
  • Jira for issue tracking
  • Agile software methodologies such as Scrum and Kanban
  • Software development experience using Aerospike or similar distributed databases

Aerospike is an Equal Opportunity Employer. We are committed to providing an environment free from discrimination on the basis of race, religion, color, sex, gender identity, sexual orientation, age, non-disqualifying physical or mental disability, national origin, veteran status, or any other basis covered by appropriate law.

About The Company

Headquartered in Mountain View, California, Aerospike also has a global presence with offices in London, Bangalore, and Tel Aviv. Aerospike does not accept resumes from staffing agencies with which we do not have a written agreement and specific engagement for a particular opening. Our employment activities, inquiries, and offers are managed through our HR/Talent department, and all candidates are presented through this channel only. We do not accept unsolicited resumes.
