Staff Site Reliability Engineer

1 Month ago • 8+ Years • DevOps

Job Summary

Job Description

Aerospike is seeking a Staff Site Reliability Engineer (SRE) to architect, build, and optimize enterprise-scale, highly resilient cloud platform infrastructure and services. The role involves establishing reliability, performance, and automation standards, driving infrastructure initiatives across multiple teams, implementing monitoring and observability, and leading improvements to enhance system efficiency, scalability, and stability. Key responsibilities include architecting cloud platforms, developing automation and tooling, setting monitoring standards, leading incident response and root cause analysis, implementing security best practices, collaborating with development teams, serving as an escalation point for incidents, establishing documentation standards, leading capacity planning, and mentoring engineers.
Must have:
  • 8+ years in SRE, DevOps, or related fields
  • Experience leading complex infrastructure projects
  • Deep knowledge of public cloud providers (AWS, GCP, Azure)
  • Advanced proficiency in automation, tooling, and infrastructure solutions
  • Extensive experience in CI/CD pipeline design
  • Deep understanding of Linux/Unix, networking, and distributed systems
  • Proficiency in scripting/development (Python, Bash, Go)
  • Extensive experience with Docker and Kubernetes
  • In-depth experience with monitoring, logging, and observability tools
  • Advanced problem-solving skills with an engineering mindset
  • Extensive experience implementing cloud security best practices
  • Excellent communication and influence skills
Good to have:
  • Experience managing and optimizing database deployments
  • Deep expertise with Aerospike or other distributed NoSQL databases
  • Comprehensive understanding of security principles in cloud environments
  • Advanced industry certifications (AWS, GCP, Azure)
  • Advanced Kubernetes certifications (CKA, CKAD, CKS)
  • Advanced proficiency with configuration management tools
  • Experience leading technical initiatives and mentoring

Job Details

About Aerospike

Aerospike is the real-time database for mission-critical use cases and workloads, including machine learning, generative, and agentic AI. Aerospike powers millions of transactions per second with millisecond latency, at a fraction of the total cost of ownership compared to other databases.

Global leaders, including Adobe, Airtel, Barclays, Criteo, DBS Bank, Experian, Grab, HDFC Bank, PayPal, Sony Interactive Entertainment, The Trade Desk, and Wayfair, rely on Aerospike for customer 360, fraud detection, real-time bidding, profile stores, recommendation engines, and other use cases.

Headquartered in Mountain View, California, Aerospike has a global presence with offices in London, Bangalore, and Tel Aviv.

In Bengaluru, we follow a hybrid model with two mandatory days of work from the office.

Site Reliability Engineer

As a Staff Site Reliability Engineer (SRE) for Aerospike, you will be instrumental in architecting, building, and optimizing enterprise-scale, highly resilient cloud platform infrastructure and services. You will focus on establishing reliability, performance, and automation standards to ensure seamless delivery and operation across our cloud platform ecosystem. Your responsibilities will include driving robust infrastructure initiatives across multiple teams, implementing organization-wide monitoring and observability practices, and leading strategic improvement initiatives that enhance system efficiency, scalability, and overall platform stability at enterprise scale.

Key Responsibilities

  • Architecting, deploying, and optimizing enterprise-scale Aerospike cloud platform infrastructure and services across multiple environments
  • Driving the development and standardization of automation, tooling, and infrastructure solutions across multiple engineering teams to improve efficiency at scale
  • Building and establishing monitoring, alerting, and observability standards and implementations across the organization with cutting-edge solutions and best practices
  • Leading complex incident response activities across multiple teams, conducting detailed root cause analysis, and driving systematic improvements
  • Establishing and implementing security best practices and standards for cloud platform infrastructure and services impacting multiple teams
  • Collaborating with development teams and engineering leadership to ensure reliable service delivery and alignment with enterprise-scale SRE best practices
  • Serving as an escalation point for critical production incidents and coordinating cross-team mitigation strategies
  • Establishing documentation standards, runbooks, and knowledge sharing practices for operational excellence
  • Leading capacity planning and performance optimization efforts at enterprise scale
  • Mentoring engineers across teams and sharing knowledge to build technical capabilities

Required Experience

  • 8+ years of experience in Site Reliability Engineering (SRE), DevOps, or related fields, with a focus on architecting scalable, resilient, and automated enterprise-scale systems
  • Experience leading complex infrastructure projects, driving measurable improvements in system reliability and performance
  • Deep knowledge of multiple public cloud providers (AWS, Google Cloud, Azure), including advanced cloud-native services and architectures
  • Advanced proficiency in automation, tooling, and infrastructure solutions to enable enterprise-scale automated and reproducible infrastructure
  • Extensive experience in CI/CD pipeline design and implementation, enabling seamless, automated software delivery and infrastructure updates at scale
  • Deep understanding of Linux/Unix systems, advanced networking concepts, and distributed system architectures
  • Comprehensive proficiency in scripting and software development using Python, Bash, Go, or similar languages to build sophisticated automation, tooling, and infrastructure solutions
  • Extensive experience with containerization and orchestration technologies such as Docker and Kubernetes for enterprise-scale service deployment and scaling
  • In-depth experience with monitoring, logging, and observability tools and methodologies to drive data-driven system improvements across multiple teams
  • Advanced problem-solving skills with an engineering-first mindset for improving system reliability, scalability, and performance at enterprise scale
  • Extensive experience implementing security best practices for cloud infrastructure, access control, and data protection across multiple teams
  • Excellent communication and influence skills to collaborate effectively across multiple teams and drive technical decisions

Preferred Skills and Qualifications

  • Extensive experience managing and optimizing database deployments and services in production environments at enterprise scale, ensuring high availability and performance
  • Deep expertise with Aerospike or other distributed NoSQL databases, including advanced features and enterprise-scale deployment optimization
  • Comprehensive understanding of security principles and implementation in complex cloud environments across multiple teams
  • Advanced industry certifications, such as AWS Solutions Architect Professional, Google Professional Cloud Architect, Azure Solutions Architect Expert, or equivalent
  • Advanced Kubernetes certifications (CKA, CKAD, CKS) with extensive experience managing Kubernetes at enterprise scale
  • Advanced proficiency with configuration management and automation tools in complex, multi-team environments
  • Experience leading technical initiatives, mentoring, and driving best practices across multiple engineering teams

Aerospike is an Equal Opportunity Employer. We are committed to providing an environment free from discrimination on the basis of race, religion, color, sex, gender identity, sexual orientation, age, non-disqualifying physical or mental disability, national origin, veteran status, or any other basis covered by appropriate law.

About The Company

Headquartered in Mountain View, California, Aerospike also has a global presence with offices in London, Bangalore, and Tel Aviv. Aerospike does not accept resumes from staffing agencies with which we do not have a written agreement and specific engagement for a particular opening. Our employment activities, inquiries, and offers are managed through our HR/Talent department, and all candidates are presented through this channel only. We do not accept unsolicited resumes.
