Staff Site Reliability Engineer

1 Week ago • 6 Years + • DevOps

About the job

Summary

Zscaler seeks an experienced Staff Site Reliability Engineer to develop infrastructure, tools, and platforms for high-quality, reliable, and scalable services. This role involves close collaboration with various teams to enhance observability, automation, configuration management, continuous deployment, and reliability practices. Responsibilities include developing scalable portals for SRE dashboards and integrating SRE/AI Co-Pilot solutions. The ideal candidate will design a Chaos Engineering platform and possess strong experience in software development, cloud-SRE, DevOps, or system engineering. The position requires proficiency in various tools and technologies including but not limited to: Observability tools (OpenTelemetry/Prometheus/Grafana), databases (PostgreSQL, Redis), and SRE tools (Terraform, Kubernetes).
Must have:
  • 6+ years of experience in software development
  • Cloud-SRE, DevOps, or System Engineering background
  • Experience in Infrastructure Management, Observability, Automation, and CI/CD
  • Portal Development expertise
  • SRE/AI Co-Pilot platform development
  • Chaos Engineering platform design
Good to have:
  • Bachelor's degree in Computer Science or related field
  • Proficiency in Java/Node.js/Python/Shell, React/Angular, Linux
  • Experience with OpenTelemetry/Prometheus/Grafana, ELK stack
  • Proficiency in PostgreSQL, OLAP/Time Series/Analytics DBs, Redis
  • Experience with Terraform, Puppet, Ansible, Docker, Kubernetes, Kafka, Spark, and Splunk
Perks:
  • Various health plans
  • Time off plans
  • Parental leave options
  • Retirement options
  • Education reimbursement
  • In-office perks
About Zscaler

Serving thousands of enterprise customers around the world including 40% of Fortune 500 companies, Zscaler (NASDAQ: ZS) was founded in 2007 with a mission to make the cloud a safe place to do business and a more enjoyable experience for enterprise users. As the operator of the world’s largest security cloud, Zscaler accelerates digital transformation so enterprises can be more agile, efficient, resilient, and secure. The pioneering, AI-powered Zscaler Zero Trust Exchange™ platform protects thousands of enterprise customers from cyberattacks and data loss by securely connecting users, devices, and applications in any location. 

Named a Best Workplace in Technology by Fortune and others, Zscaler fosters an inclusive and supportive culture that is home to some of the brightest minds in the industry. If you thrive in an environment that is fast-paced and collaborative, and you are passionate about building and innovating for the greater good, come make your next move with Zscaler. 

Our Engineering team built the world's largest cloud security platform from the ground up, and we keep building. With more than 100 patents and big plans for enhancing services and increasing our global footprint, the team has made us and our multitenant architecture today's cloud security leader, with more than 15 million users in 185 countries. Bring your vision and passion to our team of cloud architects, software engineers, security experts, and more who are enabling organizations worldwide to harness speed and agility with a cloud-first strategy.

We're looking for an experienced Staff Site Reliability Engineer to join our SRE team. Reporting to the Senior Director, SRE, you'll be responsible for:

  • Developing the infrastructure, tools, services, and platforms that allow our operations and primary product teams to deliver high-quality, reliable, and scalable services, enhancing the customer experience.
  • Collaborating closely with several teams to improve our observability, automation, configuration management, continuous deployment, and reliability practices.

Job Location - Bangalore / Hyderabad

What We're Looking for (Minimum Qualifications)

  • 6+ years of hands-on experience in software development within Cloud-SRE, DevOps, or System Engineering, with a background in developing Infrastructure Management, Observability, Automation, and CI/CD systems.
  • Portal Development: Design scalable portals for SRE dashboards, SLI/SLO/SLA management, error budgets, and executive dashboards to support data-driven decision-making.
  • SRE/AI Co-Pilot Platform: Develop and integrate SRE/AI Co-Pilot solutions to enhance the efficiency of operations and engineering teams.
  • Collaboration: Work with product, operations, and security teams to ensure seamless integration and deployment of new tools, features, and updates across the cloud.
  • Chaos Engineering: Design a Chaos Engineering platform to guide failure modes and effects analysis, ensuring maximum resilience and scalability of infrastructure and applications.

What Will Make You Stand Out (Preferred Qualifications)

  • Bachelor's degree in computer science, a related technical field involving computer systems engineering, or equivalent practical experience.
  • Proficiency in a combination of Site Reliability Engineering and software development skills. Languages: Java/Node.js/Python/Shell, React/Angular (UI). Operating systems: Linux. Observability: OpenTelemetry/Prometheus/Grafana, ELK stack, or any enterprise monitoring platform.
  • Proficiency in databases: PostgreSQL, OLAP/Time Series/Analytics DBs, Redis. SRE tools: Terraform, Puppet, Ansible, ETL, Docker, Kubernetes, Kafka, Spark, and Splunk. QA/Testing: Experience with relevant testing tools and frameworks.

At Zscaler, we believe that diversity drives innovation, productivity, and success. We are looking for individuals from all backgrounds and identities to join our team and contribute to our mission to make doing business seamless and secure. We are guided by these principles as we create a representative and impactful team, and a culture where everyone belongs. For more information on our commitments to Diversity, Equity, Inclusion, and Belonging, visit the Corporate Responsibility page of our website.

Our Benefits program is one of the most important ways we support our employees. Zscaler proudly offers comprehensive and inclusive benefits to meet the diverse needs of our employees and their families throughout their life stages, including:

  • Various health plans
  • Time off plans for vacation and sick time
  • Parental leave options
  • Retirement options
  • Education reimbursement
  • In-office perks, and more!

By applying for this role, you agree to adhere to applicable laws, regulations, and Zscaler policies, including those related to security and privacy standards and guidelines.

Zscaler is proud to be an equal opportunity and affirmative action employer. We celebrate diversity and are committed to creating an inclusive environment for all of our employees. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex (including pregnancy or related medical conditions), age, national origin, sexual orientation, gender identity or expression, genetic information, disability status, protected veteran status or any other characteristics protected by federal, state, or local laws.

See more information by clicking on the Know Your Rights: Workplace Discrimination is Illegal link.

Pay Transparency

Zscaler complies with all applicable federal, state, and local pay transparency rules.

Zscaler is committed to providing reasonable support (called accommodations or adjustments) in our recruiting processes for candidates who are differently abled, have long term conditions, mental health conditions or sincerely held religious beliefs, or who are neurodivergent or require pregnancy-related support.
