Senior Staff Site Reliability Engineer (Cortex Observability)

1 Hour ago • 5 Years + • Devops • $126,000 PA - $203,500 PA

Job Summary

Job Description

Palo Alto Networks is seeking a Senior Staff Site Reliability Engineer for their Cortex Observability team. This role involves operating and maintaining a large-scale GCP environment, focusing on the design, implementation, and enhancement of observability systems. The engineer will leverage deep knowledge of modern observability tools, including high cardinality metrics, tracing, and large-scale logging solutions. Responsibilities include collaborating with engineering teams to provide actionable insights into system performance and health, monitoring cloud platforms (GCP or AWS), improving monitoring processes and alerts, managing incidents efficiently, automating monitoring and alerting tasks, continuously evaluating and implementing new technologies, providing follow-the-sun operational coverage, and influencing the operability of the product to ensure service reliability and availability.
Must have:
  • 5+ years DevOps/SRE experience
  • High proficiency with observability tools (Thanos, Prometheus, Grafana, OpenTelemetry)
  • Incident and alert management using PagerDuty, Prometheus Alertmanager
  • High proficiency in GCP or AWS
  • High proficiency with Kubernetes and Docker
  • High proficiency in Python and Linux Shell
  • Experience with Ansible and Terraform
  • Effective communication and interpersonal skills
  • Ability to troubleshoot complex problems
  • Ability to operate independently and take responsibility
Perks:
  • FLEXBenefits wellbeing spending account
  • Mental and financial health resources
  • Personalized learning opportunities
  • Restricted stock units
  • Bonus

Job Details

Company Description

Our Mission

At Palo Alto Networks® everything starts and ends with our mission:

Being the cybersecurity partner of choice, protecting our digital way of life.
Our vision is a world where each day is safer and more secure than the one before. We are a company built on the foundation of challenging and disrupting the way things are done, and we’re looking for innovators who are as committed to shaping the future of cybersecurity as we are.

Who We Are

We take our mission of protecting the digital way of life seriously. We are relentless in protecting our customers and we believe that the unique ideas of every member of our team contribute to our collective success. Our values were crowdsourced by employees and are brought to life through each of us every day - from disruptive innovation and collaboration, to execution. From showing up for each other with integrity to creating an environment where we all feel included.

As a member of our team, you will be shaping the future of cybersecurity. We work fast, value ongoing learning, and we respect each employee as a unique individual. Knowing we all have different needs, our development and personal wellbeing programs are designed to give you choice in how you are supported. This includes our FLEXBenefits wellbeing spending account with over 1,000 eligible items selected by employees, our mental and financial health resources, and our personalized learning opportunities - just to name a few!

At Palo Alto Networks, we believe in the power of collaboration and value in-person interactions. This is why our employees generally work full time from our office with flexibility offered where needed. This setup fosters casual conversations, problem-solving, and trusted relationships. Our goal is to create an environment where we all win with precision.

Job Description

Your Career

The Cortex team builds and delivers the industry’s most advanced SecOps platform, consisting of XDR, XSIAM, XSOAR, and XPANSE. As a member of the Cortex DevOps team, your role involves operating and maintaining a large-scale GCP environment, including the design, implementation, and continuous enhancement of our comprehensive observability systems. To meet the opportunities that such a role provides, you will have a deep knowledge of modern observability and monitoring tools and practices, having managed high cardinality metrics, implemented tracing, and operationalized large-scale logging solutions. As part of this role, you will collaborate closely with our engineering teams to develop innovative solutions that provide clear and actionable insights into our systems’ performance and health.

Your Impact

As a Senior Staff SRE with the Cortex Observability team, you will:

  • Cloud Expertise: Utilize your expertise in monitoring cloud platforms, particularly GCP, to optimize our infrastructure, leveraging cloud-native technologies
  • Monitoring Expertise: Improve monitoring processes, alerts, and metrics. Work with development teams to ensure that all of our services have the right monitoring and metrics in place so that we detect problems before our customers do
  • Incident Management: Leverage incident management processes to ensure efficient resolution of system issues and minimal impact on services
  • Automation: Automate complex monitoring and alerting tasks by building tools for cloud operations, such as automated remediation of known issues and auto-scaling
  • Continuously Improve: Stay up-to-date with cutting-edge technologies, evaluate their potential impact on our operations, and implement them when appropriate
  • On-Call: Provide follow-the-sun operational coverage for our production Observability infrastructure
  • Collaborate: Work with our Engineering team to influence the operability of the product and ensure the reliability and availability of our services

Qualifications

Your Experience 

  • DevOps/SRE Expertise: 5+ years of experience as a DevOps/SRE engineer with a passion for technology and a strong motivation for high reliability at the service level
  • Observability Tools: High proficiency with Thanos, Prometheus, Grafana, OpenTelemetry, and other monitoring tools
  • Incident and Alerts Management: Clear understanding of incident and alerts management using tools like PagerDuty and Prometheus Alertmanager
  • Cloud Proficiency: High proficiency in either Google Cloud Platform or Amazon Web Services
  • Kubernetes and Docker: High proficiency with Kubernetes and Docker for container orchestration
  • Scripting and Automation: High proficiency in Python programming and Linux Shell commands. Experience with Ansible and Terraform for infrastructure as code
  • Communication Skills: Effective communication and interpersonal skills, with the ability to work and coordinate between multiple teams in different time zones
  • Troubleshooting: Ability to effectively troubleshoot and address emerging and complex problems
  • Independence: Ability to operate independently, make decisions, take action, and take responsibility

Additional Information

The Team

We’re trailblazers who dream big, take risks, and challenge cybersecurity’s status quo. It’s simple: we can’t accomplish our mission without diverse teams innovating together.

Compensation Disclosure

The compensation offered for this position will depend on qualifications, experience, and work location. For candidates who receive an offer at the posted level, the starting base salary (for non-sales roles) or base salary + commission target (for sales/commissioned roles) is expected to be between $126,000/YR - $203,500/YR. The offered compensation may also include restricted stock units and a bonus. A description of our employee benefits may be found here.

Our Commitment

We’re problem solvers that take risks and challenge cybersecurity’s status quo. It’s simple: we can’t accomplish our mission without diverse teams innovating, together.

We are committed to providing reasonable accommodations for all qualified individuals with a disability. If you require assistance or accommodation due to a disability or special need, please contact us at accommodations@paloaltonetworks.com.

Palo Alto Networks is an equal opportunity employer. We celebrate diversity in our workplace, and all qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or other legally protected characteristics.

All your information will be kept confidential according to EEO guidelines.

About The Company

Our enterprise security platform detects and prevents known and unknown threats while safely enabling an increasingly complex and rapidly growing number of applications. Come be part of the team that redefined the firewall industry and is now the fastest-growing security company in history. Palo Alto Networks, the global cybersecurity leader, is shaping the cloud-centric future with technology that is transforming the way people and organizations operate. Our mission is to be the cybersecurity partner of choice, protecting our digital way of life. We help address the world's greatest security challenges with continuous innovation that seizes the latest breakthroughs in artificial intelligence, analytics, automation, and orchestration. By delivering an integrated platform and empowering a growing ecosystem of partners, we are at the forefront of protecting tens of thousands of organizations across clouds, networks, and mobile devices. Our vision is a world where each day is safer and more secure than the one before.
