Job Description
Because this position supports government environments, the role requires US citizenship.
Your Career
Palo Alto Networks runs a large infrastructure and is one of the biggest GCP customers. As a Principal SRE, you'll be at the forefront of building and maintaining highly reliable, scalable, and secure cloud infrastructure within a FedRAMP compliant environment. You'll drive operational excellence, champion SRE best practices, and work collaboratively to ensure our systems are robust and performant. This includes automation, architecture, performance, observability, troubleshooting, security, and reliability.
Our Infrastructure Platform stack includes Terraform, Kubernetes, GitLab CI/CD, GitOps, Prometheus, Grafana, Loki, Docker, GCP, Backstage, MySQL, PagerDuty, FireHydrant, Python, Bash, Java, NodeJS and Go.
Your Impact
- Design, build, and operate reliable, secure cloud infrastructure across multi-cloud environments
- Ensure applications are production-ready, scalable, and resilient, collaborating closely with developers, researchers, data scientists, and security experts
- Develop expertise in new technologies and rapidly integrate them into our existing infrastructure, embracing continuous learning and the adoption of AI tools
- Develop tools and automation frameworks, championing Infrastructure as Code (IaC) and Monitoring as Code (MaC) principles
- Automate robust deployments and orchestrate end-to-end monitoring and alerting solutions
- Participate in on-call rotations with SRE and Dev teams to support critical business and production systems
- Lead root cause analysis of critical business and production issues, driving improvements and preventing recurrence
- Contribute to the success of SRE and DevOps initiatives, aligning technical decisions with business goals and understanding their impact
Qualifications
Your Experience
- Must be a US Citizen to be considered
- 7+ years of experience in Infrastructure, SRE, or DevOps roles required
- BS or MS in Computer Science, a related field, or equivalent professional experience required
- 4+ years of experience with AWS and GCP, with expertise in their architecture, services, advanced cloud networking, and PKI concepts
- Expertise in troubleshooting and resolving cloud infrastructure and service issues, identifying root causes, and devising effective solutions for high-volume transactions
- Proficiency with Python and shell scripting for automation; Golang is a plus
- Proficiency in Infrastructure as Code (IaC) with Terraform and Helm, leveraging AI tools for development
- Solid experience with Kubernetes, container networking, and container workloads
- Strong Linux administration skills
- Proficiency with CI/CD pipelines, GitOps principles, GitLab, and Jenkins
- Excellent written and verbal communication skills, with the ability to collaborate effectively and rally support across teams
- Self-disciplined, self-managed, and highly driven with a strong sense of ownership and urgency
- Ability to adapt quickly to evolving cloud technologies, security threats, and advancements through continuous learning
- Ability to understand and address customer needs effectively, and to provide root cause analysis (RCA) to customers
- Understanding of how technical decisions impact the business, and the ability to align cloud operations with business goals