Senior SRE (Site Reliability Engineer)

2 Months ago • 5 Years + • Devops

Job Summary

Job Description

As a Senior Site Reliability Engineer (SRE) at SailPoint, you will work within a development team to ensure the reliability, scalability, and performance of its services. Responsibilities include designing and implementing solutions to improve system reliability and availability, monitoring key operational metrics, and collaborating with teams to optimize performance and capacity. You will also automate processes, contribute to documentation, participate in an on-call rotation, and lead incident postmortem efforts. This role emphasizes collaboration with engineers and stakeholders to influence system design for optimal operability and reliability.
Must have:
  • 5+ years of SRE experience
  • Strong understanding of SRE principles
  • Experience with cloud platforms
  • Proficiency in scripting languages
  • Experience with monitoring and logging tools
  • Experience with containerization and orchestration
  • Understanding of network protocols
  • Familiarity with DevOps practices
  • Experience with CI/CD toolchains
  • Strong problem-solving and troubleshooting skills
Good to have:
  • Experience with Kafka and relational databases
  • Experience with performance tuning
  • Experience with Grafana k6

Job Details

SailPoint is the leader in identity security for the cloud enterprise. Our identity security solutions secure and enable thousands of companies worldwide, giving our customers unmatched visibility into the entirety of their digital workforce, ensuring workers have the right access to do their job – no more, no less.

We are seeking a highly motivated and experienced Senior Site Reliability Engineer (SRE) to join an Identity Security Cloud software development team. This is an embedded role, meaning you will be a full member of the development team, working closely with software engineers, infrastructure platform services, engineering managers, and other stakeholders to ensure the reliability, scalability, and performance of the team’s services. You will be responsible for leveraging the infrastructure, tooling, and processes that support our applications in development and production. This role offers a unique opportunity to directly influence the design and architecture of our systems from a reliability and performance perspective.

 

Responsibilities: 

Work with development teams and service owners at the intersection of development and operations to solve performance issues and ensure system scalability.

  • Reliability Engineering: Design, develop, and implement solutions to improve the reliability, availability, performance, and scalability of our systems. Work with technical leaders and infrastructure platform services to develop alerts and dashboards. 

  • Operational Excellence: Own and improve key operational metrics (SLIs, SLOs, error budgets, monitoring and alerting) for team-related services, and drive continuous improvement through post-incident reviews and blameless postmortems of non-functional issues. Develop and maintain comprehensive monitoring and alerting to proactively identify and resolve issues. Create and maintain dashboards, reviewing them regularly to identify and close gaps. Improve operational processes and team practices by working with technical leaders and NOC teams.

  • Capacity Planning: Collaborate with technical leads, DevOps/SRE and infra teams to forecast capacity needs and ensure sufficient resources are available to support growth. 

  • Performance Optimization: Collaborate with performance SMEs to identify and address production performance bottlenecks through profiling, tuning, and optimization of services and infrastructure. 

  • Automation: Automate repetitive tasks and processes to improve efficiency and reduce manual intervention. 

  • Collaboration: Work closely with Software, Performance and Test Engineers to influence system design and architecture for operability and reliability. 

  • Documentation: Review and contribute to clear and concise documentation for systems, processes, runbooks, and procedures.

  • On-Call: Participate in a 24/7 on-call rotation to gain subject matter expertise in the domain.

  • Incident Management: Lead incident postmortem efforts, working with SMEs to ensure reports are compiled promptly and to drive completion of post-incident actions.

  • Troubleshooting: Apply excellent diagnostic and problem-solving skills, with the ability to analyze complex systems and data.

 

Qualifications: 

  • Bachelor’s degree in computer science, a related field, or equivalent practical experience. 

  • 5+ years of proven SRE experience.

  • Strong understanding of SRE principles and practices. 

  • Experience with cloud platforms (AWS, GCP, or Azure). 

  • Proficiency in at least one scripting language (e.g., Python, Bash, Go). 

  • Experience with monitoring and logging tools (e.g., Prometheus, Grafana, Honeycomb, OpenSearch). 

  • Coding experience beyond simple scripting in a programming language such as Go, Java, or Python, sufficient to support reliability engineering work and to evaluate where service code can be optimized for reliability.

  • Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes). 

  • Understanding of network protocols and security best practices.

  • Familiarity with DevOps culture and practices, and experience with CI/CD toolchains (Jenkins, ArgoCD, Spacelift).

  • Experience with incident response tools and processes (PagerDuty).

  • Experience with Infrastructure as Code (Terraform, Helm).

  • Strong problem-solving and troubleshooting skills. 

  • Excellent communication and collaboration skills.    

  • Ability to work independently and as part of a team to achieve the SRE agenda. 

 

Preferred Qualifications: 

  • Technology experience: Kafka, relational databases, performance tuning (JVM, Go) 

  • Experience with Grafana k6 as a continuous performance testing tool.

 

In the first 30 days you will: 

  • Meet the team and understand its mission and vision

  • Gain clarity on various roles and expectations 

  • Complete development environment setup 

  • Read guides and documentation, and complete mandatory training

  • Learn company processes and benefits

 

By 6 months you should: 

  • Understand team goals and OKRs for the quarter and beyond

  • Complete initial analysis and implementation of SRE team assignments 

  • Be comfortable with tools, systems and processes used on a day-to-day basis 

  • Complete project work, both supervised and unsupervised 

SailPoint is an equal opportunity employer and we welcome all qualified candidates to apply to join our team.  All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, protected veteran status, or any other category protected by applicable law.  

Alternative methods of applying for employment are available to individuals unable to submit an application through this site because of a disability.  Contact hr@sailpoint.com or mail to 11120 Four Points Dr, Suite 100, Austin, TX 78726, to discuss reasonable accommodations.

Similar Jobs

Saronic Technologies - Mission Operations Engineer

Saronic Technologies

Portsmouth, England, United Kingdom (On-Site)
2 Weeks ago
Ubisoft - Senior Gameplay Programmer 3C

Ubisoft

Montpellier, Occitanie, France (On-Site)
4 Months ago
Roof Stacks - Software Developer

Roof Stacks

Istanbul, İstanbul, Türkiye (Hybrid)
3 Months ago
Nagarro - Associate Engineer

Nagarro

New York, New York, United States (On-Site)
1 Year ago
OKX - Data Analyst & Business Strategy Director

OKX

Hong Kong (On-Site)
2 Months ago
Rackspace Technology - Cloud Practice Engineer III

Rackspace Technology

Jalisco, Mexico (Remote)
3 Months ago
Dream Sports - Software Development Engineer 3 - Backend (Platform)

Dream Sports

Mumbai, Maharashtra, India (On-Site)
4 Months ago
Sabre India - Principal Java Software Architect - The Intelligence Exchange

Sabre India

Kraków, Lesser Poland Voivodeship, Poland (Hybrid)
2 Weeks ago
Scopely - DevOps Lead

Scopely

Barcelona, Catalonia, Spain (Hybrid)
1 Month ago
bytedance - Site Reliability Engineer - AML

bytedance

San Jose, California, United States (On-Site)
9 Months ago


Similar Skill Jobs

Morning Star - Senior Software Engineer

Morning Star

Bucharest, Bucharest, Romania (Hybrid)
2 Months ago
build a rocket boy - Animation Programmer

build a rocket boy

United Kingdom (Remote)
3 Months ago
Nordson Corporation - Senior Principal Materials Engineer

Nordson Corporation

Salem, New Hampshire, United States (On-Site)
2 Months ago
London stock Exchange - Senior Engineer, Site Reliability Engineering

London stock Exchange

Colombo, Western Province, Sri Lanka (Hybrid)
2 Months ago
PwC - Junior Functional Consultant  - Microsoft D365

PwC

Qormi, Malta (On-Site)
10 Months ago
Qualcomm - SRAM Characterization and Modeling Engineer

Qualcomm

Hsinchu City, Taiwan (On-Site)
2 Weeks ago
Naughty Dog - Producer

Naughty Dog

Santa Monica, California, United States (On-Site)
2 Months ago
Banyan Software - Operations and Supply Chain Manager

Banyan Software

Wayville, South Australia, Australia (On-Site)
2 Weeks ago
Aftershock Media Group - Project Manager

Aftershock Media Group

(Remote)
1 Month ago
ShyftLabs - Technical Lead - Data Integrations

ShyftLabs

Noida, Uttar Pradesh, India (Hybrid)
2 Months ago


Jobs in Mexico

Ion - Senior .NET Consultant

Ion

Mexico City, Mexico (Hybrid)
3 Years ago
HP - Supply Chain Planning Intern

HP

Tijuana, Baja California, Mexico (On-Site)
2 Weeks ago
Valeo - SALES TRAINEE

Valeo

San Luis Potosi, Mexico (On-Site)
3 Weeks ago
Nagarro - Associate Principal Engineer, Delivery

Nagarro

Mexico (Remote)
9 Months ago
LTI Mindtree - Senior Engineer - Industrial IoT

LTI Mindtree

Monterrey, Nuevo Leon, Mexico (On-Site)
4 Weeks ago
Mcdonalds - Software Engineer II React

Mcdonalds

Mexico City, Mexico (On-Site)
1 Month ago
Mcdonalds - Associate Technical Product Analyst

Mcdonalds

Mexico City, Mexico (Hybrid)
1 Month ago
Marsh McLennan - Investments Sales Director

Marsh McLennan

Mexico City, Mexico (Hybrid)
2 Months ago
Amber - Senior Game Economy Designer

Amber

Guadalajara, Jalisco, Mexico (Remote)
2 Months ago
oportun - Telephone Agent in Spanish

oportun

Leon, Guanajuato, Mexico (On-Site)
2 Months ago


Devops Jobs

Veeam Software - Senior Staff Platform Engineer

Veeam Software

California, United States (Remote)
1 Month ago
bytedance - Machine Learning Engineer - Machine Learning Infrastructure

bytedance

Seattle, Washington, United States (On-Site)
9 Months ago
Sailpoint - Principal SRE (Site Reliability Engineer)

Sailpoint

United States (Remote)
3 Months ago
Google - Software Engineer III, Engineering Productivity, Google Cloud Platforms

Google

Seattle, Washington, United States (On-Site)
3 Months ago
Lorikeet - Solutions Engineer

Lorikeet

London, England, United Kingdom (On-Site)
1 Month ago
Rush street interactive  - Senior Full-Stack Automation Engineer

Rush street interactive

Estonia (Hybrid)
4 Months ago
Barracuda - Managed Services Engineer - (Linux, AWS/Azure, Cloud Ops)

Barracuda

Bengaluru, Karnataka, India (On-Site)
5 Months ago
GoTo Group - Principal SRE Engineer (SE5)

GoTo Group

Bengaluru, Karnataka, India (On-Site)
9 Months ago
Boomi  - Software Engineer 2 - Platform Architecture Service

Boomi

Bengaluru, Karnataka, India (On-Site)
1 Month ago
endava - Senior Cloud Operations Engineer

endava

Cluj-Napoca, Cluj County, Romania (On-Site)
2 Months ago


About The Company

SailPoint is a leading provider of identity security for the modern enterprise. Enterprise security starts and ends with identities and their access, yet the ability to manage and secure identities today has moved well beyond human capacity. Using a foundation of artificial intelligence and machine learning, the SailPoint Identity Security Platform delivers the right level of access to the right identities and resources at the right time—matching the scale, velocity, and environmental needs of today’s cloud-oriented enterprise.

Illinois, United States (Remote)

United States (On-Site)

Singapore (Remote)

Austin, Texas, United States (Remote)

Austin, Texas, United States (Hybrid)

South Korea (Remote)

Austin, Texas, United States (Remote)
