Senior Application Monitoring Engineer

5+ Years • Software Development & Engineering

Job Summary

Job Description

SailPoint is seeking a Senior Application Monitoring Engineer for its Reliability Engineering team. This role involves overseeing platform stability, performance, and production releases within a 24/7 NOC team. The engineer will design, develop, and improve end-to-end reliability for SailPoint's SaaS services, coach engineering teams on observability best practices, lead post-incident reviews, and collaborate to enhance system reliability. The position requires expertise in distributed systems, cloud infrastructure, containerization, and observability tools to ensure scalable and impactful solutions.
Must have:
  • Oversee the whole platform ensuring stability and performance.
  • Monitor production releases based on complexity and risk assessment.
  • Design, develop, and improve end-to-end reliability and maintainability for all services.
  • Coach engineering teams on observability best practices and SLOs.
  • Lead engineering teams through post-incident reviews.
  • Develop and implement automation tools and processes.
  • 5+ years in agile software development, infrastructure operations, or application management.
  • 4+ years using NOC or SRE tactics for highly available SaaS environments.
  • Experience with cloud infrastructure environments, preferably AWS.
  • Experience with containerization technology and/or Kubernetes.
  • Experience with metrics, tracing, and logging observability tools (Prometheus, Grafana, Honeycomb, Jaeger, Kibana).
  • Experience with incident management.
  • Strong understanding of Linux, software development, systems, networking, and Cloud concepts.
Good to have:
  • Experience with programming languages (Java, Python, Go).
  • Bachelor's degree in Computer Science or other technical discipline.

Job Details

SailPoint is the leader in identity security for the cloud enterprise. Our identity security solutions secure and enable thousands of companies worldwide, giving our customers unmatched visibility into the entirety of their digital workforce, ensuring workers have the right access to do their job – no more, no less.

IdentityNow is SailPoint’s Identity as a Service (IDaaS) product. The Senior Monitoring Engineer will be a key player on our Reliability Engineering team, combining SRE practices with programming in Java (Go is desirable) to maintain reliable, scalable, observable microservices for an enterprise-grade, multi-tenant SaaS product. We are looking for engineers with broad experience in building and running distributed systems at global scale. If you enjoy analyzing complicated problems, devising creative solutions, and collaborating across teams to build reliable, scalable, and impactful solutions, come join our Reliability Engineering team. We are a team of people who write software to solve scalability, observability, security, reliability, and operability problems.

What You’ll Make Happen:

  • As a member of a 24/7 NOC team, oversee the whole platform to ensure stability and performance, and monitor production releases based on their complexity and risk assessment.
  • Make it easy for everyone to create, consume, manage, and scale reliable cloud production services to achieve more
  • Work independently or collaboratively on SailPoint SaaS services to design, develop, and improve end-to-end reliability and maintainability for all services
  • Coach engineering teams on observability best practices, such as setting up well-defined Service Level Objectives (SLOs).
  • Lead engineering teams through post-incident reviews to define effective preventive actions
  • Collaborate effectively with developers to increase system reliability through short-term embedding programs
  • Enable our engineering teams to scale our enterprise operations by providing guidance, best practices and support as part of an SRE Centre of Excellence
  • Manage cross-functional requirements working with Engineering, Product, Services, and other departments
  • Develop and implement automation tools and processes to streamline operations and enhance system performance.
  • Be a mentor of quality for design reviews, code, test cases, automation, observability, root cause analysis, and self-healing
  • Focus on expanding your own skills and on improving your teammates' skills
  • Drive operational excellence to deliver frictionless operations, a happy on-call experience, and an optimal customer experience

Requirements

  • 5+ years of experience working in agile software development, infrastructure operations, or application management at SaaS software or cloud service provider organizations.
  • 4+ years of experience using NOC or SRE tactics to monitor Engineering production operations supporting a highly available environment for a SaaS software or cloud service provider.
  • Experience with cloud infrastructure environments, preferably AWS, and infrastructure as code.
  • Experience with containerization technology and/or Kubernetes
  • Experience with metrics, tracing, and logging observability tools such as Prometheus, Grafana, Honeycomb, Jaeger, and Kibana
  • Experience with incident management, including conducting incident reviews
  • Good to have: experience with programming languages (Java, Python, Go, etc.).
  • Strong understanding of Linux, software development, systems, networking, and cloud concepts.
  • Experience working with remote teams (US time zones).
  • Strong interpersonal and teaming skills: ability to set and enforce process and to influence engineers who are not direct reports.
  • Excellent communication skills, including English fluency.

Preferred:

  • Bachelor's degree in Computer Science or other technical discipline

Schedule

  • This role follows a single 9x7 rotating shift schedule, with occasional night shifts to cover for the Mexico team.

What success looks like in the role

Within the first 30 days you will:

  • Onboard into your new role, get familiar with our product offering and technology, proactively meet peers and stakeholders, and set up your test and development environment.
  • Seek to deeply understand business problems or common engineering challenges and propose software architecture designs to solve them elegantly by abstracting useful common patterns.

By 90 days:

  • Proactively collaborate on, discuss, debate and refine ideas, problem statements, and software designs with different (sometimes many) stakeholders, architects and members of your team.
  • Take a committed approach to prototyping and co-implementing systems alongside less experienced engineers on your team—there’s no room for ivory towers here.

By 6 months:

  • Share support of critical team systems by participating in on-call, learning the characteristics of currently running systems, and participating in improvements.
  • Occasionally serve as a debugging and implementation expert during escalations of system issues that less experienced engineers have not been able to solve in a timely manner.
  • Collaborate with Support Management and the Engineering Manager to resolve escalations quickly.

About The Company

SailPoint is a leading provider of identity security for the modern enterprise. Enterprise security starts and ends with identities and their access, yet the ability to manage and secure identities today has moved well beyond human capacity. Using a foundation of artificial intelligence and machine learning, the SailPoint Identity Security Platform delivers the right level of access to the right identities and resources at the right time—matching the scale, velocity, and environmental needs of today’s cloud-oriented enterprise.
