Site Reliability Engineer

Zelis

Job Summary

Zelis is seeking a Site Reliability Engineer with a strong observability-focused mindset who is adept at handling telemetry (metrics, logs, traces, events) and experienced with tools such as Datadog and AWS CloudWatch. The role involves defining SLIs/SLOs, monitoring proactively, and managing incidents with clear communication and postmortem discipline. It also calls for strategic, structured logging; an automation-first approach; awareness of the risks of exposing PII/PHI; global collaboration; and comprehensive documentation such as runbooks and system diagrams.

Must Have

  • Deep understanding of telemetry (metrics, logs, traces, events)
  • Experience with observability tools (Datadog, AWS CloudWatch, SolarWinds, OpenTelemetry)
  • Ability to define and refine SLIs/SLOs to measure system health
  • Proactive monitoring, building dashboards and alerts
  • Handles high-severity incidents with clarity and focus
  • Strong communication during incidents
  • Writes blameless post-incident reports and drives follow-ups
  • Adds meaningful and structured logs
  • Automates repetitive tasks and incident responses
  • Understands implications of exposing PII/PHI in logs or dashboards
  • Maintains clear runbooks, escalation paths, and system diagrams

Perks & Benefits

  • Hybrid work flexibility
  • Comprehensive healthcare benefits
  • Financial wellness programs
  • Cultural celebrations

Job Description

About Us

Zelis is modernizing the healthcare financial experience in the United States (U.S.) across payers, providers, and healthcare consumers. We serve more than 750 payers, including the top five national health plans, regional health plans, and third-party administrators (TPAs), as well as millions of healthcare providers and consumers across our platform of solutions. Zelis sees across the system to identify, optimize, and solve problems holistically with technology built by healthcare experts, driving real, measurable results for clients.

Why We Do What We Do

In the U.S., consumers, payers, and providers face significant challenges throughout the healthcare financial journey. Zelis helps streamline the process by offering solutions that improve transparency, efficiency, and communication among all parties involved. By addressing the obstacles patients face in accessing care and navigating the intricacies of insurance claims, as well as the logistical challenges providers encounter when processing payments, Zelis aims to create a more seamless and effective healthcare financial system.

Zelis India plays a crucial role in this mission by supporting initiatives that enhance the healthcare financial experience. The local team contributes to developing and implementing innovative solutions, ensuring that technology and processes are optimized for efficiency and effectiveness. Beyond operational expertise, Zelis India cultivates a collaborative work culture and offers leadership development and global exposure, creating a dynamic environment for professional growth. With hybrid work flexibility, comprehensive healthcare benefits, financial wellness programs, and cultural celebrations, we foster a holistic workplace experience. The team also helps maintain high standards of service delivery and contributes to Zelis’ award-winning culture.

Position Overview

Observability-Focused Mindset

  • Deep understanding of telemetry: Metrics, logs, traces, and events.
  • Experience with observability tools: e.g., Datadog, AWS CloudWatch, SolarWinds, OpenTelemetry.
  • Ability to define and refine SLIs/SLOs to measure system health.
  • Proactive monitoring: Builds dashboards and alerts that detect issues before users do.

Incident Management & Communication

  • Calm under pressure: Handles high-severity incidents with clarity and focus.
  • Strong communicator: Clearly articulates impact, status, and resolution steps to stakeholders.
  • Postmortem discipline: Writes blameless post-incident reports and drives follow-ups.
  • Collaboration: Works closely with developers, product, and support during incidents.

Logging & Iterative Improvements

  • Strategic logging: Adds meaningful logs that aid in debugging and performance analysis.
  • Log hygiene: Avoids noisy or redundant logs; uses structured logging.
  • Iterative mindset: Continuously improves logging as the application evolves.
  • Understands cost vs. value: Balances log verbosity with storage and performance impact.

Additional Qualities

  • Automation-first approach: Automates repetitive tasks and incident responses.
  • Security-aware: Understands implications of exposing PII/PHI in logs or dashboards.
  • Global collaboration: Comfortable working with distributed teams (e.g., U.S.-based developers and subject-matter experts).
  • Documentation: Maintains clear runbooks, escalation paths, and system diagrams.

Commitment to Diversity, Equity, Inclusion, and Belonging

At Zelis, we champion diversity, equity, inclusion, and belonging in all aspects of our operations. We embrace the power of diversity and create an environment where people can bring their authentic and best selves to work. We know that a sense of belonging is key not only to your success at Zelis, but also to your ability to bring your best each day.

Equal Employment Opportunity

Zelis is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.

Accessibility Support

We are dedicated to ensuring our application process is accessible to all candidates. If you are a qualified individual with a disability and require reasonable accommodation with any part of the application and/or interview process, please email talentacquisition@zelis.com.
