Manager, Incident Response and Service Reliability

8+ Years • $208,400 PA - $313,500 PA

Job Summary

Job Description

This role is for a Manager, Incident Response and Service Reliability, leading the incident response program for Apple Wallet. It's a hands-on, high-accountability position requiring technical fluency, operational rigor, and strong leadership. The manager will define and operate processes for detecting, triaging, prioritizing, and mitigating service-impacting incidents. Responsibilities include driving proactive identification of recurring issues, leading root cause analysis, and partnering with engineering for long-term fixes to improve reliability and reduce risk. The role ensures urgent incident handling, clear communication, and root cause resolution through collaboration with various teams.
Must have:
  • Define and own strategic vision for incident and problem management.
  • Lead end-to-end incident response program.
  • Own problem management function, driving root cause analysis and long-term fixes.
  • Manage a team of incident and problem managers.
  • Define and track operational health metrics (e.g., MTTD, MTTM, MTTR).
  • Oversee adoption and evolution of incident tooling.
  • Facilitate blameless post-incident reviews (PIRs) for clear accountability and durable outcomes.
  • Instill a culture of operational learning and resilience, and drive systemic improvements.
Good to have:
  • Experience in payments, banking, or financial services (developer role)
  • Experience leading incident programs across global teams or regulated environments
  • Background in high-availability systems, payments infrastructure, or customer-critical services
  • Familiarity with root cause analysis frameworks, postmortem facilitation, and chaos testing
  • Experience integrating incident workflows with observability and BI platforms (Datadog, Grafana, Tableau)
  • Experience driving change in cross-functional or matrixed organizations
Perks:
  • Comprehensive medical and dental coverage
  • Retirement benefits
  • Discounted products and free services
  • Reimbursement for certain educational expenses (tuition)
  • Discretionary bonuses or commission payments
  • Relocation assistance
  • Opportunity to become an Apple shareholder (employee stock programs, ESPP)

Job Details

Are you passionate about operational excellence and protecting the customer experience? Are you drawn to solving some of the most complex and cross-functional challenges in an organization? Do you thrive on driving strategic changes that prevent problems before they happen? If so, you might be the right person to lead our Incident Management Team. This role focuses on building and leading the incident response program for Apple Wallet, one of our most impactful and customer-facing services. It’s a hands-on, high-accountability role that requires technical fluency, operational rigor, and strong leadership. At Apple, we don’t just build products; we craft the kind of wonder that’s revolutionized entire industries. Apple Wallet has changed the way we access the world, and is one of our fastest-growing and most impactful services. If this excites you, apply to join our talented team.

The Product Operations team empowers Apple teams to execute at scale. We tackle complex organizational, technical, and operational challenges to ensure seamless execution and strategic alignment across Apple Wallet. As the manager for the Incident Response and Service Reliability Team, you will lead the team responsible for Apple Wallet’s real-time incident response program. You will define and operate the processes for detecting, triaging, prioritizing, and mitigating service-impacting incidents. You will drive the proactive identification of recurring issues, lead root cause analysis, and partner with engineering to implement long-term fixes that reduce risk and improve reliability. Through close collaboration with engineering, infrastructure, SRE, and product teams, you will ensure that incidents are handled with urgency, communication is clear, and issues are addressed at the root.

Responsibilities:
  • Define and own the strategic vision for incident and problem management, integrating tooling, response structure, and continuous improvement across engineering.
  • Lead the end-to-end incident response program, including severity classification, escalation protocols, stakeholder communication, and real-time coordination.
  • Own the problem management function by identifying systemic issues, driving root cause analysis, and partnering with engineering to implement long-term fixes.
  • Manage a team of incident and problem managers, setting priorities, execution standards, and development goals.
  • Define and track operational health metrics (e.g., MTTD, MTTM, MTTR), and drive improvements in detection, mitigation, and recovery timelines.
  • Oversee the adoption and evolution of incident tooling, including monitoring, alerting, automation, documentation, and reporting.
  • Facilitate blameless post-incident reviews (PIRs) that result in clear accountability, cross-functional alignment, and durable outcomes.
  • Instill a culture of operational learning and resilience, and drive systemic and architectural improvements to reduce incident volume, minimize customer impact, and increase operational resilience.
Qualifications:
  • Bachelor’s degree or equivalent practical experience.
  • 8+ years of experience in incident management, technical program management, or SRE/infra leadership roles.
  • Demonstrated experience building or scaling an incident management program in a production or customer-facing environment.
  • Proven ability to define, measure, and influence operational metrics (e.g., MTTD, MTTR).
  • Strong cross-functional collaboration skills, particularly with engineering, product, and executive stakeholders.
  • Excellent communication skills under pressure, with the ability to drive clarity and urgency.
  • Experience with incident tooling (e.g., PagerDuty, Opsgenie, Slack bots, observability platforms).
Preferred qualifications:
  • Experience working in payments, banking, or other financial services companies in a developer role (SRE, DevOps, or other engineering experience).
  • Experience leading incident programs across global teams or regulated environments.
  • Background in high-availability systems, payments infrastructure, or customer-critical services.
  • Familiarity with root cause analysis frameworks, postmortem facilitation, and chaos testing.
  • Experience integrating incident workflows with observability and BI platforms (e.g., Datadog, Grafana, Tableau).
  • Experience driving change in cross-functional or matrixed organizations.

At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $208,400 and $313,500, and your base pay will depend on your skills, qualifications, experience, and location.

Apple employees also have the opportunity to become Apple shareholders through participation in Apple’s discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple’s Employee Stock Purchase Plan. You’ll also receive benefits including comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and, for formal education related to advancing your career at Apple, reimbursement for certain educational expenses, including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation assistance.

Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.
