Command Center Manager

7 Minutes ago • 6 Years +
Campaign Management

Job Description

We are seeking a strategic and operationally strong Command Center / Site Reliability Manager to lead our global incident response and network operations functions. This leadership role drives operational excellence, leads a high-performing team, and ensures the resilience and reliability of production systems. Responsibilities include 24x7 incident detection, escalation, communication, and resolution of critical service outages, alongside real-time monitoring and triage of infrastructure and application health.
Good To Have:
  • Prior experience supporting high-availability SaaS or telecommunications systems is a strong plus.
  • Experience with customer-facing incident communication practices.
Must Have:
  • Lead end-to-end management of Critical Service Outages (P0/P1 incidents), driving timely resolution through coordinated incident response, effective communication, and robust post-incident reviews.
  • Oversee a 24x7 Network Operations Center (NOC), implementing scalable observability, alerting, and monitoring strategies to ensure infrastructure, application, and network reliability.
  • Build and develop a high-performing team of incident managers, NOC engineers, and shift leads, fostering operational maturity through training and collaboration.
  • Define and uphold standards for incident SLAs, escalation processes, runbooks, and playbooks, ensuring continuous shift coverage and comprehensive KPI reporting.
  • 6+ years of experience in Technical Operations, Site Reliability, NOC, or Incident Management roles.
  • 2+ years in a people management or team leadership role.
  • Deep knowledge of major incident management, escalation practices, and real-time service recovery strategies.
  • Strong technical understanding of cloud-native architectures (AWS, Azure, GCP), infrastructure monitoring, and DevOps practices.
  • Proven experience working with observability tools (e.g., Datadog, Splunk, Grafana, Prometheus), incident tools (PagerDuty), and ITSM platforms (e.g., ServiceNow, Jira).
Perks:
  • Comprehensive suite of benefits designed to help each member of our team thrive.
  • Voluntary healthcare coverage in countries where applicable.
  • Paid time off to recharge and spend time with loved ones.
  • Open Mentoring Program designed to create meaningful connections that support growth.

We are seeking a strategic and operationally strong Command Center / Site Reliability Manager to lead our global incident response and network operations functions. This leadership role is responsible for driving operational excellence, leading a high-performing team, and ensuring the resilience and reliability of our production systems and services. You will lead the team responsible for 24x7 incident detection, escalation, communication, and resolution of critical service outages while overseeing real-time monitoring and triage of infrastructure and application health.

  • Lead end-to-end management of Critical Service Outages (P0/P1 incidents), driving timely resolution through coordinated incident response, effective communication with stakeholders, and robust post-incident reviews with actionable remediation.
  • Oversee a 24x7 Network Operations Center (NOC), implementing scalable observability, alerting, and monitoring strategies to ensure infrastructure, application, and network reliability. Continuously optimize alert triage, diagnostics, and noise reduction to boost efficiency.
  • Build and develop a high-performing team of incident managers, NOC engineers, and shift leads. Foster operational maturity through training, performance management, and close collaboration with Engineering, SRE, DevOps, and Product teams.
  • Define and uphold standards for incident SLAs, escalation processes, runbooks, and playbooks, while ensuring continuous shift coverage, smooth handoffs, and comprehensive KPI reporting on system health and incident trends.
Required Skills:
  • 6+ years of experience in Technical Operations, Site Reliability, NOC, or Incident Management roles.
  • 2+ years in a people management or team leadership role.
  • Deep knowledge of major incident management, escalation practices, and real-time service recovery strategies.
  • Strong technical understanding of cloud-native architectures (AWS, Azure, GCP), infrastructure monitoring, and DevOps practices.
  • Proven experience working with observability tools (e.g., Datadog, Splunk, Grafana, Prometheus), incident tools (PagerDuty), and ITSM platforms (e.g., ServiceNow, Jira).
Good To Have:
  • Prior experience supporting high-availability SaaS or telecommunications systems is a strong plus.
  • Experience with customer-facing incident communication practices.
