Command Center Manager

Sprinkler

6+ Years | Gurgaon, Haryana, India (On Site) | Full Time | 2 months ago

Job Summary

We are seeking a strategic and operationally strong Command Center / Site Reliability Manager to lead our global incident response and network operations functions. This leadership role drives operational excellence, leads a high-performing team, and ensures the resilience and reliability of production systems. Responsibilities include 24x7 incident detection, escalation, communication, and resolution of critical service outages, alongside real-time monitoring and triage of infrastructure and application health.

Must Have

Lead end-to-end management of Critical Service Outages (P0/P1 incidents), driving timely resolution through coordinated incident response, effective communication, and robust post-incident reviews.
Oversee a 24x7 Network Operations Center (NOC), implementing scalable observability, alerting, and monitoring strategies to ensure infrastructure, application, and network reliability.
Build and develop a high-performing team of incident managers, NOC engineers, and shift leads, fostering operational maturity through training and collaboration.
Define and uphold standards for incident SLAs, escalation processes, runbooks, and playbooks, ensuring continuous shift coverage and comprehensive KPI reporting.
6+ years of experience in Technical Operations, Site Reliability, NOC, or Incident Management roles.
2+ years in a people management or team leadership role.
Deep knowledge of major incident management, escalation practices, and real-time service recovery strategies.
Strong technical understanding of cloud-native architectures (AWS, Azure, GCP), infrastructure monitoring, and DevOps practices.
Proven experience working with observability tools (e.g., Datadog, Splunk, Grafana, Prometheus), incident tools (PagerDuty), and ITSM platforms (e.g., ServiceNow, Jira).

Good to Have

Prior experience supporting high-availability SaaS or telecommunications systems is a strong plus.
Experience with customer-facing incident communication practices.

Perks & Benefits

Comprehensive suite of benefits designed to help each member of our team thrive.
Voluntary healthcare coverage in countries where applicable.
Paid time off to recharge and spend time with loved ones.
Open Mentoring Program designed to create meaningful connections that support growth.

Job Description

We are seeking a strategic and operationally strong Command Center / Site Reliability Manager to lead our global incident response and network operations functions. This leadership role is responsible for driving operational excellence, leading a high-performing team, and ensuring the resilience and reliability of our production systems and services. You will lead the team responsible for 24x7 incident detection, escalation, communication, and resolution of critical service outages while overseeing real-time monitoring and triage of infrastructure and application health.

Lead end-to-end management of Critical Service Outages (P0/P1 incidents), driving timely resolution through coordinated incident response, effective communication with stakeholders, and robust post-incident reviews with actionable remediation.
Oversee a 24x7 Network Operations Center (NOC), implementing scalable observability, alerting, and monitoring strategies to ensure infrastructure, application, and network reliability. Continuously optimize alert triage, diagnostics, and noise reduction to boost efficiency.
Build and develop a high-performing team of incident managers, NOC engineers, and shift leads. Foster operational maturity through training, performance management, and close collaboration with Engineering, SRE, DevOps, and Product teams.
Define and uphold standards for incident SLAs, escalation processes, runbooks, and playbooks, while ensuring continuous shift coverage, smooth handoffs, and comprehensive KPI reporting on system health and incident trends.

Required Skills

6+ years of experience in Technical Operations, Site Reliability, NOC, or Incident Management roles.
2+ years in a people management or team leadership role.
Deep knowledge of major incident management, escalation practices, and real-time service recovery strategies.
Strong technical understanding of cloud-native architectures (AWS, Azure, GCP), infrastructure monitoring, and DevOps practices.
Proven experience working with observability tools (e.g., Datadog, Splunk, Grafana, Prometheus), incident tools (PagerDuty), and ITSM platforms (e.g., ServiceNow, Jira).
Prior experience supporting high-availability SaaS or telecommunications systems is a strong plus.
Experience with customer-facing incident communication practices.

11 Skills Required For This Role

SaaS Business Models, Communication, Leadership, Game Texts, Incident Response, AWS, Azure, Prometheus, Grafana, Splunk, Jira
