Senior Director, Site Reliability Engineering, Technical Operations Center & Observability

2 Months ago • All levels • Devops

Job Summary

Job Description

Take-Two Interactive Software, Inc. is seeking a Senior Director of Site Reliability Engineering (SRE), Technical Operations Center (TOC), and Observability. This role involves leading global teams to ensure the reliability, scalability, and performance of critical systems across cloud and on-premise environments. Responsibilities include overseeing SRE, TOC, and enterprise observability strategy, implementing proactive monitoring, managing incident response, and maintaining platform stability. The ideal candidate will possess strong leadership and technical expertise to drive operational excellence, minimize downtime, and deliver a seamless experience. This includes establishing SLOs/SLIs, developing a 24/7 incident response model, driving root cause analysis, partnering with engineering teams, and championing automation.
Must have:
  • Lead global SRE/TOC teams for system reliability
  • Oversee enterprise observability initiatives (logging, monitoring, tracing)
  • Establish SLOs/SLIs and reliability metrics
  • Develop 24/7 incident response and command practices
  • Drive root cause analysis (RCA) and continuous improvement
  • Partner with engineering and infrastructure teams
  • Own and optimize TOC operations
  • Champion automation and self-healing systems
  • Hire, mentor, and develop global technical teams
  • Experience with cloud platforms (AWS, GCP, Azure)
  • Experience with observability tools (Datadog, Prometheus, Grafana, etc.)
  • Proficiency in IaC tools (Terraform, CloudFormation, Ansible)
  • Experience with CI/CD and DevOps tooling
  • Experience with container orchestration (Kubernetes, Docker)
  • Strong analytical and decision-making skills
  • Ability to lead multi-functional teams during incidents
Good to have:
  • Deep understanding of hybrid and on-prem infrastructure
  • Familiarity with incident response tools (PagerDuty, ServiceNow)
  • Understanding of networking fundamentals
  • Solid understanding of security standards and compliance
Perks:
  • Fitness allowance
  • Employee discount programs
  • Free games & events
  • Stocked pantries
  • Great Company Culture
  • Growth opportunities
  • Work Hard, Play Hard events
  • Medical (HSA & FSA), dental, vision
  • 401(k) with company match
  • Employee stock purchase plan
  • Commuter benefits
  • In-house wellness program
  • Learning & development opportunities
  • Charitable giving platform with company match

Job Details

Who We Are:

Headquartered in New York City, Take-Two Interactive Software, Inc. is a leading developer, publisher, and marketer of interactive entertainment for consumers around the globe. We develop and publish products principally through Rockstar Games, 2K, and Zynga. Our products are designed for console gaming systems, PC, and mobile, including smartphones and tablets. We deliver our products through physical retail, digital download, online platforms, and cloud streaming services. The Company’s common stock is publicly traded on NASDAQ under the symbol TTWO. For more corporate and product information please visit our website at http://www.take2games.com.

While our offices (physical and virtual) are casual and inviting, we are deeply committed to our core tenets of creativity, innovation and efficiency, and individual and team development opportunities. Our industry and business are continually evolving and fast-paced, providing numerous opportunities to learn and hone your skills. We work hard, but we also like to have fun, and believe that we provide a great place to come to work each day to pursue your passions.

 

The Challenge:

The Senior Director of SRE/TOC and Observability will lead global teams responsible for the reliability, scalability, and performance of critical systems across both cloud and on-prem environments. This role is responsible for Site Reliability Engineering, Technical Operations Center (TOC), and enterprise observability strategy, ensuring proactive monitoring, incident response, and platform stability. The ideal candidate combines deep technical expertise with strong leadership skills to drive operational excellence, minimize downtime, and deliver a seamless experience to internal and external collaborators.

What You’ll Take On:

  • Provide strategic leadership for global Site Reliability Engineering (SRE) and Technical Operations Center (TOC) teams, ensuring high availability and resilience of critical systems.
  • Supervise enterprise-wide observability initiatives, including logging, monitoring, tracing, and alerting frameworks to improve system visibility and incident response.
  • Establish and implement SLOs/SLIs, performance baselines, and reliability metrics aligned with business goals.
  • Develop and scale a 24/7 incident response model, including incident command practices, on-call rotations, and critical issue protocols.
  • Drive root cause analysis (RCA) and continuous improvement processes following major incidents.
  • Partner with engineering, infrastructure, and security teams to embed reliability and operational best practices into system design and delivery pipelines.
  • Own and optimize TOC operations, including real-time monitoring, alert triage, and first-line response to critical issues.
  • Champion automation, tooling, and self-healing systems to reduce manual interventions and improve uptime.
  • Hire, mentor, and develop a high-performing team across multiple geographies and time zones.
  • Collaborate with product and business partners to align operational strategies with customer needs and growth plans.
  • Track and report on platform health, incident trends, and key reliability metrics to executive leadership.

What You Bring:

Infrastructure & Cloud:

  • Deep experience with cloud platforms: AWS, GCP, and/or Azure
  • Proven understanding of hybrid and on-prem infrastructure (VMware, bare metal, etc.)
  • Expertise in high-availability architecture, scalability, and disaster recovery planning

Monitoring & Observability:

  • Hands-on experience with observability tools: Datadog, Prometheus, Grafana, New Relic, Splunk, ELK stack, or similar
  • Experience building and tuning SLOs/SLIs, alerting thresholds, and dashboards

Automation & DevOps:

  • Proficiency in Infrastructure as Code (IaC) tools: Terraform, CloudFormation, Ansible
  • Understanding of CI/CD pipelines and DevOps tooling (e.g., Jenkins, GitLab CI, ArgoCD)
  • Experience with container orchestration platforms: Kubernetes, Docker, Helm

Incident Management & TOC Operations:

  • Experience with incident response tools: PagerDuty, ServiceNow
  • Familiarity with incident command processes, RCA frameworks, and postmortem best practices
  • Understanding of networking fundamentals, DNS, load balancing, and traffic routing

Security & Compliance:

  • Solid understanding of standard security processes, access control, and vulnerability management
  • Awareness of compliance standards (e.g., SOC 2, ISO 27001, HIPAA) relevant to operational reliability

Leadership & Communication:

  • Strong analytical and decision-making skills under pressure
  • Ability to lead multi-functional teams during high-severity incidents
  • Experience scaling and mentoring global technical teams

 

What We Offer You:

  • Great Company Culture. We are ranked as one of the most creative and innovative places to work; creativity, innovation, efficiency, diversity and philanthropy are among the core tenets of our organization and are integral drivers of our continued success.
  • Growth. As a global entertainment company, we pride ourselves on creating environments where employees are encouraged to be themselves, to be inquisitive and collaborative, and to grow within and around the company.
  • Work Hard, Play Hard. Our employees bond, blow off steam, and flex some creative muscles through corporate boot camp classes, company parties, game release events, monthly socials, and team challenges.
  • Benefits. Medical (HSA & FSA), dental, vision, 401(k) with company match, employee stock purchase plan, commuter benefits, in-house wellness program, broad learning & development opportunities, a charitable giving platform with company match and more!
  • Perks. Fitness allowance, employee discount programs, free games & events and stocked pantries.

Please be aware that Take-Two does not conduct job interviews or make job offers over third-party messaging apps such as Telegram, WhatsApp, or others. Take-Two also does not engage in any financial exchanges during the recruitment or onboarding process, and the Company will never ask a candidate for their personal or financial information over an app or other unofficial chat channel. Any attempt to do so may be the result of a scam or phishing exercise. Take-Two’s in-house recruitment team will only contact individuals through their official Company email addresses (i.e., via a take2games.com email domain). If you need to report an issue or otherwise have questions, please contact Careers@take2games.com.

As an equal opportunity employer, Take-Two Interactive Software, Inc. (“Take-Two”) is committed to fostering and celebrating the diverse thoughts, cultures, and backgrounds of its talent, partners, and communities throughout its organization. Consistent with this commitment, Take-Two does not discriminate or retaliate against any employee or job applicant because of their race, color, religion, sex (including pregnancy, sexual orientation, and gender identity), national origin, age, disability, or genetic information (including family medical history), or on the basis of any other trait protected by applicable law. If you need to report a concern or have questions regarding Take-Two’s equal opportunity commitment, please contact Careers@take2games.com.

 

