Senior Director, Site Reliability Engineering, Technical Operations Center & Observability

10 Hours ago • All levels • Devops

Job Description

Take-Two Interactive Software, Inc. is seeking a Senior Director of Site Reliability Engineering (SRE), Technical Operations Center (TOC), and Observability. This role involves leading global teams to ensure the reliability, scalability, and performance of critical systems across cloud and on-premise environments. Responsibilities include overseeing SRE, TOC, and enterprise observability strategy, implementing proactive monitoring, managing incident response, and maintaining platform stability. The ideal candidate will possess strong leadership and technical expertise to drive operational excellence, minimize downtime, and deliver a seamless experience. This includes establishing SLOs/SLIs, developing a 24/7 incident response model, driving root cause analysis, partnering with engineering teams, and championing automation.
Must have:
  • Lead global SRE/TOC teams for system reliability
  • Oversee enterprise observability initiatives (logging, monitoring, tracing)
  • Establish SLOs/SLIs and reliability metrics
  • Develop 24/7 incident response and command practices
  • Drive root cause analysis (RCA) and continuous improvement
  • Partner with engineering and infrastructure teams
  • Own and optimize TOC operations
  • Champion automation and self-healing systems
  • Hire, mentor, and develop global technical teams
  • Experience with cloud platforms (AWS, GCP, Azure)
  • Experience with observability tools (Datadog, Prometheus, Grafana, etc.)
  • Proficiency in IaC tools (Terraform, CloudFormation, Ansible)
  • Experience with CI/CD and DevOps tooling
  • Experience with container orchestration (Kubernetes, Docker)
  • Strong analytical and decision-making skills
  • Ability to lead multi-functional teams during incidents
Good to have:
  • Deep understanding of hybrid and on-prem infrastructure
  • Familiarity with incident response tools (PagerDuty, ServiceNow)
  • Understanding of networking fundamentals
  • Solid understanding of security standards and compliance
Perks:
  • Fitness allowance
  • Employee discount programs
  • Free games & events
  • Stocked pantries
  • Great company culture
  • Growth opportunities
  • Work Hard, Play Hard events
  • Medical (HSA & FSA), dental, vision
  • 401(k) with company match
  • Employee stock purchase plan
  • Commuter benefits
  • In-house wellness program
  • Learning & development opportunities
  • Charitable giving platform with company match

Job Details

Who We Are:

Headquartered in New York City, Take-Two Interactive Software, Inc. is a leading developer, publisher, and marketer of interactive entertainment for consumers around the globe. We develop and publish products principally through Rockstar Games, 2K, and Zynga. Our products are designed for console gaming systems, PC, and mobile, including smartphones and tablets. We deliver our products through physical retail, digital download, online platforms, and cloud streaming services. The Company’s common stock is publicly traded on NASDAQ under the symbol TTWO. For more corporate and product information, please visit our website at http://www.take2games.com.

While our offices (physical and virtual) are casual and inviting, we are deeply committed to our core tenets of creativity, innovation and efficiency, and individual and team development opportunities. Our industry and business are continually evolving and fast-paced, providing numerous opportunities to learn and hone your skills. We work hard, but we also like to have fun, and believe that we provide a great place to come to work each day to pursue your passions.

 

The Challenge:

The Senior Director of SRE/TOC and Observability will lead global teams responsible for the reliability, scalability, and performance of critical systems across both cloud and on-prem environments. This role is responsible for Site Reliability Engineering, Technical Operations Center (TOC), and enterprise observability strategy, ensuring proactive monitoring, incident response, and platform stability. The ideal candidate combines deep technical expertise with strong leadership skills to drive operational excellence, minimize downtime, and deliver a seamless experience to internal and external collaborators.

What You’ll Take On:

  • Provide strategic leadership for global Site Reliability Engineering (SRE) and Technical Operations Center (TOC) teams, ensuring high availability and resilience of critical systems.
  • Supervise enterprise-wide observability initiatives, including logging, monitoring, tracing, and alerting frameworks to improve system visibility and incident response.
  • Establish and implement SLOs/SLIs, performance baselines, and reliability metrics aligned with business goals.
  • Develop and scale a 24/7 incident response model, including incident command practices, on-call rotations, and critical issue protocols.
  • Drive root cause analysis (RCA) and continuous improvement processes following major incidents.
  • Partner with engineering, infrastructure, and security teams to embed reliability and operational best practices into system design and delivery pipelines.
  • Own and optimize TOC operations, including real-time monitoring, alert triage, and first-line response to critical issues.
  • Champion automation, tooling, and self-healing systems to reduce manual interventions and improve uptime.
  • Hire, mentor, and develop a high-performing team across multiple geographies and time zones.
  • Collaborate with product and business partners to align operational strategies with customer needs and growth plans.
  • Track and report on platform health, incident trends, and key reliability metrics to executive leadership.

What You Bring:

Infrastructure & Cloud:

  • Deep experience with cloud platforms: AWS, GCP, and/or Azure
  • Proven understanding of hybrid and on-prem infrastructure (VMware, bare metal, etc.)
  • Expertise in high-availability architecture, scalability, and disaster recovery planning

Monitoring & Observability:

  • Hands-on experience with observability tools: Datadog, Prometheus, Grafana, New Relic, Splunk, ELK stack, or similar
  • Experience building and tuning SLOs/SLIs, alerting thresholds, and dashboards

Automation & DevOps:

  • Proficiency in Infrastructure as Code (IaC) tools: Terraform, CloudFormation, Ansible
  • Understanding of CI/CD pipelines and DevOps tooling (e.g., Jenkins, GitLab CI, ArgoCD)
  • Experience with container orchestration platforms: Kubernetes, Docker, Helm

Incident Management & TOC Operations:

  • Experience with incident response tools: PagerDuty, ServiceNow
  • Familiarity with incident command processes, RCA frameworks, and postmortem best practices
  • Understanding of networking fundamentals, DNS, load balancing, and traffic routing

Security & Compliance:

  • Solid understanding of security best practices, access control, and vulnerability management
  • Awareness of compliance standards (e.g., SOC 2, ISO 27001, HIPAA) relevant to operational reliability

Leadership & Communication:

  • Strong analytical and decision-making skills under pressure
  • Ability to lead cross-functional teams during high-severity incidents
  • Experience scaling and mentoring global technical teams

 

What We Offer You:

  • Great Company Culture. We are ranked as one of the most creative and innovative places to work. Creativity, innovation, efficiency, diversity, and philanthropy are among the core tenets of our organization and are integral drivers of our continued success.
  • Growth. As a global entertainment company, we pride ourselves on creating environments where employees are encouraged to be themselves, to be inquisitive and collaborative, and to grow within and around the company.
  • Work Hard, Play Hard. Our employees bond, blow off steam, and flex some creative muscles through corporate boot camp classes, company parties, game release events, monthly socials, and team challenges.
  • Benefits. Medical (HSA & FSA), dental, vision, 401(k) with company match, employee stock purchase plan, commuter benefits, in-house wellness program, broad learning & development opportunities, a charitable giving platform with company match and more!
  • Perks. Fitness allowance, employee discount programs, free games & events and stocked pantries.

Please be aware that Take-Two does not conduct job interviews or make job offers over third-party messaging apps such as Telegram, WhatsApp, or others. Take-Two also does not engage in any financial exchanges during the recruitment or onboarding process, and the Company will never ask a candidate for their personal or financial information over an app or other unofficial chat channel. Any attempt to do so may be the result of a scam or phishing exercise. Take-Two’s in-house recruitment team will only contact individuals through their official Company email addresses (i.e., via a take2games.com email domain). If you need to report an issue or otherwise have questions, please contact Careers@take2games.com.

As an equal opportunity employer, Take-Two Interactive Software, Inc. (“Take-Two”) is committed to fostering and celebrating the diverse thoughts, cultures, and backgrounds of its talent, partners, and communities throughout its organization. Consistent with this commitment, Take-Two does not discriminate or retaliate against any employee or job applicant because of their race, color, religion, sex (including pregnancy, sexual orientation, and gender identity), national origin, age, disability, and genetic information (including family medical history), or on the basis of any other trait protected by applicable law. If you need to report a concern or have questions regarding Take-Two’s equal opportunity commitment, please contact Careers@take2games.com.

 

