Senior Director, Site Reliability Engineering, Technical Operations Center & Observability

1 Month ago • All levels • Devops

Job Summary

Job Description

Take-Two Interactive Software, Inc. is seeking a Senior Director of Site Reliability Engineering (SRE), Technical Operations Center (TOC), and Observability. This role involves leading global teams to ensure the reliability, scalability, and performance of critical systems across cloud and on-premises environments. Responsibilities include overseeing SRE, TOC, and enterprise observability strategy, implementing proactive monitoring, managing incident response, and maintaining platform stability. The ideal candidate will possess strong leadership and technical expertise to drive operational excellence, minimize downtime, and deliver a seamless experience. This includes establishing SLOs/SLIs, developing a 24/7 incident response model, driving root cause analysis, partnering with engineering teams, and championing automation.
Must have:
  • Lead global SRE/TOC teams for system reliability
  • Oversee enterprise observability initiatives (logging, monitoring, tracing)
  • Establish SLOs/SLIs and reliability metrics
  • Develop 24/7 incident response and command practices
  • Drive root cause analysis (RCA) and continuous improvement
  • Partner with engineering and infrastructure teams
  • Own and optimize TOC operations
  • Champion automation and self-healing systems
  • Hire, mentor, and develop global technical teams
  • Experience with cloud platforms (AWS, GCP, Azure)
  • Experience with observability tools (Datadog, Prometheus, Grafana, etc.)
  • Proficiency in IaC tools (Terraform, CloudFormation, Ansible)
  • Experience with CI/CD and DevOps tooling
  • Experience with containerization and orchestration (Docker, Kubernetes)
  • Strong analytical and decision-making skills
  • Ability to lead cross-functional teams during incidents
Good to have:
  • Deep understanding of hybrid and on-prem infrastructure
  • Familiarity with incident response tools (PagerDuty, ServiceNow)
  • Understanding of networking fundamentals
  • Solid understanding of security standards and compliance
Perks:
  • Fitness allowance
  • Employee discount programs
  • Free games & events
  • Stocked pantries
  • Great Company Culture
  • Growth opportunities
  • Work Hard, Play Hard events
  • Medical (HSA & FSA), dental, vision
  • 401(k) with company match
  • Employee stock purchase plan
  • Commuter benefits
  • In-house wellness program
  • Learning & development opportunities
  • Charitable giving platform with company match

Job Details

Who We Are:

Headquartered in New York City, Take-Two Interactive Software, Inc. is a leading developer, publisher, and marketer of interactive entertainment for consumers around the globe. We develop and publish products principally through Rockstar Games, 2K, and Zynga. Our products are designed for console gaming systems, PC, and mobile, including smartphones and tablets. We deliver our products through physical retail, digital download, online platforms, and cloud streaming services. The Company’s common stock is publicly traded on NASDAQ under the symbol TTWO. For more corporate and product information, please visit our website at http://www.take2games.com.

While our offices (physical and virtual) are casual and inviting, we are deeply committed to our core tenets of creativity, innovation and efficiency, and individual and team development opportunities. Our industry and business are continually evolving and fast-paced, providing numerous opportunities to learn and hone your skills. We work hard, but we also like to have fun, and believe that we provide a great place to come to work each day to pursue your passions.

The Challenge:

The Senior Director of SRE/TOC and Observability will lead global teams responsible for the reliability, scalability, and performance of critical systems across both cloud and on-prem environments. This role is responsible for Site Reliability Engineering, the Technical Operations Center (TOC), and enterprise observability strategy, ensuring proactive monitoring, incident response, and platform stability. The ideal candidate combines deep technical expertise with strong leadership skills to drive operational excellence, minimize downtime, and deliver a seamless experience to internal and external stakeholders.

What You’ll Take On:

  • Provide strategic leadership for global Site Reliability Engineering (SRE) and Technical Operations Center (TOC) teams, ensuring high availability and resilience of critical systems.
  • Supervise enterprise-wide observability initiatives, including logging, monitoring, tracing, and alerting frameworks to improve system visibility and incident response.
  • Establish and implement SLOs/SLIs, performance baselines, and reliability metrics aligned with business goals.
  • Develop and scale a 24/7 incident response model, including incident command practices, on-call rotations, and critical issue protocols.
  • Drive root cause analysis (RCA) and continuous improvement processes following major incidents.
  • Partner with engineering, infrastructure, and security teams to embed reliability and operational best practices into system design and delivery pipelines.
  • Own and optimize TOC operations, including real-time monitoring, alert triage, and first-line response to critical issues.
  • Champion automation, tooling, and self-healing systems to reduce manual interventions and improve uptime.
  • Hire, mentor, and develop a high-performing team across multiple geographies and time zones.
  • Collaborate with product and business partners to align operational strategies with customer needs and growth plans.
  • Track and report on platform health, incident trends, and key reliability metrics to executive leadership.

What You Bring:

Infrastructure & Cloud:

  • Deep experience with cloud platforms: AWS, GCP, and/or Azure
  • Proven understanding of hybrid and on-prem infrastructure (VMware, bare metal, etc.)
  • Expertise in high-availability architecture, scalability, and disaster recovery planning

Monitoring & Observability:

  • Hands-on experience with observability tools: Datadog, Prometheus, Grafana, New Relic, Splunk, ELK stack, or similar
  • Experience building and tuning SLOs/SLIs, alerting thresholds, and dashboards

Automation & DevOps:

  • Proficiency in Infrastructure as Code (IaC) tools: Terraform, CloudFormation, Ansible
  • Understanding of CI/CD pipelines and DevOps tooling (e.g., Jenkins, GitLab CI, ArgoCD)
  • Experience with containerization and orchestration tooling: Docker, Kubernetes, Helm

Incident Management & TOC Operations:

  • Experience with incident response tools: PagerDuty, ServiceNow
  • Familiarity with incident command processes, RCA frameworks, and postmortem best practices
  • Understanding of networking fundamentals, DNS, load balancing, and traffic routing

Security & Compliance:

  • Solid understanding of security best practices, access control, and vulnerability management
  • Awareness of compliance standards (e.g., SOC 2, ISO 27001, HIPAA) relevant to operational reliability

Leadership & Communication:

  • Strong analytical and decision-making skills under pressure
  • Ability to lead cross-functional teams during high-severity incidents
  • Experience scaling and mentoring global technical teams

What We Offer You:

  • Great Company Culture. Ranked as one of the most creative and innovative places to work, our organization counts creativity, innovation, efficiency, diversity, and philanthropy among its core tenets and integral drivers of our continued success.
  • Growth. As a global entertainment company, we pride ourselves on creating environments where employees are encouraged to be themselves, to be inquisitive and collaborative, and to grow within and around the company.
  • Work Hard, Play Hard. Our employees bond, blow off steam, and flex some creative muscles through corporate boot camp classes, company parties, game release events, monthly socials, and team challenges.
  • Benefits. Medical (HSA & FSA), dental, vision, 401(k) with company match, employee stock purchase plan, commuter benefits, in-house wellness program, broad learning & development opportunities, a charitable giving platform with company match and more!
  • Perks. Fitness allowance, employee discount programs, free games & events and stocked pantries.

Please be aware that Take-Two does not conduct job interviews or make job offers over third-party messaging apps such as Telegram, WhatsApp, or others. Take-Two also does not engage in any financial exchanges during the recruitment or onboarding process, and the Company will never ask a candidate for their personal or financial information over an app or other unofficial chat channel. Any attempt to do so may be the result of a scam or phishing exercise. Take-Two’s in-house recruitment team will only contact individuals through their official Company email addresses (i.e., via a take2games.com email domain). If you need to report an issue or otherwise have questions, please contact Careers@take2games.com.

As an equal opportunity employer, Take-Two Interactive Software, Inc. (“Take-Two”) is committed to fostering and celebrating the diverse thoughts, cultures, and backgrounds of its talent, partners, and communities throughout its organization. Consistent with this commitment, Take-Two does not discriminate or retaliate against any employee or job applicant because of their race, color, religion, sex (including pregnancy, sexual orientation, and gender identity), national origin, age, disability, or genetic information (including family medical history), or on the basis of any other trait protected by applicable law. If you need to report a concern or have questions regarding Take-Two’s equal opportunity commitment, please contact Careers@take2games.com.

Similar Jobs

Coupa - Sr. IT Support Specialist

Coupa

Bogota, Colombia (Hybrid)
2 Months ago
Everlaw - Senior Product Marketing Manager

Everlaw

Oakland, California, United States (Hybrid)
6 Days ago
Putnam - Principal, Value Communications (HTA and Market Access)

Putnam

Westport, County Mayo, Ireland (Hybrid)
2 Months ago
Make - Senior Data Engineer

Make

Olomouc, Olomouc Region, Czechia (On-Site)
2 Months ago
InMobi - Senior Associate - People Operations

InMobi

Bengaluru, Karnataka, India (On-Site)
4 Days ago
Apple - Sr. Software Engineer - Cloud Platform, Kubernetes (ASE)

Apple

Cupertino, California, United States (On-Site)
1 Month ago
Tennr - Solutions Engineer

Tennr

New York, New York, United States (On-Site)
3 Months ago
Nice - Senior Software Engineer (.Net, AWS)

Nice

Pune, Maharashtra, India (Hybrid)
2 Weeks ago
Veeam Software - Solution Engineer

Veeam Software

Singapore, Singapore (On-Site)
2 Months ago
Applike - (Senior) DevOps Engineer

Applike

Hamburg, Hamburg, Germany (Hybrid)
3 Years ago

Similar Skill Jobs

Lionbridge Games - AI Program Director

Lionbridge Games

(Remote)
5 Months ago
TransUnion - Sr. Counsel- US Financial Services

TransUnion

Chicago, Illinois, United States (Hybrid)
2 Months ago
bytedance - Senior Software Engineer - Network Security

bytedance

San Jose, California, United States (On-Site)
3 Months ago
Lighthouse Studios - Mid-Senior Toon Boom Animators (Rick and Morty & Top Secret Series)

Lighthouse Studios

Kilkenny, County Kilkenny, Ireland (On-Site)
3 Months ago
Marsh McLennan - Bank Systems Administrator

Marsh McLennan

Warsaw, Masovian Voivodeship, Poland (Hybrid)
2 Months ago
Match Group - Central Sr. People Director

Match Group

Dallas, Texas, United States (Hybrid)
2 Weeks ago
Adtran - Student (System Verification Test)

Adtran

Gdynia, Pomeranian Voivodeship, Poland (Hybrid)
1 Week ago
beghou consulting - Sr. Consultant

beghou consulting

Emeryville, California, United States (Hybrid)
4 Months ago
Palo Alto Networks - Sr. Manager, Software Firewall CSP OEM Partnerships GTM (Global)

Palo Alto Networks

Santa Clara, California, United States (On-Site)
1 Month ago
Mendix - Product Analyst

Mendix

Rotterdam, South Holland, Netherlands (Hybrid)
3 Days ago

Jobs in Austin, Texas, United States

Roblox - Data Scientist / Senior Data Scientist - Social Communities

Roblox

San Mateo, California, United States (On-Site)
1 Month ago
Qualcomm - Compute Chipset Project Engineer

Qualcomm

San Diego, California, United States (On-Site)
2 Months ago
Jam City - Senior Game Designer

Jam City

Los Angeles, California, United States (Remote)
1 Month ago
beghou consulting - Associate Manager, Life Sciences Commercial Strategy & Operations

beghou consulting

New York, New York, United States (Hybrid)
1 Month ago
upwork - Senior Platform Engineer

upwork

United States (Remote)
2 Weeks ago
Next Level Business Services - Collibra Lead

Next Level Business Services

Dallas, Texas, United States (On-Site)
9 Months ago
Scanline VFX - Research Scientist

Scanline VFX

Los Angeles, California, United States (Hybrid)
8 Months ago
TFL Group - Director of Partnerships

TFL Group

Overland Park, Kansas, United States (On-Site)
6 Months ago
Mercury - Senior Release Engineer

Mercury

San Francisco, California, United States (Remote)
2 Weeks ago
Adyen - Enterprise Account Manager

Adyen

San Francisco, California, United States (On-Site)
2 Months ago

Devops Jobs

WebMD - Site Reliability Engineer

WebMD

Boise, Idaho, United States (On-Site)
1 Month ago
Palo Alto Networks - Senior Manager, DevOps Engineering (Cortex)

Palo Alto Networks

Santa Clara, California, United States (On-Site)
1 Week ago
PhonePe - Server Administrator (Devops and Linux)

PhonePe

Bengaluru, Karnataka, India (On-Site)
1 Month ago
Tencent - Tencent Cloud - Senior Cloud Architect (R&D & Solution Design)

Tencent

Singapore (On-Site)
8 Months ago
Apple - Senior DevOps Engineer - Retail Engineering POS

Apple

Sunnyvale, California, United States (On-Site)
3 Weeks ago
Xepelin - Senior DevOps Engineer

Xepelin

Buenos Aires, Buenos Aires, Argentina (Remote)
1 Year ago
Lambda - Technical Solutions Enablement Engineer

Lambda

San Francisco, California, United States (Hybrid)
3 Months ago
InMobi - SDE III - Devops

InMobi

Bengaluru, Karnataka, India (On-Site)
2 Months ago
bytedance - Infrastructure Software Engineer in Edge Cloud

bytedance

Seattle, Washington, United States (On-Site)
3 Months ago
Nice - Senior Cloud Database Engineer

Nice

Southampton, England, United Kingdom (Hybrid)
2 Weeks ago

About The Company

Take-Two Interactive Software, Inc. is a leading developer, publisher, and marketer of interactive entertainment for consumers around the globe. We develop and publish products principally through Rockstar Games, 2K, and Zynga. Our products are designed for console gaming systems, PC, and mobile, including smartphones and tablets. We deliver our products through physical retail, digital download, online platforms, and cloud streaming services. For more information, visit the company website.

New York, United States (On-Site)

Austin, Texas, United States (On-Site)

Las Vegas, Nevada, United States (On-Site)

New York, United States (On-Site)

New York, United States (On-Site)

New York, United States (On-Site)

Massachusetts, United States (Remote)

London, England, United Kingdom (Hybrid)

Massachusetts, United States (Remote)

Texas, United States (Remote)