Incident Response Manager - Infrastructure Engineering

3 Months ago • 5+ Years • Operations

Job Summary

Job Description

ByteDance is seeking an Incident Response Manager - Infrastructure Engineering to join its Data Systems Infrastructure (DSI) team. The Incident Response Center (IRC) team is responsible for quick detection and incident response, thorough investigation of alerts, classification and triage, and business intelligence through rigorous analysis. This role will oversee operations within the Incident Response Operation Center (IROC) across all ByteDance datacenter sites. The ideal candidate will have 5+ years of experience in a service center or similar operations environment, strong knowledge of technical elements such as server health, datacenter environment, and IP networks, outstanding communication skills, and the ability to work with minimal direction.
Must have:
  • 5+ years of experience in a service center or similar 24x7 operations center environment.
  • Experience in a technology company or as a team lead, and experience in operations program management.
  • Strong knowledge of technical elements associated with systems such as Server Health, Datacenter Environment and IP Networks.
  • Outstanding verbal and written communication skills, with the ability to work with minimal direction, meet goals, pay attention to detail, and look for continuous improvement.
  • Good data analytics and presentation skills.
  • Basic working knowledge of data protection policies such as GDPR and the need to keep sensitive information secure.
Good to have:
  • 5+ years of experience as an incident and problem manager.
  • Works well under pressure and within time constraints to solve problems and complete deliverables.
  • Experience with Ticketing, Grafana, Servers and Data Center Systems.
  • Working knowledge and/or certifications in ITIL, CompTIA Server+, Schneider Electric Data Center Certified Associate (DCCA), Data Analytics and Visualization.
  • Knowledge of Cybersecurity, Lenel and Avigilon systems is a plus.

Job Details

Responsibilities
Founded in 2012, ByteDance has a mission to inspire creativity and enrich life. With a suite of more than a dozen products, including TikTok and Helo as well as platforms specific to the China market, including Toutiao, Douyin, and Xigua, ByteDance has made it easier and more fun for people to connect with, consume, and create content.

Why Join Us

Creation is the core of ByteDance's purpose. Our products are built to help imaginations thrive. This is doubly true of the teams that make our innovations possible. Together, we inspire creativity and enrich life, a mission we work toward every day. To us, every challenge, no matter how ambiguous, is an opportunity to learn, to innovate, and to grow as one team. Status quo? Never. Courage? Always. At ByteDance, we create together and grow together. That's how we drive impact for ourselves, our company, and the users we serve. Join us.

The Data Systems Infrastructure (DSI) team sits within the ByteDance global technology structure and supports the company's fast growth by building and operating hyper-scale datacenters, managing the life cycle of the server fleet, providing cloud solutions, and developing various infrastructure services, making sure they are scalable and reliable. To facilitate the hyper-growth of our TikTok platform, we are launching 3 new datacenters in Europe, located in Ireland and Norway, in 2023 and 2024.

The Incident Response Center (IRC) is the first layer of defense, responsible for quick detection and incident response using various monitoring and automation tools, and for thorough investigation, classification, and triage of alerts. The Incident Response Manager is responsible for delivering operations within the IROC across all ByteDance datacenter sites in the respective regions.

The IRC team is expected to respond to all alarms/alerts set in the Server Automation Operations System (SAOS) and Data Center Infrastructure Management (DCIM) to quickly discover anomalies and engage Subject Matter Expert (SME) teams to start issue triage. The IRC team also provides business intelligence through rigorous analysis of alerts and issues, which reduces and prevents recurring incidents.

Responsibilities:
  • Deliver global operations within the IROC (Incident Response Operation Center) for ByteDance datacenters.
  • Act as first responder and layer of defense, responsible for quick detection and incident response using various monitoring and automation tools, and conduct thorough investigation, classification, and triage of alerts.
  • Respond to all infrastructure, facilities, security, and safety events notified via various means, such as alarms/alerts set in Server Operations and Maintenance, Datacenter Infrastructure Management, Network & Grafana, and other functions.
  • Respond to incidents and critical situations in a problem-solving manner, and conduct in-depth investigation of alerts.
  • Provide insights into the effectiveness of the incident response and recovery process through regular reports.
  • Analyze trends and patterns in events to identify opportunities for improvement and optimization.
  • Monitor the performance of incident response against the agreed-upon SLAs by alerting and notifying stakeholders.
  • Manage escalations by notifying or initiating discussions with higher-level support teams engaged in resolution processes.
  • Identify, assess, and communicate potential risks arising through event monitoring that could affect the customer's service.
  • Support program managers, facilitate project deliverables, and improve overall operational security and engineering initiatives.
Qualifications
Minimum Qualifications:
  • 5+ years of experience in a service center or similar 24x7 operations center environment.
  • Experience in a technology company or as a team lead, and experience in operations program management.
  • Strong knowledge of technical elements associated with systems such as Server Health, Datacenter Environment and IP Networks.
  • Outstanding verbal and written communication skills, with the ability to work with minimal direction, meet goals, pay attention to detail, and look for continuous improvement.
  • Good data analytics and presentation skills.
  • Basic working knowledge of data protection policies such as GDPR and the need to keep sensitive information secure.
  • Willingness to be on call, including weekends, nights, and holidays.
Preferred Qualifications:
  • 5+ years of experience as an incident and problem manager.
  • Works well under pressure and within time constraints to solve problems and complete deliverables.
  • Experience with Ticketing, Grafana, Servers and Data Center Systems.
  • Working knowledge and/or certifications in ITIL, CompTIA Server+, Schneider Electric Data Center Certified Associate (DCCA), Data Analytics and Visualization.
  • Knowledge of Cybersecurity, Lenel and Avigilon systems is a plus.

ByteDance is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At ByteDance, our mission is to inspire creativity and enrich life. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach. We are passionate about this and hope you are too.
