Production System Engineer, Infrastructure Engineering

1 Hour ago • All levels

Job Summary

Job Description

As a Production Systems Engineer, you'll contribute to the stability, efficiency, and scalability of data center and server operations globally. Your responsibilities will encompass the entire server lifecycle, from design and deployment to retirement. You'll develop automation tools, monitor infrastructure health, troubleshoot complex issues, and participate in disaster recovery efforts. Collaboration with cross-functional teams is essential to understand business objectives and implement innovative solutions. This role also includes on-call support for critical issues in the production environment. You will be part of the Infrastructure Engineering team supporting the company's fast growth.
Must have:
  • Enhance stability, efficiency, and scalability of data center operations.
  • Participate in the entire lifecycle of the server fleet.
  • Develop and deploy automation tools for server management.
  • Monitor and improve datacenter infrastructure health.
  • Troubleshoot and resolve complex technical issues in high-pressure environments.

Job Details

About the Team

The Infrastructure Engineering team supports the company's fast growth by building and operating hyperscale datacenters. The team manages the end-to-end lifecycle of the server fleet, providing cloud solutions and various infrastructure services and ensuring that they are scalable and reliable.

Embark on an exciting expedition to explore the rapidly expanding ByteDance domain in the United States, Europe, and Asia. Here, the Infrastructure Engineering team is crafting monumental data citadels that encircle the planet, sheltering legions of hundreds of thousands of servers. As the maestro of our production systems, you will embark on a captivating odyssey, taming the life cycles of these servers. Your adventure will begin with the orchestration of their initial deployment, navigating the intricate terrain of OS installation, summoning services like a digital magician, and maintaining vigilant watch over our inventory. But, like any epic tale, there will be times of challenge when you become a troubleshooter extraordinaire, mending and restoring with unwavering dedication. Eventually, you'll guide them into the sunset, orchestrating their decommissioning and ensuring their rebirth through recycling, all while contributing to the pulsating rhythm of ByteDance's technological evolution.

Key Responsibilities:
  • Operation: As a Production Systems Engineer, your mission is to enhance the stability, efficiency, effectiveness, and scalability of our data center and server operations, platforms, and services on a worldwide scale.
  • Lifecycle Enhancement: Participate in and enhance the entire lifecycle of the server fleet, from system design/introduction consultation to launch reviews, deployment, operation, and retirement.
  • Automation: Develop and deploy tools and solutions to enhance the automation, reliability, scalability, and operability of servers in the datacenter.
  • Monitoring: Develop and deploy tools and solutions for improving the availability, latency, and overall service of the datacenter infrastructure, as well as server and network health.
  • Disaster Recovery: Troubleshoot and resolve complex technical issues in a high-pressure, time-sensitive environment. Conduct high-level root-cause analysis for service interruptions and establish preventive measures. Practice sustainable incident response and postmortems.
  • Cross-team Collaboration: Collaborate with stakeholders such as infrastructure architects, project managers, data center operations engineers, platform developers, supply chain teams, and our internal customers to understand overarching business objectives. Additionally, you will have the chance to design and implement innovative solutions for our Core IDCs and CDN/Edge.
  • On-call: Take part in on-call support across regions and incident response teams to address critical issues in the production environment.

About The Company

Founded in 2012, ByteDance's mission is to inspire creativity and enrich life. With a suite of more than a dozen products, including TikTok as well as platforms specific to the China market, including Toutiao, Douyin, and Xigua, ByteDance has made it easier and more fun for people to connect with, consume, and create content.
