Production System Engineer

2 Weeks ago • 3 Years + • Network Engineering • DevOps

Job Summary

Job Description

ByteDance's Infrastructure Engineering team seeks a Production System Engineer to enhance data center operations globally. Responsibilities include improving the lifecycle of infrastructure systems (design to decommissioning), automating processes, monitoring service health, resolving technical issues, collaborating with cross-functional teams, and creating documentation. The role requires strong Linux, automation, and coding skills (Bash, Python, Golang preferred), along with data center and server hardware expertise. On-call participation is expected. The ideal candidate possesses experience in Agile methodologies and project management.
Must have:
  • 3+ years of experience in system infrastructure operations
  • Intermediate-level server hardware expertise
  • Data center operations experience
  • Proficiency in Linux, Bash, Python, Golang
  • Automation and monitoring tool experience
  • Troubleshooting and problem-solving skills
  • Collaboration and communication skills
Good to have:
  • Golang
  • REST APIs
  • Gin
  • Ansible
  • Load Balancer
  • SQL
  • Hive
  • Hadoop
  • ClickHouse
  • Message Queue
  • Redis

Job Details

Responsibilities
About ByteDance
ByteDance was founded in 2012 with a mission to inspire creativity and enrich life. With a suite of more than a dozen products, including TikTok, Helo, and Resso, as well as platforms specific to the China market, including Toutiao, Douyin, and Xigua, ByteDance has made it easier and more fun for people to connect with, consume, and create content.

Why Join Us
Creation is the core of ByteDance's purpose. Our products are built to help imaginations thrive. This is doubly true of the teams that make our innovations possible. Together, we inspire creativity and enrich life - a mission we work toward every day. To us, every challenge, no matter how ambiguous, is an opportunity: to learn, to innovate, and to grow as one team. Status quo? Never. Courage? Always. At ByteDance, we create together and grow together. That's how we drive impact - for ourselves, our company, and the users we serve. Join us.

About the Team
The Infrastructure Engineering team supports the company's fast growth by building and operating hyperscale data centers. The team manages the end-to-end lifecycle of the server fleet, providing cloud solutions and various infrastructure services and ensuring that they are scalable and reliable.

Embark on an exciting expedition to explore the rapidly expanding ByteDance domain in the United States, Europe, and Asia. Here, the Infrastructure Engineering team is crafting monumental data citadels that encircle the planet, sheltering hundreds of thousands of servers. As the maestro of our production systems, you will embark on a captivating odyssey, taming the lifecycles of these servers. Your adventure will begin with the orchestration of their initial deployment, navigating the intricate terrain of OS installation, summoning services like a digital magician, and maintaining vigilant watch over our inventory.

But, like any epic tale, there will be times of challenge when you become a troubleshooter extraordinaire, mending and restoring with unwavering dedication. Eventually, you'll guide them into the sunset, orchestrating their decommissioning and ensuring their rebirth through recycling, all while contributing to the pulsating rhythm of ByteDance's technological evolution.

Key Responsibilities:
  • Operation: As a Production Systems Engineer, your mission is to enhance the quality, reliability, efficiency, effectiveness, and scalability of our data center operations, platform, and services on a worldwide scale.
  • Lifecycle Improvement: Engage in and improve the whole lifecycle of infrastructure systems, from system design consulting through to launch reviews, deployment, operation, and refinement.
  • Automation: Deliver tools and solutions to improve the automation, reliability, scalability, and operability of services.
  • Monitoring: Deliver tools and solutions to monitor availability, latency, and the overall health of services, server infrastructure, and the network.
  • Disaster Recovery: Troubleshoot and resolve complex technical issues in a high-pressure, time-sensitive environment. Conduct high-level root-cause analysis for service interruptions and establish preventive measures. Practice sustainable incident response and postmortems.
  • Cross-team Collaboration: Partner with stakeholders such as infrastructure architects, project managers, data center operations engineers, platform developers, supply chain teams, and our internal customers to understand overarching business objectives. You will also have the opportunity to design and implement innovative solutions for our Core IDCs and CDN/Edge Services.
  • Technical Documentation: Create and maintain standard operating procedures and knowledge bases.
  • On-call: Participate in our cross-continent on-call rotation and incident response teams to solve critical problems in production.
Qualifications
  • Education: Bachelor's degree in Computer Science, Computer Engineering, a relevant technical field, or equivalent practical experience.
  • Experience: Minimum of 3 years of experience in systems infrastructure operations or related fields, working with data center or CDN production systems and system design/validation.
  • Server Hardware: We seek individuals with more than just a basic understanding. You should be at an intermediate level, where your hands-on experience in labs or data centers has forged a deep connection with server architecture.
  • Data Center: An intermediate level of expertise is preferred here. We're on the lookout for those who are well-versed in the intricate details of operations, from small tasks like OS installations and break-fix, to high-impact projects like planning and operations (covering the full infrastructure lifecycle), to new design-build facilities or renovations of existing systems.
  • Monitoring: Your knowledge should transcend the ordinary; we prefer intermediate-level skills. We expect you to be a maestro in the orchestration of tools and designs for monitoring server health, network switches, and the power and temperature conditions of the data center.
  • Automation: We welcome those who have delved into the realm of automation, ideally at an intermediate level. Your qualifications should reflect at least one automation project, showcasing your commitment to streamlining processes.
  • Linux: In the realm of Linux, we are in search of individuals with intermediate-level proficiency. Your mastery of this operating system should shine brightly.
  • Coding: As you navigate the digital landscape, fluency in Bash, Python, and Golang is strongly favored. Your coding skills will be your trusty companions on this adventure.
  • Network: When it comes to networks, we're seeking at least a basic-level understanding. Your ability to chart the course through the network labyrinth is essential.
  • Communication: Experience managing and coordinating teams in a global environment.
  • Project Management: Experience in the preparation of project plans and specifications, drafting scopes of work, and managing multiple projects simultaneously.
  • Experience with Agile methodologies (e.g., Kanban, Scrum), including user stories, sprint planning, and backlog management.
  • Preferred But Not Required Skills: Golang, REST APIs, Gin, Ansible, Load Balancer, SQL, Hive, Hadoop, ClickHouse, Message Queue, Redis.

ByteDance is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At ByteDance, our mission is to inspire creativity and enrich life. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach. We are passionate about this and hope you are too.
