Software Engineer Intern (Inference Infrastructure) - 2026 Start (PHD)

2+ years of experience • $118,560 per year

Research Development

Job Description

The Inference Infrastructure team at ByteDance is seeking a PhD Software Engineer Intern for 2026 to design and build large-scale, container-based cluster management and orchestration systems for LLM inference. This role involves architecting cloud-native GPU and AI accelerator infrastructure, collaborating on world-class inference solutions using vLLM, SGLang, and TensorRT-LLM, and contributing to open-source communities. Interns will work in a hyper-scale environment, focusing on performance, scalability, and cost-efficiency to enable AI workloads from research to production.

Good To Have:

Experience contributing to or operating large-scale cluster management systems (Kubernetes, Ray).
Experience with workload scheduling, GPU orchestration, scaling, and isolation in production environments.
Hands-on experience with GPU programming (CUDA) or inference engines (vLLM, SGLang, TensorRT-LLM).
Familiarity with public cloud providers (AWS, Azure, GCP) and their ML platforms.
Strong knowledge of ML systems (Ray, DeepSpeed, PyTorch) and distributed training/inference platforms.
Excellent communication skills and ability to collaborate across global, cross-functional teams.
Passion for system efficiency, performance optimization, and open-source innovation.

Must Have:

Design and build large-scale, container-based cluster management and orchestration systems.
Architect next-generation cloud-native GPU and AI accelerator infrastructure.
Collaborate across teams to deliver world-class inference solutions.
Stay current with advances in open source, AI/ML, and LLM infrastructure.
Write high-quality, production-ready code.
B.S./M.S. in Computer Science or related fields with 2+ years experience, or Ph.D. with strong systems/ML publications.
Strong understanding of large model inference, distributed/parallel systems, or high-performance networking.
Hands-on experience building cloud or ML infrastructure.
Solid knowledge of Docker and Kubernetes.
Proficiency in Go, Rust, Python, or C++.

Perks:

Day one access to health insurance
Life insurance
Wellbeing benefits
10 paid holidays per year
Paid sick time (56 hours if hired in the first half of the year, 40 hours if hired in the second half)
Housing allowance (for non-100% remote interns)

About the Team

The Inference Infrastructure team is the creator and open-source maintainer of AIBrix, a Kubernetes-native control plane for large-scale LLM inference. We are part of ByteDance’s Core Compute Infrastructure organization, responsible for designing and operating the platforms that power microservices, big data, distributed storage, machine learning training and inference, and edge computing across multi-cloud and global datacenters.

With ByteDance’s rapidly growing businesses and a global fleet of machines running hundreds of millions of containers daily, we are building the next generation of cloud-native, GPU-optimized orchestration systems. Our mission is to deliver infrastructure that is highly performant, massively scalable, cost-efficient, and easy to use, enabling both internal and external developers to bring AI workloads from research to production at scale.

We are expanding our focus on LLM inference infrastructure to support new AI workloads, and are looking for engineers passionate about cloud-native systems, scheduling, and GPU acceleration. You’ll work in a hyper-scale environment, collaborate with world-class engineers, contribute to the open-source community, and help shape the future of AI inference infrastructure globally.

We are looking for talented individuals to join us for an internship in 2026. PhD internships at ByteDance provide students with the opportunity to actively contribute to our products and research, and to the organization's future plans and emerging technologies. Our dynamic internship experience blends hands-on learning, enriching community-building and development events, and collaboration with industry experts.

Applications will be reviewed on a rolling basis; we encourage you to apply early. Please state your availability clearly in your resume (Start date, End date).

Responsibilities

Design and build large-scale, container-based cluster management and orchestration systems with extreme performance, scalability, and resilience.
Architect next-generation cloud-native GPU and AI accelerator infrastructure to deliver cost-efficient and secure ML platforms.
Collaborate across teams to deliver world-class inference solutions using vLLM, SGLang, TensorRT-LLM, and other LLM engines.
Stay current with the latest advances in open source (Kubernetes, Ray, etc.), AI/ML and LLM infrastructure, and systems research; integrate best practices into production systems.
Write high-quality, production-ready code that is maintainable, testable, and scalable.

Qualifications

Minimum Qualifications

B.S./M.S. in Computer Science, Computer Engineering, or related fields with 2+ years of relevant experience (Ph.D. with strong systems/ML publications also considered).
Strong understanding of large model inference, distributed and parallel systems, and/or high-performance networking systems.
Hands-on experience building cloud or ML infrastructure in areas such as resource management, scheduling, request routing, monitoring, or orchestration.
Solid knowledge of container and orchestration technologies (Docker, Kubernetes).
Proficiency in at least one major programming language (Go, Rust, Python, or C++).

Preferred Qualifications

Experience contributing to or operating large-scale cluster management systems (e.g., Kubernetes, Ray).
Experience with workload scheduling, GPU orchestration, scaling, and isolation in production environments.
Hands-on experience with GPU programming (CUDA) or inference engines (vLLM, SGLang, TensorRT-LLM).
Familiarity with public cloud providers (AWS, Azure, GCP) and their ML platforms (SageMaker, Azure ML, Vertex AI).
Strong knowledge of ML systems (Ray, DeepSpeed, PyTorch) and distributed training/inference platforms.
Excellent communication skills and ability to collaborate across global, cross-functional teams.
Passion for system efficiency, performance optimization, and open-source innovation.

Job Information

【For Pay Transparency】Compensation Description (Hourly) - Campus Intern

The hourly rate range for this position in the selected city is $57 - $57.

Benefits may vary depending on the nature of employment and the country of the work location. Interns have day one access to health insurance, life insurance, wellbeing benefits and more. Interns also receive 10 paid holidays per year and paid sick time (56 hours if hired in the first half of the year, 40 hours if hired in the second half). Interns who are not working 100% remotely may also be eligible for a housing allowance.

The Company reserves the right to modify or change these benefits programs at any time, with or without notice.

For Los Angeles County (unincorporated) Candidates:

Qualified applicants with arrest or conviction records will be considered for employment in accordance with all federal, state, and local laws including the Los Angeles County Fair Chance Ordinance for Employers and the California Fair Chance Act. Our company believes that criminal history may have a direct, adverse and negative relationship on the following job duties, potentially resulting in the withdrawal of the conditional offer of employment:

1. Interacting and occasionally having unsupervised contact with internal/external clients and/or colleagues;

2. Appropriately handling and managing confidential information including proprietary and trade secret information and access to information technology systems; and

3. Exercising sound judgment.

About Us

Founded in 2012, ByteDance's mission is to inspire creativity and enrich life. With a suite of more than a dozen products, including TikTok, Lemon8, CapCut and Pico as well as platforms specific to the China market, including Toutiao, Douyin, and Xigua, ByteDance has made it easier and more fun for people to connect with, consume, and create content.

Why Join ByteDance

Inspiring creativity is at the core of ByteDance's mission. Our innovative products are built to help people authentically express themselves, discover and connect – and our global, diverse teams make that possible. Together, we create value for our communities, inspire creativity and enrich life - a mission we work towards every day.

As ByteDancers, we strive to do great things with great people. We lead with curiosity, humility, and a desire to make impact in a rapidly growing tech company. By constantly iterating and fostering an "Always Day 1" mindset, we achieve meaningful breakthroughs for ourselves, our Company, and our users. When we create and grow together, the possibilities are limitless. Join us.

Diversity & Inclusion

ByteDance is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At ByteDance, our mission is to inspire creativity and enrich life. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach. We are passionate about this and hope you are too.

Reasonable Accommodation

ByteDance is committed to providing reasonable accommodations in our recruitment processes for candidates with disabilities, pregnancy, sincerely held religious beliefs or other reasons protected by applicable laws. If you need assistance or a reasonable accommodation, please reach out to us at https://tinyurl.com/RA-request
