Research Scientist in Large Model System

ByteDance

Seattle, Washington, United States of America (On Site) | Full Time | 1 month ago

Job Summary

ByteDance is seeking a Research Scientist in Large Model System to develop and optimize large-scale machine learning systems for its proprietary general-purpose multimodal model. The role involves designing and developing system architecture, solving technical challenges such as high concurrency and scalability, working across the sub-directions of ML systems, researching advanced technologies, and collaborating with algorithm teams to jointly optimize algorithms and systems. The work supports applications in search, recommendation, advertising, content creation, conversation, and customer service.

Must Have

Excellent coding ability
Solid foundation in data structures and basic algorithms
Proficient in C/C++ or Python
Familiar with at least one mainstream machine learning framework (TensorFlow/PyTorch)
Strong grasp of the principles of distributed systems
Strong sense of responsibility, good learning and communication abilities, and self-motivation
Good communication and collaboration skills

Good to Have

Prior experience in large-scale projects, or influential papers, in the field of large models
Familiarity with LLM- and CV-related algorithms and technologies
Experience in large model training and RL algorithms
Experience in at least one of the following: CUDA, RDMA, AI Infrastructure, HW/SW Co-Design, High-Performance Computing, ML Hardware Architecture (GPU, Accelerators, Networking), ML for System, or Distributed Storage

Perks & Benefits

Day one access to medical, dental, and vision insurance
401(k) savings plan with company match
Paid parental leave
Short-term and long-term disability coverage
Life insurance
Wellbeing benefits
10 paid holidays per year
10 paid sick days per year
17 days of paid personal time (prorated upon hire, with accruals increasing with tenure)

Job Description

Responsibilities

Leveraging substantial data and computing resources, and through continued investment in these domains, we have developed a proprietary general-purpose model with multimodal capabilities. In the Chinese market, Doubao models power more than 50 ByteDance apps and business lines, including Doubao, Coze, and Dreamina, and are available to external enterprise clients via Volcano Engine. Today, the Doubao app is the most widely used AIGC application in China.

This role is responsible for developing the machine learning systems behind the company's large-scale models. It also involves researching new applications and solutions for related technologies in areas such as search, recommendation, advertising, content creation, conversation, and customer service. The goals are to meet users' growing demand for intelligent interaction and, in the long run, to improve how people live and communicate. The main work directions include:

1. Responsible for designing and developing the architecture of large-scale machine learning systems, solving technical challenges such as high concurrency, high reliability, and high scalability.

2. Covering various sub-directions of machine learning systems, including resource scheduling, model training, model inference, data management, and workflow orchestration.

3. Responsible for researching and introducing advanced technologies in machine learning systems, such as the latest hardware architectures, heterogeneous computing systems, and compiler-based optimization techniques.

4. Working closely with the algorithm teams to jointly optimize algorithms and systems.

Qualifications

Minimum Qualifications:

Excellent coding ability, a solid foundation in data structures and basic algorithms, and proficiency in C/C++ or Python.
Familiar with at least one mainstream machine learning framework (TensorFlow/PyTorch).
Strong grasp of the principles of distributed systems.
Strong sense of responsibility, good learning and communication abilities, and self-motivation.
Good communication and collaboration skills, able to explore new technologies with the team and promote technological progress.

Preferred Qualifications:

Prior experience in large-scale projects, or influential papers, in the field of large models.
Familiarity with LLM- and CV-related algorithms and technologies, and experience in large model training and RL algorithms.
Experience in one of the following fields: CUDA, RDMA, AI Infrastructure, HW/SW Co-Design, High-Performance Computing, ML Hardware Architecture (GPU, Accelerators, Networking), ML for System, and Distributed Storage.
