About the Team

The ByteDance Doubao Large Model Team was established in 2023. It is dedicated to developing the industry's most advanced AI large-model technology, becoming a world-class research team, and contributing to the development of technology and society. The team has a long-term vision and commitment in the field of AI, with research directions covering NLP, CV, speech, and more, and it maintains laboratories and research positions in China, Singapore, the US, and elsewhere. Backed by the platform's abundant data and computing resources, the team invests continuously in these fields and has launched self-developed general-purpose large models with multimodal machine-learning capabilities. Downstream, it supports 50+ businesses such as Doubao, Coze, and Dreamina, and serves enterprise customers through Volcengine. The Doubao APP is currently the largest AIGC application in the Chinese market.

Responsibilities

1. Design and develop the storage-related components of machine-learning systems for diverse large-model inference scenarios (LLM/S2S/VLM/multimodal, etc.), including model distribution and loading, KVCache optimization, data IO performance, and improving TTFT and TBT in LLM serving.
2. Design and implement a multi-level storage system for large-model inference. Combine multiple storage media, including HBM, host memory, distributed disk, and remote large-capacity storage systems (HDFS/object storage), for data storage and migration management, realizing an integrated hierarchy of "near-compute cache + remote large-capacity storage".
3. Optimize the hit rate of the large-model KV Cache, formulating customized optimization strategies across multiple system dimensions, such as the inference framework, traffic scheduling, and the multi-level cache.
Optimize data IO performance by fully leveraging NVLink, RDMA high-speed networking, and GPU Direct technologies on the near-compute side for efficient data transfer, and optimize the replica placement strategy so that load traffic and stored data are reasonably distributed.
4. Design and implement efficient, user-friendly data-access interfaces that integrate seamlessly with the inference framework and manage the KV Cache lifecycle.
5. Handle access, management, operations and maintenance, and monitoring of the multi-level storage system in Kubernetes scenarios to ensure stability.
6. Set up the system and its disaster recovery in multi-datacenter, multi-region, and multi-cloud scenarios, and optimize data placement across clusters.
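The "near-compute cache + remote large-capacity storage" hierarchy described in item 2 can be illustrated with a minimal sketch. All names here (`TieredKVCache`, `near`, `remote`) are hypothetical, and the tiers are simulated with in-memory dicts rather than real HBM or HDFS; the point is only the lookup/promotion/eviction flow between a small fast tier and a large durable one.

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy two-tier cache: a small 'near-compute' LRU tier backed by a
    large-capacity remote store. Both tiers are simulated with dicts."""

    def __init__(self, near_capacity: int):
        self.near = OrderedDict()   # stands in for the HBM/host-memory tier
        self.remote = {}            # stands in for HDFS/object storage
        self.near_capacity = near_capacity
        self.hits = self.misses = 0

    def put(self, key, value):
        self.remote[key] = value    # durable copy always lands in the remote tier
        self._promote(key, value)

    def get(self, key):
        if key in self.near:        # near-compute hit: no remote IO needed
            self.near.move_to_end(key)
            self.hits += 1
            return self.near[key]
        self.misses += 1
        value = self.remote.get(key)
        if value is not None:
            self._promote(key, value)  # warm the near tier after a miss
        return value

    def _promote(self, key, value):
        self.near[key] = value
        self.near.move_to_end(key)
        while len(self.near) > self.near_capacity:
            self.near.popitem(last=False)  # evict LRU entry; remote copy remains
```

A real implementation would move KV blocks asynchronously and track per-tier bandwidth, but the hit/miss accounting above is the quantity the hit-rate work in item 3 would optimize.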
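One common technique behind the KV Cache hit-rate work in item 3 is prefix reuse: requests whose prompts share a token prefix can reuse the cached KV blocks for that prefix. The function below is a hypothetical sketch of the matching step only (the name `longest_cached_prefix` and the dict-of-tuples cache layout are assumptions, not any specific framework's API).

```python
def longest_cached_prefix(prompt_tokens, cached_prefixes):
    """Return the longest cached token prefix of `prompt_tokens`.

    `cached_prefixes` maps a token tuple to its cached KV data
    (represented here by an opaque value). Only the tokens beyond
    the returned prefix would need fresh prefill computation.
    """
    best = ()
    for prefix in cached_prefixes:
        n = len(prefix)
        if n > len(best) and tuple(prompt_tokens[:n]) == prefix:
            best = prefix
    return best
```

Traffic scheduling ties in here: routing requests with a shared prefix to the replica that already holds that prefix's KV blocks is what turns this matching step into an actual hit-rate improvement.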