CCL (Collective Communication Library) Lead Engineer
Software Development & Engineering
Job Description
This role involves architecting and implementing a new collective communication library tailored for the NPU's unique architecture and topology. Key responsibilities include defining the technical vision, API, and performance targets for the communication library, as well as driving hardware-software co-design to influence future NPU and interconnect architecture. The position requires expertise in high-performance systems software development and collective communication algorithms.
Responsibilities and Opportunities
Architecting and implementing a new collective communication library from scratch, specifically engineered for our NPU’s unique architecture and topology
Defining the technical vision, API, and performance targets for the communication library
Driving the hardware-software co-design process to influence future NPU and interconnect architecture
Key Qualifications
Master’s degree in Computer Science, Computer Engineering, or a related field
Minimum of 10 years of professional experience in high-performance systems software development
Strong collaboration and problem-solving skills for complex technical issues
Expert-level understanding of collective communication algorithms (e.g., All-Reduce, All-Gather, Reduce-Scatter) and their performance characteristics
Full-stack knowledge, from CPU/accelerator architecture and OS internals to the packet level of networking fabrics like RDMA/RoCE
Deep understanding of high-radix interconnect topologies and Network-on-Chip (NoC) architectures
Proven experience leading significant software projects with a track record of delivering complex, high-performance, and reliable software
Ideal Qualifications
A Ph.D. in a related field (HPC, Parallel Computing, Computer Architecture)
Prior experience building a high-performance communication library (e.g., NCCL, MPI) or parallel runtime from the ground up
Experience with performance analysis and optimization for AI accelerators (GPUs, TPUs, or other NPUs) and their specific interconnects (e.g., NVLink, CXL, RoCE)
The selection process may vary by job and may change depending on the schedule and circumstances.
You will be notified individually of the selection schedule and results at the email address you provided when applying.
Notes
This announcement may close early if recruitment is completed.
If the application contains false information, acceptance may be revoked.
Employment may be restricted if the legal qualifications required for recruitment and job performance are not met.
Being a veteran or a person with a disability does not negatively affect the hiring process.
The scope of duties may be adjusted in light of the candidate's overall career and experience. If such a change is necessary, it will be communicated to the candidate at an appropriate time before the final offer of employment.
For inquiries related to recruitment, please contact the email address below.
recruit@rebellions.ai