CCL (Collective Communication Library) Lead Engineer

Rebellions

Job Summary

This role involves architecting and implementing a new collective communication library tailored for the NPU's unique architecture and topology. Key responsibilities include defining the technical vision, API, and performance targets for the communication library, as well as driving hardware-software co-design to influence future NPU and interconnect architecture. The position requires expertise in high-performance systems software development and collective communication algorithms.

Must Have

  • Master’s degree in Computer Science, Computer Engineering, or a related field
  • Minimum of 10 years of professional experience in high-performance systems software development
  • Strong collaboration and problem-solving skills for complex technical issues
  • Expert-level understanding of collective communication algorithms (e.g., All-Reduce, All-Gather, Reduce-Scatter) and their performance characteristics
  • Full-stack knowledge, from CPU/accelerator architecture and OS internals to the packet level of networking fabrics like RDMA/RoCE
  • Deep understanding of high-radix interconnect topologies and Network-on-Chip (NoC) architectures
  • Proven experience leading significant software projects with a track record of delivering complex, high-performance, and reliable software

Good to Have

  • A Ph.D. in a related field (HPC, Parallel Computing, Computer Architecture)
  • Prior experience building a high-performance communication library (e.g., NCCL, MPI) or parallel runtime from the ground up
  • Experience with performance analysis and optimization for AI accelerators (GPUs, TPUs, or other NPUs) and their specific interconnects (e.g., NVLink, CXL, RoCE)

Job Description

Responsibilities and Opportunities

  • Architecting and implementing a new collective communication library from scratch, specifically engineered for our NPU’s unique architecture and topology
  • Defining the technical vision, API, and performance targets for the communication library
  • Driving the hardware-software co-design process to influence future NPU and interconnect architecture

Application Process

  • Document screening → Online interview (including coding test) → Culture-fit interview → On-site interview → Compensation negotiation → Final acceptance
  • The selection process may vary by job and may change depending on the schedule and circumstances.
  • You will be notified individually of the selection schedule and results at the email address you provided when applying.

Notes

  • This posting may close early if recruitment is completed.
  • If the application contains false information, the offer may be withdrawn.
  • Employment may be restricted if the legal qualifications required for recruitment and job performance are not met.
  • Being a veteran or a person with a disability does not negatively affect the hiring process.
  • The scope of duties may change based on the candidate's overall career and experience. If such a change is necessary, it will be communicated to the candidate at an appropriate time before the final offer of employment.
  • For inquiries related to recruitment, please contact the email address below.
  • recruit@rebellions.ai
