CCL (Collective Communication Library) Lead Engineer

Software Development & Engineering

Job Description

This role involves architecting and implementing a new collective communication library tailored for the NPU's unique architecture and topology. Key responsibilities include defining the technical vision, API, and performance targets for the communication library, as well as driving hardware-software co-design to influence future NPU and interconnect architecture. The position requires expertise in high-performance systems software development and collective communication algorithms.

Responsibilities and Opportunities

  • Architecting and implementing a new collective communication library from scratch, specifically engineered for our NPU’s unique architecture and topology
  • Defining the technical vision, API, and performance targets for the communication library
  • Driving the hardware-software co-design process to influence future NPU and interconnect architecture

Key Qualifications

  • Master’s degree in Computer Science, Computer Engineering, or a related field
  • Minimum of 10 years of professional experience in high-performance systems software development
  • Strong collaboration and problem-solving skills for complex technical issues
  • Expert-level understanding of collective communication algorithms (e.g., All-Reduce, All-Gather, Reduce-Scatter) and their performance characteristics
  • Full-stack knowledge, from CPU/accelerator architecture and OS internals to the packet level of networking fabrics like RDMA/RoCE
  • Deep understanding of high-radix interconnect topologies and Network-on-Chip (NoC) architectures
  • Proven experience leading significant software projects with a track record of delivering complex, high-performance, and reliable software

Ideal Qualifications

  • A Ph.D. in a related field (HPC, Parallel Computing, Computer Architecture)
  • Prior experience building a high-performance communication library (e.g., NCCL, MPI) or parallel runtime from the ground up
  • Experience with performance analysis and optimization for AI accelerators (GPUs, TPUs, or other NPUs) and their specific interconnects (e.g., NVLink, CXL, RoCE)

Application Process

  • Document screening - Online interview (including coding test) - Culture-fit interview - On-site interview - Compensation negotiation - Final acceptance
  • The selection process may vary by job and may change depending on the schedule and circumstances.
  • You will be notified individually of the selection schedule and results at the email address you provided when applying.

Notes

  • This announcement may close early if recruitment is completed.
  • If the application contains false information, the offer may be canceled.
  • Employment may be restricted if the legal qualifications required for recruitment and job performance are not met.
  • Being a veteran or a person with a disability does not negatively affect the hiring process.
  • The scope of duties may change based on the candidate's overall career and experience. If such a change is necessary, it will be communicated to the candidate at an appropriate time before the final offer of employment.
  • For inquiries related to recruitment, please contact recruit@rebellions.ai.
