This role involves designing and developing monitoring and management tools for Rebellions NPU accelerators, using libraries to access and control GPU hardware features, and building diagnostic tools for fault detection and performance analysis. The engineer will collaborate with hardware and driver teams and continuously benchmark and optimize these tools. Key qualifications include at least 6 years of Linux systems engineering experience, proficiency with Linux distributions (CentOS, RHEL, Ubuntu, Debian), C/C++ for low-level programming, and Python for scripting, along with strong communication and problem-solving skills.
Good To Have:
- Understanding of GPU internals, including memory management, clocking behavior, and power states.
- Experience in developing or using debugging tools for performance analysis or fault detection.
- Ability to quickly learn and adapt to new and emerging technologies.
- Ability to work across multiple teams and contribute to cross-functional collaboration.
Must Have:
- Design and develop monitoring and management tools for Rebellions NPU accelerators.
- Utilize libraries to access and control GPU hardware features programmatically.
- Develop diagnostic tools for fault detection, performance analysis, and system reliability.
- Collaborate closely with hardware engineers, driver developers, and architects to ensure seamless integration between software tools and hardware components.
- Continuously benchmark and optimize monitoring and management tools to align with market demands.
- Minimum of 6 years of experience in Linux systems engineering.
- Proficiency in Linux operating systems such as CentOS, RHEL (Red Hat Enterprise Linux), Ubuntu, or Debian.
- Proficiency in C/C++ for low-level system programming.
- Proficiency in Python for scripting and extending tool functionality.
- Strong written and verbal communication skills, with the ability to deliver effective presentations.
- Excellent problem-solving and collaboration skills.