GPGPU Performance Tooling Engineer
Rivos
Job Summary
We are seeking a talented individual to work on software aimed at enhancing the Deep Learning ecosystem and assisting hardware engineers in building advanced Deep Learning parallel systems. The role focuses on designing and implementing features for the Perfetto framework, enabling users to more efficiently measure code performance. This position may involve working on the lower-level libraries that support performance data collection. The candidate will gain technical and organizational skills from experienced professionals, learning to write performant and readable code, manage projects effectively, and collaborate with the Open Source community. We are strong advocates for Open Source and Free software, contributing our improvements back to the projects we utilize.
Must Have
- Experience with Perfetto profiling framework
- Ability to write code in C or C++
- Experience with Protobuf
- Understanding of computer architecture
- Strong problem-solving skills
- Excellent communication skills
Good to Have
- Experience profiling GPGPU architectures
- Familiarity with deep learning frameworks
- Ability to write code in Rust
- Coursework/experience with Machine Learning
Job Description
Responsibilities
- Develop and modify the Open Source Perfetto framework to enable software developers to improve performance of their code.
- Work on underlying libraries and drivers to enable performance data collection.
- Ensure performance monitoring overhead is minimized.
- Build tooling to facilitate measuring performance in different scenarios (on simulators, FPGAs, or real hardware).
- Write unit tests and benchmark tools to validate the performance and correctness of your changes.
- Stay current with advancements in the field.
Requirements
- Experience with Perfetto profiling framework
- Ability to write code in C or C++
- Experience with Protobuf
- Understanding of computer architecture
- Strong problem-solving skills and ability to work in a fast-paced, collaborative environment.
- Excellent written and verbal communication skills.
- Strong organizational skills and a high degree of self-motivation.
- Ability to work well in a team and be productive under aggressive schedules.
Optional Requirements
- Experience with profiling and optimizing low-level performance (memory bandwidth, latency, throughput) on GPGPU architectures.
- Familiarity with deep learning frameworks (TensorFlow, PyTorch, etc.).
- Ability to write code in Rust
- Coursework or experience with Machine Learning algorithms
Education and Experience
- Bachelor’s, Master’s, or PhD in Computer Engineering, Software Engineering, or Computer Science