This AI System and Software Research Intern role focuses on GPU/NPU software development and optimization, including implementing high-performance kernels and profiling workloads to find bottlenecks. The intern will also prototype, develop, and tune robotics AI systems, collaborating with algorithm and hardware teams to deploy models under real-time constraints. The role additionally involves research into agentic systems: designing and accelerating the KV cache for large-model inference and exploring agent-based inference frameworks in robotics AI scenarios. Candidates should be pursuing a Master's or Ph.D. in a relevant field and be proficient in C/C++ and Python, with solid CUDA experience and familiarity with AI models and profiling tools.
Good To Have:
- Experience with KV cache design, attention-mechanism optimization, or model compression (quantization, pruning, distillation)
- Hands-on work with agentic/agent-based AI frameworks (e.g., ReAct, tool use, AutoGPT)
- Development experience on NPUs or other heterogeneous accelerators
- Contributions to open-source projects such as TensorRT, ONNX Runtime, or oneAPI
- Linux system tuning, driver development, or low-level hardware interface knowledge
Must Have:
- Currently enrolled in a Master's or Ph.D. program (Computer Science, Electrical Engineering, AI, Mathematics, or a related field)
- Proficient in C/C++ and Python
- Solid understanding of the CUDA programming model
- At least one year of hands-on CUDA experience (kernel development, streams, memory management, performance optimization)
- Experience with profiling tools such as Nsight, VTune, perf, or TensorBoard
- Familiarity with Transformers, CNNs, and RNNs, and their typical inference-time performance bottlenecks
- Strong English reading and writing skills
- Effective teamwork across multidisciplinary groups
- Strong passion for pushing the boundaries of GPU/NPU acceleration, robotics AI, and agentic systems
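Since the role centers on KV-cache design for large-model inference, here is a minimal pure-Python sketch of the idea the posting refers to (all names and shapes are illustrative, not from any production codebase): during autoregressive decoding, each step appends one key/value pair to a cache and attends over the cached history, rather than recomputing keys and values for the entire prefix at every step.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(q, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]

class KVCache:
    """Toy per-layer KV cache: each decode step appends this step's
    key/value pair, then attends over everything cached so far."""
    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, q, k, v):
        self.keys.append(k)
        self.values.append(v)
        return attend(q, self.keys, self.values)
```

Cached decoding produces the same attention output as recomputing over the full prefix at every step; the cache simply trades memory for the O(n) recomputation, which is why accelerating its layout and access patterns matters for inference throughput.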