AI System and Software Research Intern
Intel
Job Summary
This AI System and Software Research Intern role focuses on GPU/NPU software development and optimization, including implementing high-performance kernels and profiling for bottlenecks. The intern will also prototype, develop, and tune Robotics AI systems, collaborating with algorithm and hardware teams to deploy models with real-time constraints. Additionally, the role involves research into Agentic Systems, designing and accelerating KV Cache for large model inference and exploring agent-based inference frameworks in robotic AI scenarios. Candidates should be pursuing a Master's or Ph.D. in relevant fields, proficient in C/C++ and Python, with solid CUDA experience and familiarity with AI models and profiling tools.
Must Have
- Currently enrolled in a Master's or Ph.D. program (Computer Science, Electrical Engineering, AI, Mathematics, or related fields)
- Proficient in C/C++ and Python
- Solid understanding of the CUDA programming model
- 1 year of hands-on CUDA experience (kernel development, streams, memory management, optimization)
- Experience with profiling tools such as Nsight, VTune, Perf, TensorBoard
- Familiarity with Transformers, CNNs, and RNNs, and with their typical performance bottlenecks during inference
- Good reading/writing skills in English
- Effective teamwork across multidisciplinary groups
- Strong passion for pushing the limits of GPU/NPU acceleration, robotics AI, and Agentic systems
Good to Have
- Experience with KV Cache, attention mechanism optimization, or model compression (quantization, pruning, distillation)
- Hands-on work with Agentic/agent-based AI frameworks (e.g., ReAct, Tool Use, AutoGPT)
- Development experience on NPUs or other heterogeneous accelerators
- Contributions to open-source projects such as TensorRT, ONNX Runtime, oneAPI
- Linux system tuning, driver development, or low-level hardware interface knowledge
Perks & Benefits
- Hybrid work model
Job Description
- GPU/NPU Software Development and Optimization
  - Implement high-performance kernels, operators, and libraries for GPU/NPU.
  - Profile with Nsight Systems/Compute, VTune, Perf, TensorBoard, etc., identify bottlenecks, and apply code-level optimizations.
- Robotics AI System Prototyping, Development, and Tuning
  - Collaborate with algorithm and hardware teams to deploy various models on GPU/NPU-based development platforms under real-time performance constraints.
  - Build automated benchmarks, generate performance reports, and propose optimization strategies.
- Agentic System Research (KV Cache, etc.)
  - Design, implement, and accelerate KV Cache and related mechanisms for large-model inference.
  - Explore and prototype Agentic (agent-based, self-adapting) inference frameworks and evaluate them in robotic AI scenarios.
Qualifications:
- Currently enrolled in a Master's or Ph.D. program (Computer Science, Electrical Engineering, AI, Mathematics, or related fields).
- Proficient in C/C++ and Python; ability to write clean, maintainable code.
- Solid understanding of the CUDA programming model; 1 year of hands-on CUDA experience (kernel development, streams, memory management, optimization).
- Experience with profiling tools such as Nsight, VTune, Perf, TensorBoard, etc.
- Familiarity with Transformers, CNNs, and RNNs, and with their typical performance bottlenecks during inference.
- Good reading/writing skills in English; effective teamwork across multidisciplinary groups.
- Strong passion for pushing the limits of GPU/NPU acceleration, robotics AI, and Agentic systems.
Skills as a Plus:
- Experience with KV Cache, attention mechanism optimization, or model compression (quantization, pruning, distillation).
- Hands-on work with Agentic/agent-based AI frameworks (e.g., ReAct, Tool Use, AutoGPT).
- Development experience on NPUs or other heterogeneous accelerators.
- Contributions to open-source projects such as TensorRT, ONNX Runtime, oneAPI, etc.
- Linux system tuning, driver development, or low-level hardware interface knowledge.