AI System and Software Research Intern

1+ Years
Research Development

Job Description

This AI System and Software Research Intern role focuses on GPU/NPU software development and optimization, including implementing high-performance kernels and profiling to find bottlenecks. The intern will also prototype, develop, and tune robotics AI systems, working with algorithm and hardware teams to deploy models under real-time constraints. The role also involves research into agentic systems: designing and accelerating KV caches for large-model inference, and exploring agent-based inference frameworks in robotic AI scenarios. Candidates should be pursuing a Master's or Ph.D. in a relevant field and be proficient in C/C++ and Python. They should also have solid CUDA experience and be familiar with AI models and profiling tools.
Perks:
  • Hybrid work model

Job Description:

  • GPU/NPU Software Development and Optimization
      • Implement high-performance kernels, operators, and libraries for GPUs/NPUs.
      • Profile with Nsight Systems/Compute, VTune, Perf, TensorBoard, etc.; identify bottlenecks and apply code-level optimizations.
  • Robotics AI System Prototyping, Development, and Tuning
      • Collaborate with the Algorithm and Hardware teams to deploy models on GPU/NPU-based development platforms under real-time performance constraints.
      • Build automated benchmarks, generate performance reports, and propose optimization strategies.
  • Agentic System Research (KV Cache, etc.)
      • Design, implement, and accelerate KV caches and related components for large-model inference.
      • Explore and prototype agentic (agent-based, self-adapting) inference frameworks, and evaluate them in robotic AI scenarios.

Qualifications:

  • Currently enrolled in a Master's or Ph.D. program (Computer Science, Electrical Engineering, AI, Mathematics, or related fields).
  • Proficient in C/C++ and Python; ability to write clean, maintainable code.
  • Solid understanding of the CUDA programming model; at least 1 year of hands-on CUDA experience (kernel development, streams, memory management, optimization).
  • Experience with profiling tools such as Nsight, VTune, Perf, TensorBoard, etc.
  • Familiarity with Transformers, CNNs, RNNs and the typical performance bottlenecks during inference.
  • Good reading/writing skills in English; effective teamwork across multidisciplinary groups.
  • Strong passion for pushing the limits of GPU/NPU acceleration, robotics AI, and agentic systems.

Nice-to-Have Skills:

  • Experience with KV Cache, attention mechanism optimization, or model compression (quantization, pruning, distillation).
  • Hands-on work with agentic/agent-based AI frameworks (e.g., ReAct, tool use, AutoGPT).
  • Development experience on NPUs or other heterogeneous accelerators.
  • Contributions to open-source projects such as TensorRT, ONNX Runtime, oneAPI, etc.
  • Linux system tuning, driver development, or low-level hardware interface knowledge.

Employer: Intel · Location: China