Python Software Engineering Intern, Accelerated LLM Data Applications - Fall 2025

1 Month ago • Up to 1 Year • Research & Development

Job Summary

Job Description

NVIDIA seeks a Python Software Engineering Intern to accelerate data engineering for Large Language Models (LLMs). The intern will develop and optimize Python-based data processing frameworks for GPU-accelerated environments, contributing to RAPIDS and other GPU-accelerated libraries. Responsibilities include designing and implementing components for Retrieval Augmented Generation (RAG) pipelines, benchmarking algorithms, and collaborating with LLM & ML researchers. The ideal candidate possesses strong Python skills, familiarity with LLMs and RAG pipelines, experience with PyData and ML/DL ecosystems, and a passion for optimization and iterative development. The internship involves working with large datasets, optimizing for speed and cost, and improving system accuracy through various techniques.
Must have:
  • Python library development experience
  • Familiarity with LLMs and RAG pipelines
  • Understanding of PyData & ML/DL ecosystems
  • Contributions to open-source projects
Good to have:
  • Experience with production-level data pipelines
  • Experience with software packaging technologies
  • Familiarity with Docker-Compose, Kubernetes
  • Knowledge of parallel programming in CUDA C++
Perks:
  • Intern benefits

Job Details

Today, NVIDIA is tapping into the unlimited potential of AI to define the next era of computing, an era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN, you’ll be immersed in a diverse, encouraging environment where everyone is inspired to do their best work. Come join the team and see how we can make a lasting impact on the world.

NVIDIA is seeking a Python Software Engineer Intern to further our efforts to GPU-accelerate data engineering for Large Language Model (LLM) tools and libraries. This role is pivotal in accelerating pre-processing pipelines for high-quality multi-modal dataset curation. The day-to-day focus is on developing efficient, scalable systems for de-duplicating, filtering, and classifying training corpora for foundation model LLMs, as well as ingesting and prepping datasets for use in Retrieval Augmented Generation (RAG) pipelines. Fundamental to these efforts are iterative testing and improvement in system cost, speed, & accuracy through micro-optimization, prompt engineering, fine-tuning, and applying new research. The ideal candidate is happiest releasing early and often, courts user feedback with an ear open to the spirit of related feature requests, and is comfortable objectively evaluating the latest AI models and frameworks with an eye on acceleration potential. Would you like to run your training & test experiments on our supercomputers on thousands of GPUs? Come work with us!

What you'll be doing:

  • Develop and optimize Python-based data processing frameworks, ensuring efficient handling of large datasets in GPU-accelerated environments, vital for LLM training.

  • Contribute to the design and implementation of RAPIDS and other GPU-accelerated libraries, focusing on seamless integration and performance enhancement in the context of LLM training data preparation and RAG pipelines.

  • Lead development and iterative optimization of components for RAG pipelines, ensuring they use GPU acceleration & the best-performing models to improve total cost of ownership (TCO).

  • Collaborate with teams of LLM & ML researchers in the development of full-stack, GPU-accelerated data preparation pipelines for multimodal models.

  • Implement benchmarking, profiling, and optimization of innovative algorithms in Python in various system architectures, specifically targeting LLM applications.

  • Work closely with diverse teams to understand requirements, build & evaluate POCs, and develop roadmaps for production level tools and library features within the growing LLM ecosystem.

What we need to see:

  • Pursuing an MS or PhD in Computer Science, Computer Engineering, or a related field.

  • Python library development experience, including CI systems (GitHub Actions), integration testing, benchmarking, & profiling

  • Familiarity with LLMs and RAG pipelines: prompt engineering, LangChain, llama-index

  • Understanding of the PyData & ML/DL ecosystems, including RAPIDS, Pandas, numpy, scikit-learn, XGBoost, Numba, PyTorch

  • Familiarity with distributed programming frameworks like Dask, Apache Spark, or Ray

  • Visible contributions to open-source projects on GitHub

Ways to stand out from the crowd:

  • Active engagement (published papers, conference talks, blogs) in the data science community

  • Experience with production-level data pipelines, especially SQL-based

  • Experience with software packaging technologies: pip, conda, Docker images

  • Familiarity with Docker-Compose, Kubernetes, and Cloud deployment frameworks

  • Knowledge of parallel programming approaches, especially in CUDA C++

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you!

The hourly rate for our interns is 18 USD - 71 USD. Internship hourly rates are standard and are determined by the position and your location, year in school, degree, and experience.

You will also be eligible for Intern benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.
