We are seeking highly skilled engineers with expertise in machine learning, distributed systems, and high-performance computing to join our Research team. In this role, you will collaborate closely with researchers to build and optimize platforms that train next-generation foundation models on massive GPU clusters. Your work will play a critical role in advancing the efficiency and scalability of cutting-edge generative AI technologies.
Key Responsibilities
- Scale and optimize systems for training large-scale models across multi-thousand GPU clusters.
- Profile and enhance the performance of training codebases to achieve best-in-class hardware efficiency.
- Develop systems to distribute workloads efficiently across massive GPU clusters.
- Design and implement robust solutions to enable model training in the presence of hardware failures.
- Build tools to diagnose issues, visualize training processes, and evaluate datasets at scale.
- Optimize and deploy inference workloads for high throughput and low latency across the entire stack, including data processing, model execution, and parallelism.
- Implement and improve high-performance CUDA, Triton, and PyTorch code to address efficiency bottlenecks in memory, speed, and utilization (see the illustrative sketch after this list).
- Collaborate with researchers to ensure systems are designed with optimal efficiency from the ground up.
- Prototype cutting-edge applications using multimodal generative AI.
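To give a flavor of the kernel-level work above, here is a minimal, purely illustrative Triton sketch that fuses an elementwise multiply-add into a single pass over memory. The function names and block size are hypothetical examples, not part of our codebase:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def fused_mul_add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    # Each program instance handles one contiguous block of elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements  # guard the final, partially filled block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    # Fusing the multiply and add avoids materializing an intermediate tensor.
    tl.store(out_ptr + offsets, x * y + 1.0, mask=mask)

def fused_mul_add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Assumes x and y are contiguous CUDA tensors of the same shape.
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)
    fused_mul_add_kernel[grid](x, y, out, n, BLOCK=1024)
    return out
```

In practice, much of the role is profiling to determine which fusions, layouts, and launch configurations actually pay off on a given cluster.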
Qualifications
- Experience:
  - 3+ years of professional experience in ML pipelines, distributed systems, or high-performance computing.
  - Hands-on experience training large models using Python and PyTorch, and familiarity with the full pipeline: data processing, loading, training, and inference.
  - Proven expertise in optimizing and deploying inference workloads, with experience profiling GPU/CPU code (e.g., NVIDIA Nsight).
  - Deep understanding of distributed systems and frameworks such as DDP, FSDP, and tensor parallelism (see the sketch after this section).
  - Strong experience writing high-performance parallel C++ and custom PyTorch kernels, with knowledge of CUDA and Triton optimization techniques.
  - Bonus: experience with generative models (e.g., Transformers, diffusion models, GANs) and prototype development (e.g., Gradio, Docker).
- Technical Skills:
  - Proficiency in Python, with significant experience using PyTorch.
  - Advanced skills in CUDA/Triton programming, including custom kernel development and tensor core optimization.
  - Strong generalist software engineering skills and familiarity with distributed and parallel computing systems.
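As a minimal sketch of the distributed-training side, assuming a standard PyTorch DDP setup launched with torchrun (the model, loss, and hyperparameters below are placeholders, not our actual stack):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# One process per GPU; torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(1024, 1024).cuda()  # placeholder model
model = DDP(model, device_ids=[local_rank])
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

for step in range(10):
    x = torch.randn(32, 1024, device="cuda")
    loss = model(x).pow(2).mean()  # dummy objective
    loss.backward()                # DDP all-reduces gradients across ranks here
    opt.step()
    opt.zero_grad()

dist.destroy_process_group()
```

Launched with, e.g., `torchrun --nproc_per_node=8 train.py`. FSDP and tensor parallelism extend the same pattern once the model no longer fits on a single device.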
Note: This position is not intended for recent graduates.
Compensation
The salary range for this role in California is $175,000–$250,000 per year. Actual compensation will depend on job-related knowledge, skills, experience, and candidate location. We also offer competitive equity packages in the form of stock options and a comprehensive benefits plan.