Member of Technical Staff - GPU Infrastructure


Job Description

Reflection’s mission is to build open superintelligence and make it accessible to all; the company develops open-weight models. This role involves designing, building, and operating large-scale GPU infrastructure for pre-training, post-training, and inference. Key responsibilities include developing reliable, high-performance systems for scheduling, orchestration, and observability; optimizing cluster utilization; and building tools for distributed training and monitoring. Close collaboration with research and platform teams is essential to accelerate development and push hardware limits.

Our Mission

Reflection’s mission is to build open superintelligence and make it accessible to all.

We’re developing open-weight models for individuals, agents, enterprises, and even nation states. Our team of AI researchers and company builders comes from DeepMind, OpenAI, Google Brain, Meta, Character.AI, Anthropic, and beyond.

About the Role

  • Design, build, and operate Reflection’s large-scale GPU infrastructure powering pre-training, post-training, and inference.
  • Develop reliable, high-performance systems for scheduling, orchestration, and observability across thousands of GPUs.
  • Optimize cluster utilization, throughput, and cost efficiency while maintaining reliability at scale.
  • Build tools and automation for distributed training, inference, monitoring, and experiment management.
  • Collaborate closely with research, training, and platform teams to accelerate development and enable large-scale training and inference.
  • Push the limits of hardware, networking, and software to accelerate the path from idea to model.

About You

  • Deep systems or infrastructure engineering experience in high-performance or distributed computing environments.
  • Strong understanding of GPUs, CUDA, NCCL, and large-scale training and inference frameworks and libraries (PyTorch, DeepSpeed, JAX, Megatron-LM, SGLang, vLLM, etc.).
  • Hands-on experience with containerization, orchestration, and cluster management (Kubernetes, Slurm, or similar).
  • Familiarity with modern observability stacks and performance profiling tools.
  • High agency and the ability to thrive in a fast-paced, high-ownership startup environment.
  • Excited to build from zero to one, defining how frontier-scale training/RL infrastructure is architected and operated.
  • Motivated by enabling researchers and engineers to build the world’s most capable open-weight AI systems.

What We Offer

We believe that to build superintelligence that is truly open, you need to start at the foundation. Joining Reflection means building from the ground up as part of a small, talent-dense team. You will help define our future as a company and the frontier of open foundational models.

We want you to do the most impactful work of your career with the confidence that you and the people you care about most are supported.

  • Top-tier compensation: Salary and equity structured to recognize and retain the best talent globally.
  • Health & wellness: Comprehensive medical, dental, vision, life, and disability insurance.
  • Life & family: Fully paid parental leave for all new parents, including adoptive and surrogate journeys. Financial support for family planning.
  • Benefits & balance: Paid time off when you need it, relocation support, and more perks that optimize your time.
  • Opportunities to connect with teammates: Lunch and dinner are provided daily. We have regular off-sites and team celebrations.
