Research Engineer - Midtraining

Periodic Labs

Menlo Park, CA, United States (Remote) | Full Time | 3 weeks ago

Job Summary

Periodic Labs is an AI + physical sciences lab building state-of-the-art models for novel scientific discoveries. The company is well-funded and growing rapidly, fostering an environment where team members are empowered to solve problems. This role involves training frontier models to be highly knowledgeable scientific experts, developing methods for synthetic data generation, distillation, and continual learning at scale. You will collaborate with RL researchers, physicists, chemists, and supercompute engineers to create evaluations and scale LLM training to thousands of GPUs, building high-performance tools to investigate how data shapes intelligence.

Responsibilities

Train frontier models to be highly knowledgeable scientific experts that serve as the foundation for reinforcement learning.
Develop methods for synthetic data generation, distillation, and continual learning at scale.
Work closely with RL researchers, physicists, and chemists to create evals that guide scientific data curation.
Collaborate with supercompute engineers to scale compute-efficient LLM training to thousands of GPUs.
Build high-performance tools for yourself to investigate how data shapes intelligence.

Good to Have

Experience training LLMs on curated mixes of trillions of tokens.
Experience calculating scaling laws and compute-optimal hyperparameters.
Experience generating billions of tokens of high-quality synthetic data.
Experience building evals that correlate with downstream task performance.

Job Description

About Periodic Labs

We are an AI + physical sciences lab building state-of-the-art models to make novel scientific discoveries. We are well funded and growing rapidly. Team members are owners who identify and solve problems without boundaries or bureaucracy. We eagerly learn new tools and new science to push our mission forward.

About the role

You will train frontier models to be highly knowledgeable scientific experts that serve as the foundation for reinforcement learning. You will develop methods for synthetic data generation, distillation, and continual learning at scale. You will work closely with RL researchers, physicists, and chemists to create evals that guide scientific data curation. You will collaborate with supercompute engineers to scale compute-efficient LLM training to thousands of GPUs. You will build high-performance tools for yourself to investigate how data shapes intelligence.

You might thrive in this role if you have experience with:

Training LLMs on curated mixes of trillions of tokens
Calculating scaling laws and compute-optimal hyperparameters
Generating billions of tokens of high-quality synthetic data
Building evals that correlate with downstream task performance
