Runtime Engineer

1 Hour ago • 3 Years +
Software Development & Engineering

Job Description

You will develop high-performance code for a distributed system involving the CS-2 and heterogeneous servers. This includes feeding model and training data in the Weight Streaming regime and CPU-side processing. The role demands optimized data structures and algorithms to leverage hardware resources like CPU, memory, storage, and network bandwidth. The software must support a high degree of concurrency across threads, processes, cores, and systems, and is crucial for Cerebras' Weight Streaming architecture, scaling to neural networks with over 100T parameters.
Good To Have:
  • Knowledge of basic compiler internals.
  • Experience with distributed systems and protocol design.
  • Python knowledge, especially in Machine Learning contexts.
Must Have:
  • Develop high-performance code for a distributed system with CS-2 and multiple heterogeneous servers.
  • Feed model and training data to the CS-2 in the Weight Streaming regime and perform CPU-side processing.
  • Optimize data structures and algorithms to fully utilize CPU, memory, storage, and network bandwidth.
  • Build software with a high degree of concurrency across threads, processes, cores, and systems.
  • Strong low-level programming skills.
  • Track record of working on large, complex system software.
  • Experience with projects involving custom hardware where software unlocks its potential.
  • 3+ years of experience as an engineer.
Perks:
  • Opportunity to build a breakthrough AI platform beyond GPU constraints.
  • Publish and open source cutting-edge AI research.
  • Work on one of the fastest AI supercomputers in the world.
  • Enjoy job stability with startup vitality.
  • Simple, non-corporate work culture that respects individual beliefs.

Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and empowers machine learning users to effortlessly run large-scale ML applications, without the hassle of managing hundreds of GPUs or TPUs. Current customers include global corporations across multiple industries, national labs, and top-tier healthcare systems. In January, we announced a multi-year, multi-million-dollar partnership with Mayo Clinic, underscoring our commitment to transforming AI applications across various fields. In August, we launched Cerebras Inference, the fastest Generative AI inference solution in the world, over 10 times faster than GPU-based hyperscale cloud inference services.

About the Role

You will develop high-performance code that runs on a distributed system consisting of the CS-2 and multiple heterogeneous servers. Feeding model and training data to the CS-2 in the Weight Streaming regime, as well as CPU-side processing of this data, is a huge challenge. It requires optimized data structures and algorithms that take full advantage of the available hardware resources, including CPU, memory, storage, and network bandwidth. The software must be built with a high degree of concurrency across threads, processes, cores, and systems. It is central to the Weight Streaming architecture, which scales to brain-sized neural networks with over 100T parameters.

Skills And Qualifications

  • Engineer with 3+ years of experience.
  • Strong low-level programming: C/C++, multi-threading, performance optimization, exposure to Assembly-level programming.
  • Assets: Knowledge of basic compiler internals, experience with distributed systems and protocol design, Python knowledge (especially in ML contexts).
  • Track record of working on large, complex system software.
  • Prior projects involving custom hardware, where the software you wrote unlocked the hardware's potential.

Why Join

People who are serious about software make their own hardware. At Cerebras, we have built a breakthrough architecture that is unlocking new opportunities for the AI industry. With dozens of model releases and rapid growth, we’ve reached an inflection point in our business. Members of our team tell us there are five main reasons they joined:

1. Build a breakthrough AI platform beyond the constraints of the GPU.

2. Publish and open source their cutting-edge AI research.

3. Work on one of the fastest AI supercomputers in the world.

4. Enjoy job stability with startup vitality.

5. Our simple, non-corporate work culture that respects individual beliefs.

Read our blog: Five Reasons to Join Cerebras in 2025.

Apply today and become part of the forefront of groundbreaking advancements in AI!


Cerebras Systems is committed to creating an equal and diverse environment and is proud to be an equal opportunity employer. We celebrate different backgrounds, perspectives, and skills. We believe inclusive teams build better products and companies. We try every day to build a work environment that empowers people to do their best work through continuous learning, growth and support of those around them.
