Machine Learning Engineer

1 Month ago • All levels • DevOps

Job Summary

Job Description

Hedra seeks an ML Engineer to manage and optimize computational infrastructure for training and deploying machine learning models, focusing on large video datasets. Responsibilities include designing scalable computing solutions, managing cloud instances (AWS or Google Cloud), ensuring infrastructure handles resource-intensive tasks, monitoring system performance, and collaborating with the team. The ideal candidate will have experience with distributed training, containerization (Docker), orchestration (Kubeflow), and scripting (Python/Bash). The role is crucial for supporting the company's ML efforts, concentrating on deployment and scalability of 3DVAE and video diffusion models.
Must have:
  • Experience with cloud platforms (AWS, GCP)
  • Knowledge of Docker and Kubeflow
  • Understanding of distributed training
  • Proficiency in Python/Bash
  • System administration experience
  • Scalable computing solutions design
Perks:
  • Competitive compensation and equity
  • 401k
  • Healthcare (Silver PPO Medical, Vision, Dental)
  • Lunch and snacks at the office

Job Details

Hedra is a pioneering generative media company backed by top investors at Index, A16Z, and Abstract Ventures. We're building Hedra Studio, a multimodal creation platform capable of control, emotion, and creative intelligence.

At the core of Hedra Studio is our Character-3 foundation model, the first omnimodal model in production. Character-3 jointly reasons across image, text, and audio for more intelligent video generation — it’s the next evolution of AI-driven content creation.

Note: At Hedra, we’re a team of hard-working, passionate individuals seeking to fundamentally change content and build a generational company together. You should have start-up experience and be a self-starter who is driven to build impactful products that change the status quo. You must be willing to work in-person in either NYC or SF.

Overview:

We are looking for an ML Engineer with expertise in high-performance computing systems to manage and optimize our computational infrastructure for training and deploying our machine learning models. The ideal candidate will have experience with cloud computing platforms and tools for managing ML workloads at scale, supporting our 3DVAE and video diffusion models.

Responsibilities:

  • Design and implement scalable computing solutions for training and deploying ML models, ensuring infrastructure can handle large video datasets.

  • Manage and optimize the performance of our computing clusters or cloud instances, such as AWS or Google Cloud, to support distributed training.

  • Ensure that our infrastructure can handle the resource-intensive tasks associated with training large generative models.

  • Monitor system performance and implement improvements to maximize efficiency, using tools like Kubeflow for orchestration.

  • Collaborate with the team to understand their computational needs and provide appropriate solutions, facilitating seamless model deployment.

Qualifications:

  • Bachelor’s degree in Computer Science, Information Technology, or a related field, with a focus on system administration.

  • Experience with cloud computing platforms such as Amazon Web Services, Google Cloud, or Microsoft Azure, essential for managing large-scale ML workloads.

  • Knowledge of containerization tools like Docker and orchestration tools like Kubeflow, crucial for deploying models at scale.

  • Understanding of distributed training techniques and how to scale models across multiple GPUs or machines, aligning with video generation needs.

  • Proficiency in scripting languages like Python or Bash for automation tasks, facilitating infrastructure management.

  • Strong problem-solving and communication skills, given the need to collaborate with diverse teams.

This role is vital for ensuring the computational backbone supports the company’s ML efforts, focusing on deployment and scalability.

Benefits:

  • Competitive compensation and equity

  • 401k (no match)

  • Healthcare (Silver PPO Medical, Vision, Dental)

  • Lunch and snacks at the office

We encourage you to apply even if you don't fully meet all the listed requirements; we value potential and diverse perspectives, and your unique skills could be a great asset to our team.

About The Company

We are a creation lab building foundation models into products that power the next generation of human storytelling.

San Francisco, California, United States (On-Site)

New York, New York, United States (On-Site)
