Machine Learning Engineer

6 Months ago • All levels

Devops

Job Description

Hedra seeks an ML Engineer with expertise in high-performance computing to manage and optimize the computational infrastructure for training and deploying machine learning models. Responsibilities include designing scalable computing solutions for training and deploying ML models on large video datasets; managing and optimizing computing clusters (AWS/Google Cloud); ensuring the infrastructure can handle the resource-intensive tasks of training large generative models; monitoring system performance and implementing improvements, using Kubeflow for orchestration; and collaborating with the team to understand computational needs and provide solutions. The role focuses on deploying and scaling video generation models built on 3DVAE and video diffusion models.

Must Have:

Experience with cloud platforms (AWS, GCP, Azure)
Knowledge of Docker, Kubeflow
Understanding of distributed training
Proficiency in Python or Bash
System administration background
Experience designing scalable solutions for ML model training and deployment

Hedra is a pioneering generative media company backed by top investors at Index, A16Z, and Abstract Ventures. We're building Hedra Studio, a multimodal creation platform capable of control, emotion, and creative intelligence.

At the core of Hedra Studio is our Character-3 foundation model, the first omnimodal model in production. Character-3 jointly reasons across image, text, and audio for more intelligent video generation — it’s the next evolution of AI-driven content creation.

Note: At Hedra, we’re a team of hard-working, passionate individuals seeking to fundamentally change content and build a generational company together. You should have startup experience and be a self-starter who is driven to build impactful products that change the status quo. You must be willing to work in person in either NYC or SF.

Overview:

We are looking for an ML Engineer with expertise in high-performance computing systems to manage and optimize our computational infrastructure for training and deploying our machine learning models. The ideal candidate will have experience with cloud computing platforms and tools for managing ML workloads at scale, supporting our 3DVAE and video diffusion models.

Responsibilities:

Design and implement scalable computing solutions for training and deploying ML models, ensuring infrastructure can handle large video datasets.
Manage and optimize the performance of our computing clusters or cloud instances on platforms such as AWS or Google Cloud to support distributed training.
Ensure that our infrastructure can handle the resource-intensive tasks associated with training large generative models.
Monitor system performance and implement improvements to maximize efficiency, using tools like Kubeflow for orchestration.
Collaborate with the team to understand their computational needs and provide appropriate solutions, facilitating seamless model deployment.

Qualifications:

Bachelor’s degree in Computer Science, Information Technology, or a related field, with a focus on system administration.
Experience with cloud computing platforms such as Amazon Web Services, Google Cloud, or Microsoft Azure, essential for managing large-scale ML workloads.
Knowledge of containerization tools like Docker and orchestration tools like Kubeflow, crucial for deploying models at scale.
Understanding of distributed training techniques and how to scale models across multiple GPUs or machines, aligning with video generation needs.
Proficiency in scripting languages like Python or Bash for automation tasks, facilitating infrastructure management.
Strong problem-solving and communication skills, given the need to collaborate with diverse teams.

This role is vital for ensuring the computational backbone supports the company’s ML efforts, focusing on deployment and scalability.

Benefits:

Competitive compensation and equity
401k (no match)
Healthcare (Silver PPO Medical, Vision, Dental)
Lunch and snacks at the office

We encourage you to apply even if you don't fully meet all the listed requirements; we value potential and diverse perspectives, and your unique skills could be a great asset to our team.
