Machine Learning Engineer - Model Training Infrastructure

3 Months ago • 5 Years + • Devops • $334,000 PA - $435,000 PA

Job Summary

Job Description

The Machine Learning Engineer will be responsible for designing and implementing a global-scale machine learning system for feeds, ads, and search ranking models. The role involves improving the usability and flexibility of the machine learning infrastructure, enhancing model training and serving workflows, data pipelines, storage systems, and resource management for multi-tenancy machine learning systems. The engineer will also design and develop key components of ML infrastructure, mentor interns, and contribute to the overall advancement of the company's AI infrastructure and recommendation platform. This role demands a strong understanding of large-scale system development and experience with deep learning frameworks and core machine learning infrastructure.
Must have:
  • 5+ years of experience in developing and deploying large-scale systems.
  • Proficiency in C/C++/CUDA/Python and solid programming skills.
  • Familiarity with deep learning frameworks (TensorFlow/PyTorch).
Good to have:
  • Experience contributing to an open-source machine learning framework (TensorFlow/PyTorch).
  • Experience in using/designing open-source machine learning lifecycle management systems: TFX
Perks:
  • Day one access to medical, dental, and vision insurance.
  • 401(k) savings plan with company match.
  • Paid parental leave.
  • Short-term and long-term disability coverage.
  • Life insurance.
  • Wellbeing benefits.
  • 10 paid holidays per year.
  • 10 paid sick days per year.
  • 17 days of Paid Personal Time (prorated upon hire with increasing accruals by tenure).

Job Details

The mission of our AML team is to advance the next-generation AI infrastructure and recommendation platform for ads ranking, search ranking, and live and e-commerce ranking at our company. We also drive substantial impact on the company's core businesses. We are currently looking for a Machine Learning Engineer in Model Training Infrastructure to join our team and help advance that mission.
Responsibilities:
  • Design and implement a global-scale machine learning system for feeds, ads, and search ranking models.
  • Improve the usability and flexibility of the machine learning infrastructure.
  • Improve the workflow of model training and serving, data pipelines, storage systems, and resource management for multi-tenancy machine learning systems.
  • Design and develop key components of ML infrastructure, and mentor interns.
Qualifications
Minimum Qualifications:
  • At least 5 years of experience in developing and deploying large-scale systems.
  • Proficient in C/C++/CUDA/Python, with solid programming skills.
  • Familiar with deep learning frameworks (TensorFlow/PyTorch).
  • Experience improving core machine learning infrastructure (TensorFlow, PyTorch, and JAX).
Preferred Qualifications:
  • Experience contributing to an open-source machine learning framework (TensorFlow/PyTorch).
  • Experience in using/designing open-source machine learning lifecycle management systems: TFX

About The Company

Founded in 2012, ByteDance's mission is to inspire creativity and enrich life. With a suite of more than a dozen products, including TikTok as well as platforms specific to the China market, including Toutiao, Douyin, and Xigua, ByteDance has made it easier and more fun for people to connect with, consume, and create content.