Team Introduction: The infra4AI Research and Architecture Team is responsible for the foundational hardware and software systems engineered to support the demanding, often experimental workloads of developing new artificial intelligence models and systems. It serves as the bedrock on which researchers and engineers create, train, test, and iterate on novel AI architectures, from large language models (LLMs) to specialized neural networks.

We are seeking highly skilled and motivated AI Infrastructure Researchers and Engineers to join our dynamic team. In this role, you will design, build, deploy, and maintain the robust, scalable infrastructure that powers our cutting-edge artificial intelligence (AI) and machine learning (ML) initiatives. You will work closely with our AI/ML researchers, data scientists, and software engineers to create an efficient, high-performance environment for training, inference, and data processing. Your expertise will be critical in enabling the next generation of AI-driven products and services.

Responsibilities

The ideal candidate should be an expert in at least one of the following fields to help define and design the next-generation AI infrastructure:

- Infrastructure Design & Architecture
  - Lead end-to-end design of scalable, reliable AI infrastructure (AI accelerators, compute clusters, storage, networking) for training and serving large ML workloads.
  - Define and implement service-oriented, containerized architectures (Kubernetes, VM frameworks, unikernels) optimized for ML performance and security.
- Performance Optimization
  - Profile and optimize every layer of the ML stack: ML compilers, GPU/TPU scheduling, NCCL/RDMA networking, data preprocessing, and training/inference frameworks.
  - Develop low-overhead telemetry and benchmarking frameworks to identify and eliminate bottlenecks in distributed training and serving.
- Distributed Systems & Scalability
  - Build and operate large-scale deployment and orchestration systems that auto-scale across multiple data centers (on-premises and cloud).
  - Champion fault tolerance, high availability, and cost efficiency through smart resource management and workload placement.
- Data Pipeline & Workflow Engineering
  - Architect and implement robust ETL and data-ingestion pipelines (Spark/Beam/Dask/Flume) tailored for petabyte-scale ML datasets.
  - Integrate experiment-management and workflow-orchestration tools (Airflow, Kubeflow, Metaflow) to streamline the path from research to production.
- Collaboration & Mentorship
  - Partner with ML researchers to translate prototype requirements into production-grade systems.
  - Mentor and coach engineers on best practices in performance tuning, systems design, and reliability engineering.