Position Overview
Join us to design the core data systems powering both traditional machine learning and cutting-edge generative AI/LLM workflows. As a Senior/Principal Software Engineer, you’ll specialize in one of two tracks:
- Data & Feature Store Infrastructure: Build scalable backend systems for data ingestion, batch/streaming ETL pipelines, feature stores, vector-enabled APIs, and data compliance
- Labeling & Human Feedback Systems: Design multimodal annotation platforms (text, image, audio, video, 3D), develop RLHF workflows (instruction tuning, output ranking), and drive LLM-assisted labeling innovations
You’ll work closely with ML engineers, MLOps, and product teams to deliver high-impact data and labeling solutions at scale. Reporting to the Head of AI & ML Platform, you’ll turn AI research into production-ready features that create real customer value.
Responsibilities
Choose one track to focus on:
Data & Feature Store Infrastructure
- Design and implement scalable feature engineering systems for both batch and streaming computation
- Build and maintain low-latency online feature serving systems that keep features consistent between training and inference
- Develop and maintain monitoring systems for feature freshness, data drift, and data quality
- Integrate feature management solutions with vector databases to support embeddings and retrieval-augmented generation (RAG) workflows
- Ensure data compliance and lineage, and apply infrastructure-as-code best practices
Labeling & Human Feedback Systems
- Build and scale annotation platforms for diverse data types: text, image, video, audio, and 3D
- Develop workflows for LLM alignment, including instruction tuning and RLHF (Reinforcement Learning from Human Feedback) output ranking
- Embed LLM-assisted labeling features such as auto-labeling, policy checking, and active learning
- Drive annotation quality through processes such as inter-annotator agreement, gold standard samples, and anomaly detection
- Manage and scale internal/external labeling teams while maintaining secure data integration
Minimum Qualifications
- 5+ years of experience in data engineering, ML platform, or backend development roles
- Proficiency in at least one modern programming language (Python preferred)
- Experience developing and operating distributed backend APIs and SDKs
- Experience working with cloud platforms (AWS, GCP, or Azure), containers (Docker/Kubernetes), and infrastructure-as-code tools (e.g., Terraform)
Plus experience in one of the following specializations:
Feature Store Track (experience with at least TWO of the following):
- Hands-on experience with feature store frameworks (e.g., SageMaker Feature Store, Feast, Tecton, Hopsworks), or operating vector database systems for serving LLM use cases
- Experience with batch and/or streaming data pipelines (e.g., Kafka, Flink, Spark, Ray) and orchestration tools (e.g., Airflow, Argo Workflows)
- Demonstrated experience in at least one of the following data areas: data catalog, data validation, versioning, lineage, or security/compliance
Labeling Track (experience with at least ONE of the following):
- Proven experience with labeling platforms (e.g., SageMaker Ground Truth, Label Studio)
- Experience with RLHF/instruction tuning or annotation workflow development
Preferred Qualifications
- Experience with LLM pipelines, including embeddings, retrieval-augmented generation (RAG), or prompt engineering
- Familiarity with labeling copilot tools, active learning, or managing hybrid annotation teams
- Understanding of knowledge graphs or semantic data modeling