ML Ops Engineer
Stord
Job Summary
Stord is seeking an ML Operations Bridge Engineer to join its newly formed AI team, focused on building cutting-edge ML features. The role bridges data science and production engineering: deploying models to production APIs, building real-time feature-engineering pipelines, and developing CI/CD for model deployment. The engineer will work on critical features like delivery time estimation and demand forecasting, with the freedom to shape MLOps practices and tooling.
Must Have
- Deploy trained ML models to production platforms like Modal.com or Vertex AI
- Build APIs that serve model predictions with low latency
- Implement A/B testing frameworks for model comparison
- Create model versioning and rollback strategies
- Monitor model performance and detect drift
- Build real-time feature engineering pipelines using Kafka Streams or similar tooling
- Create data validation and quality monitoring systems
- Design feature stores for both training and inference
- Implement efficient data transformations from Postgres/AlloyDB sources
- Develop CI/CD pipelines for model deployment
- Build monitoring dashboards for model and pipeline health
- Optimize inference costs across cloud platforms
- Create developer tools for ML feature integration
- Document and evangelize MLOps best practices
- Partner with Data Scientists to understand model requirements
- Collaborate with platform engineers to integrate ML features
- Work with product teams to define success metrics
Good to Have
- Experience with Modal.com, Vertex AI, or similar ML platforms
- Kafka or other streaming data platform experience
- Familiarity with Elixir or functional programming
- Knowledge of logistics, e-commerce, or supply chain domains
- Experience with Cloudflare Workers or edge computing
- Contributions to open source ML/data tools
- Experience with feature stores (Feast, Tecton)
- Container orchestration (Kubernetes)
Job Description
Stord is The Consumer Experience Company, powering seamless checkout through delivery for today's leading brands. Stord is rapidly growing and on track to double its revenue in the next 18 months. To meet and exceed this target, Stord is strategically scaling teams across the entire company and seeking energetic experts to help us achieve our mission.
By combining comprehensive commerce-enablement technology with high-volume fulfillment services, Stord provides brands a platform to compete with retail giants. Stord manages over $10 billion of commerce annually through its fulfillment, warehousing, transportation, and operator-built software suite including OMS, Pre- and Post-Purchase, and WMS platforms. Stord is leveling the playing field for all brands to deliver the best consumer experience at scale.
With Stord, brands can increase cart conversion, improve unit economics, and drive sustained customer loyalty. Stord’s end-to-end commerce solutions combine best-in-class omnichannel fulfillment and shipping with leading technology to ensure fast shipping, reliable delivery promises, easy access to more channels, and improved margins on every order.
Hundreds of leading DTC and B2B companies like AG1, True Classic, Native, Seed Health, quip, goodr, Sundays for Dogs, and more trust Stord to deliver industry-leading consumer experiences on every order. Stord is headquartered in Atlanta with facilities across the United States, Canada, and Europe. Stord is backed by top-tier investors including Kleiner Perkins, Franklin Templeton, Founders Fund, Strike Capital, Baillie Gifford, and Salesforce Ventures.
Stord is revolutionizing the logistics industry with our cloud-based supply chain platform. Our newly formed AI team is building cutting-edge features that leverage both traditional ML models (deployed on Modal.com) and LLM capabilities (via Cloudflare Workers AI, Vertex AI, and direct provider integrations). We need someone who can bridge the gap between data science and production engineering to help us ship ML features rapidly and reliably.
We are seeking a skilled ML Operations Bridge Engineer who thrives at the intersection of data science and software engineering. You'll work directly with our Senior Data Scientist to take models from Jupyter notebooks to production APIs serving millions of predictions daily. This is a hands-on role where you'll build data pipelines, deploy models, create monitoring systems, and ensure our ML features deliver real business value.
In this role, you'll be instrumental in building our ML infrastructure from the ground up. You'll work on critical features like delivery time estimation, demand forecasting, and AI-powered insights, with the freedom to shape our MLOps practices and tooling choices. This is a unique opportunity to have massive impact on a small, ambitious team.
What You'll Do:
ML Operations & Deployment:
- Take trained models from our Data Scientist and deploy them to Modal.com or Vertex AI
- Build TypeScript (or Python or Elixir – the best tool for the job) APIs that serve model predictions with <100ms latency
- Implement A/B testing frameworks for model comparison
- Create model versioning and rollback strategies
- Monitor model performance and catch drift before customers notice
Data Pipeline Development:
- Build real-time feature engineering pipelines using Kafka Streams or similar tooling
- Create data validation and quality monitoring systems
- Design feature stores that serve both training and inference
- Implement efficient data transformations from our Postgres/AlloyDB sources
- Ensure data consistency across our ML and production systems
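As one illustration of the drift-monitoring side of this pipeline work, here is a self-contained sketch of the Population Stability Index (PSI), a common drift metric comparing a training sample against live traffic. The function name and thresholds are illustrative, not a reference to any specific Stord system.

```python
import math

def psi(expected, actual, bins: int = 10) -> float:
    """Population Stability Index between a training sample ('expected')
    and live serving traffic ('actual') for one numeric feature.
    Common rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift,
    > 0.25 significant drift worth alerting on."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def proportions(xs):
        counts = [0] * bins
        for x in xs:
            i = min(max(int((x - lo) / width), 0), bins - 1)  # clamp outliers
            counts[i] += 1
        # floor empty buckets at a tiny value so log() stays finite
        return [max(c / len(xs), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

In practice a job like this would run per-feature on a schedule, with the training-time distribution snapshotted alongside the model artifact.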
Infrastructure & Integration:
- Develop CI/CD pipelines for model deployment
- Build monitoring dashboards for model and pipeline health
- Optimize inference costs across Modal, Cloudflare, and GCP
- Create developer tools that make ML features easy to integrate
- Document and evangelize MLOps best practices
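A CI/CD gate for model deploys often boils down to a canary check like the sketch below: replay known-good ("golden") requests against the candidate model and fail the pipeline if predictions or latency regress. `canary_check` and its parameters are hypothetical, shown only to illustrate the shape of such a gate.

```python
import time

def canary_check(predict_fn, golden_cases, max_latency_ms: float = 100.0):
    """Hypothetical deploy gate: the candidate model must reproduce
    known-good predictions within tolerance and stay under the
    latency budget (the posting's <100ms target is the default)."""
    for features, expected in golden_cases:
        start = time.perf_counter()
        got = predict_fn(features)
        latency_ms = (time.perf_counter() - start) * 1000
        if latency_ms > max_latency_ms:
            return False, f"latency {latency_ms:.1f}ms over budget"
        if abs(got - expected) > 1e-3:
            return False, f"prediction {got} != expected {expected}"
    return True, "ok"
```

A CI step would run this against a staging endpoint and block promotion on a `False` result, making rollbacks a non-event rather than an incident.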
Cross-functional Collaboration:
- Partner with the Data Scientist to understand model requirements
- Work with platform engineers to integrate ML features into our core Elixir services
- Collaborate with product teams to define success metrics
- Help other engineers understand and use ML capabilities
What You'll Need:
- Strong Python (3+ years) - You've shipped ML models to production, not just notebooks
- Strong TypeScript/JavaScript (2+ years) - You can build robust APIs and understand async patterns
- MLOps Experience - You've deployed models, built pipelines, and monitored performance
- Data Engineering - Experience with streaming data, ETL/ELT, and data quality
- Cloud Platforms - Hands-on experience with GCP, AWS, or Azure (we’re on GCP)
- Version Control - Expert with Git/GitHub and collaborative workflows
- SQL Proficiency - Can write complex queries and optimize performance
- Production Mindset - You care more about customer impact than perfect code
- Pragmatic Approach - You know when to use simple solutions vs complex ones
- Strong Communication - Can explain technical decisions to various audiences
- Self-Directed - You identify what needs doing without detailed specs
- Learning Agility - Comfortable picking up new tools and technologies quickly
Preferred Qualifications:
- Experience with Modal.com, Vertex AI, or similar ML platforms
- Kafka or other streaming data platform experience
- Familiarity with Elixir or functional programming
- Knowledge of logistics, e-commerce, or supply chain domains
- Experience with Cloudflare Workers or edge computing
- Contributions to open source ML/data tools
- Experience with feature stores (Feast, Tecton)
- Container orchestration (Kubernetes)