Please note that this is an ML Engineering role, not a Scientist role. The focus for the position will be on ML pipeline development, deployment, and management.
Who we are:
Wayfair runs the largest custom e-commerce large parcel network in the United States, with approximately 1.6 million square meters of logistics space. The network is inherently a highly variable ecosystem that requires flexible, reliable, and resilient systems to operate efficiently.
At Wayfair, we are well on the way to becoming the world’s number one online destination for all things home. Our core belief is that everyone should live in a home they love. We make this possible by ensuring our 24 million customers have all the technology and innovation they need at their fingertips to access our more than 33 million products, which are provided by our 23,000 awesome global suppliers.
Wayfair is moving the world so that anyone can live in a home they love – a journey enabled by more than 3,000 Wayfair engineers and a data-centric culture. Wayfair’s Data Science Marketing team builds algorithmic systems that drive our business, enhance customer experience, and improve customer loyalty. You will be part of a cross-functional, collaborative team driving development of world-class ML systems that improve our customer understanding and marketing decisions.
What you’ll do:
- Own and contribute to the ML pipeline development lifecycle, from data wrangling and feature development, through training and tuning ML models with data scientists, to deploying and managing the inference pipeline.
- Develop reusable code and patterns to scale the ML pipeline to new business use cases and create a self-service platform.
- Partner closely with the ML Platform team, the Infrastructure team, and similar teams to ensure the Data Science org has the data, computing resources, and workflows/abstractions needed to do its best work.
- Create human annotation jobs with third-party vendors such as Labelbox and Snorkel, and partner with those vendors to support new features as needed.
- Define and advance MLOps best practices across the Data Science, Engineering, and Platform teams.
- Contribute to SME initiatives and code reviews to help spread best practices.
We Are a Match Because You Have:
- 6+ years of experience as an ML Engineer, with strong engineering skills and a passion for turning reference implementations into production-ready software.
- Proficiency in at least one high-level programming language (Python, Java, Scala or equivalent) used both for ML and automation tasks.
- Experience with the Python ML ecosystem (numpy, pandas, sklearn, XGBoost, etc.) and the Apache Spark ecosystem (Spark SQL, MLlib/Spark ML).
- Hands-on experience building scalable ML and big data processing pipelines with tools such as Hadoop, Hive, SQL, and Spark, and with GCP services such as Dataproc, BigQuery, and GCS.
- Experience with automated data pipeline and workflow management tools, e.g., Airflow.
- Experience with basic software engineering tools and practices, e.g., Git, CI/CD environments (such as Jenkins or Buildkite), PyPI, Docker, Kubernetes, unit testing, and general object-oriented design.
It’s Great if You Have:
- Master’s or Bachelor’s degree in Computer Science, Operations Research, Statistics, or another quantitative field.
- Experience with common ML frameworks/libraries such as Vowpal Wabbit, TensorFlow, and PyTorch.
- Experience with Kubernetes and microservices.
- Experience with cloud services such as GCP AI Platform.
- Experience deploying and scaling ML solutions using open-source frameworks (MLflow, TFX, H2O, etc.).