MDS Capstone project - Fresh Prep weekly orders prediction
For my Capstone project, I collaborated with Hayley Boyce, Maninder Kohli and Rachel K. Riggs to help improve the accuracy of Fresh Prep’s weekly order predictions. In this post, I summarize our work and the lessons we learned. Due to NDA restrictions, I do not include any confidential information, such as descriptive statistics.
Project Background
Overall, our work focused on three main areas:
- Analyze customer behaviour to reveal patterns and trends
- Improve weekly orders prediction accuracy
- Build an interactive Tableau dashboard that displays predictions and client information
Exploratory Data Analysis (EDA)
The raw data was stored in a large PostgreSQL database consisting of many tables, and creating a clean, workable, consolidated file took our team time throughout the course of the project. Understanding the data was key to our success. Below are some of the insights we uncovered:
- Active and paused customers have significantly different billed order rates
- Customers tend to repeat their behaviour (skip or bill) from the previous week
- Active customers plan further out than paused customers do
- The customers with the most dietary restrictions (7 or 8) are less likely to skip their orders
- Customers who customize their orders most frequently (over 80% of the time) skip their orders more often than those with customization rates between 20% and 80%
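The first two insights come down to simple counts. The sketch below uses a small, made-up set of customer histories (not Fresh Prep data) and computes the billed order rate by subscription status and the share of consecutive weeks in which a customer repeats the previous week's decision. The data layout and function names are hypothetical, and it assumes for simplicity that each customer's status stays fixed.

```python
from collections import defaultdict

# Hypothetical toy data, NOT real Fresh Prep records:
# customer -> (subscription status, weekly decisions in time order),
# where 1 = order billed and 0 = order skipped.
customers = {
    "a": ("active", [1, 1, 1, 0, 1, 1]),
    "b": ("active", [1, 1, 0, 0, 1, 1]),
    "c": ("paused", [0, 0, 0, 0, 1, 0]),
    "d": ("paused", [0, 0, 0, 0, 0, 0]),
}


def billed_rate_by_status(customers):
    """Share of customer-weeks that were billed, per subscription status."""
    billed = defaultdict(int)
    weeks = defaultdict(int)
    for status, decisions in customers.values():
        billed[status] += sum(decisions)
        weeks[status] += len(decisions)
    return {status: billed[status] / weeks[status] for status in weeks}


def persistence_rate(customers):
    """Share of consecutive week pairs in which a customer repeats last week's decision."""
    same = total = 0
    for _, decisions in customers.values():
        for prev, curr in zip(decisions, decisions[1:]):
            same += prev == curr
            total += 1
    return same / total


rates = billed_rate_by_status(customers)
print({status: round(rate, 3) for status, rate in rates.items()})
# {'active': 0.75, 'paused': 0.083}
print(round(persistence_rate(customers), 3))  # 0.7
```

A persistence rate well above 0.5 is what makes "same as last week" a natural baseline to compare any model against.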
These insights guided our feature selection and engineering.
Feature Engineering
A significant component of this project involved engineering the following features for our predictive model:
- Individual’s billed order rate up to a given order: smoothed using empirical Bayes estimation to account for newer clients with only a few orders in their histories
- Weekly billed order rate: the average billed order rate for the corresponding week of the previous year
- Customer behaviour (skip or bill) in the 5 weeks prior to a given order (5 lags)
- The month a client joined
- The number of weeks they had been a client at the time of an order
- Their subscription prices
- Their location
- Beef as a dietary restriction
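The post does not describe exactly how the empirical Bayes prior or the lag features were built, so the following is only a sketch of one common approach, with hypothetical names and numbers. It fits a Beta(alpha, beta) prior to established customers' billed rates by the method of moments, then uses the posterior mean (billed + alpha) / (orders + alpha + beta), which pulls short histories toward the overall mean. The lag helper returns the decisions from the 5 weeks before a given order, using only past data.

```python
from statistics import mean, pvariance


def fit_beta_prior(rates):
    """Method-of-moments fit of a Beta(alpha, beta) prior to observed billed rates."""
    m = mean(rates)
    v = pvariance(rates)
    # Beta variance is m(1-m)/(alpha+beta+1); solve it for alpha+beta.
    if v <= 0 or v >= m * (1 - m):
        raise ValueError("rates must vary, with variance below m(1-m)")
    strength = m * (1 - m) / v - 1
    return m * strength, (1 - m) * strength


def smoothed_rate(billed, orders, alpha, beta):
    """Posterior mean billed rate; small histories are shrunk toward the prior mean."""
    return (billed + alpha) / (orders + alpha + beta)


def lag_features(decisions, t, n_lags=5):
    """Decisions (1 = billed, 0 = skipped) in the n_lags weeks before week t.

    Returns None where the customer has no history that far back.
    """
    return [decisions[t - k] if t - k >= 0 else None for k in range(1, n_lags + 1)]


# Hypothetical billed rates of established customers, used to fit the prior.
alpha, beta = fit_beta_prior([0.9, 0.5, 0.7, 0.3, 0.6])  # roughly (3.0, 2.0)

print(round(smoothed_rate(1, 1, alpha, beta), 3))    # new client: 0.667, not 1.0
print(round(smoothed_rate(45, 50, alpha, beta), 3))  # long history: 0.873

# Features for one hypothetical customer's order at week index t,
# computed only from weeks 0..t-1 so no future information leaks in.
decisions = [1, 1, 0, 1, 0, 1, 1]
t = 6
features = lag_features(decisions, t) + [
    smoothed_rate(sum(decisions[:t]), t, alpha, beta)
]
print(features[:5])  # [1, 0, 1, 0, 1]
```

The method-of-moments fit ignores the sampling noise in each observed rate; fitting the Beta prior by maximum likelihood is a common, more careful alternative.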
Predictive Model
Results
Future Improvements
- Run the models only on the undecided groups, and add their expected values to the order counts of the decided groups (opt-in for paused customers and opt-out for active customers)
- Engineer a boolean feature indicating whether a week includes a holiday
- Explore possible effects of other features such as recipes and customizations
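The first improvement relies on linearity of expectation: customers who have already decided are counted directly, each undecided customer contributes their predicted probability of ordering, and the sum of those probabilities is the expected number of orders from the undecided group. A minimal sketch, assuming the decided groups are paused customers who have already opted in (certain orders) and active customers who have already opted out (certain skips); the Customer fields and probabilities are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    status: str            # hypothetical field: "active" or "paused"
    opted: Optional[str]   # "in", "out", or None if no explicit choice yet
    p_order: float         # model's predicted probability of a billed order


def expected_orders(customers):
    """Count decided customers directly; add expected values for the undecided."""
    total = 0.0
    for c in customers:
        if c.status == "paused" and c.opted == "in":
            total += 1.0         # decided: will be billed, no model needed
        elif c.status == "active" and c.opted == "out":
            continue             # decided: will not be billed
        else:
            total += c.p_order   # undecided: E[order] = P(order)
    return total


week = [
    Customer("paused", "in", 0.2),
    Customer("active", "out", 0.9),
    Customer("active", None, 0.8),
    Customer("paused", None, 0.1),
]
print(round(expected_orders(week), 2))  # 1.9
```

Because decided customers bypass the model entirely, model errors only affect the undecided portion of the weekly total.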
Lessons Learned
- Before beginning a project, ask the client or partner: What does a successful project look like?
- Working in a group means you won’t do everything, and that’s ok. I barely mention the dashboard in this post because I wasn’t involved in creating it
- EDA is crucial before advancing to a machine learning problem
- Data wrangling never ends
- Make sure you know your database’s time zone, even if it seems obvious
- A feature that gives you the highest accuracy is not necessarily a usable feature
- Be skeptical of 99% accuracy: things that appear too good to be true generally are
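To illustrate the time zone lesson: a timestamp stored in UTC can fall in a different week than the local time it represents, which silently moves orders across weekly boundaries. This sketch assumes a fixed UTC-8 offset purely for illustration (the actual database zone is not stated in this post), and the order timestamp is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption for illustration only: local time is a fixed UTC-8 offset.
# Real code should use zoneinfo.ZoneInfo with the IANA zone the data is
# actually recorded in, so that daylight saving time is handled.
LOCAL_TZ = timezone(timedelta(hours=-8))


def iso_week_start(ts):
    """Date of the Monday that starts the ISO week containing ts."""
    d = ts.date()
    return d - timedelta(days=d.weekday())


# A hypothetical order placed on a Sunday evening local time, stored in UTC.
placed_utc = datetime(2020, 3, 2, 3, 0, tzinfo=timezone.utc)
placed_local = placed_utc.astimezone(LOCAL_TZ)

print(placed_utc.strftime("%A"), iso_week_start(placed_utc))      # Monday 2020-03-02
print(placed_local.strftime("%A"), iso_week_start(placed_local))  # Sunday 2020-02-24
```

The same event lands in two different weeks depending on which clock is used, so any weekly aggregate or lag feature built on the wrong zone is shifted for orders near the week boundary.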