MDS Capstone project - Fresh Prep weekly orders prediction

For my Capstone project, I collaborated with Hayley Boyce, Maninder Kohli and Rachel K. Riggs to help improve Fresh Prep’s weekly orders prediction accuracy. In this post, I summarize our work and the lessons learned. Due to NDA restrictions, I do not include any confidential information such as descriptive statistics.

Project Background

Fresh Prep is a Vancouver-based meal kit delivery company that strives to help their customers cook quick and healthy dinners. They offer a weekly rotating menu with 10 recipes and are committed to sustainability.
A Fresh Prep customer has a status of either active or paused. Orders are automatically generated for all customers regardless of status; however, for paused customers these orders are also automatically skipped. Our goal therefore was to determine which active status customers take action to opt-out as well as which paused customers take action to opt-in.
To answer our main question: how many orders are expected in the upcoming weeks? , we answered a slightly harder question: which customers will be ordering in the upcoming weeks? Answering the latter question not only provides Fresh Prep with the number of orders predicted for a given week, but also allows them to target undecided customers.

Overall, our work was focused in three main areas:

  • Analyze customer behaviour to reveal patterns and trends
  • Improve weekly orders prediction accuracy
  • Build an interactive Tableau dashboard that displays predictions and client information

Exploratory Data Analysis (EDA)

The raw data was stored in a large PostgreSQL database consisting of many tables. It took our team time throughout the course of the project to create a clean, workable and consolidated file. Understanding the data was key to our success. Below are some of the insights we uncovered:

  • Active and paused customers have significantly different billed order rates
  • Customers have a tendency towards their same behaviour from the previous week
  • Active customers plan further out than paused customers do
  • The customers with the most dietary restrictions (7 or 8) are less likely to skip their orders
  • Customers who customize their orders most frequently (over 80%) skip their orders more than those with customizations rates between 20-80%

These insights guided our feature selection and engineering.

Feature Engineering

A significant component of this project involved creating the below features that were used in our predictive model:

  • Individual’s billed order rate up to a given order: smoothed using an empirical Bayes estimation approach, to account for newer clients with a small number of orders in their histories
  • Weekly billed order rate: average rate for the corresponding week the year prior
  • Customer behaviour (skip or bill) in the 5 weeks prior to a given order (5 lags)
  • The month a client joined
  • The number of weeks that they existed at the time of an order
  • Their subscription prices
  • Their location
  • Beef as a dietary restriction

Predictive Model

We trained two different Logistic Regression models; one for active customers and one for paused customers. We opted for a Logistic Regression model because it provided more interpretable weights as well as more trustworthy probabilities than other models we experimented with, such as a Random Forest. The models output a probability of ordering for each customer and these are summed up to provide the expected number of orders for a given week. For active customers, we took an expected value but for paused customers, we took their most trusted behaviour. This means that we set any values smaller than 0.5 to 0; and anything larger than 0.5 to 1. This is because most customers have a paused status and adding up small probabilities (i.e: 0.01*1000 for example) overestimates the number of orders.

Results

Our models have a 4.61 Mean Absolute Percentage Error (MAPE) on the total number of actual orders from June 2018 to June 2019.This means that: for a week in which Fresh Prep expects 100 orders, the error is at most 5 orders. If we focus on dates only in 2019 the models have a 1.5% error : for a week in which Fresh Prep expects 100 orders, the error is at most 2 orders.

Future Improvements

  • Run the models only on the undecided groups, and add their expected values to the numbers of the decided groups (opt-in for paused and opt-out for active)
  • Engineer a feature that is a true/false value for whether or not a week includes a holiday
  • Explore possible effects of other features such as recipes and customizations

Lessons Learned

  • Before beginning a project, ask client/partner: What does a successful project look like?
  • Working in group means you won’t do everything and that’s ok. I barely mention the dashboard in this post because I wasn’t involved in creating it
  • EDA is crucial before advancing to a machine learning problem
  • Data wrangling never ends
  • Ensure you know the time zone of your database even if it seems obvious
  • A feature that gives you the highest accuracy is not necessarily a usable feature
  • Be skeptical of 99% accuracy. Things that appear too good to be true generally are
Written on July 1, 2019