Non-data scientists know it through Amazon’s familiar phrase, “customers who bought this item also bought”, or Netflix’s “Watch this if you like”. Recommendation engines have emerged as key drivers of upselling and cross-selling in ecommerce: businesses rely on them heavily for incremental purchases, and users rely on them to discover products they prefer.
MARKET BASKET ANALYSIS
At the core of this recommendation engine is the market basket analysis algorithm, a subset of affinity analysis in statistics. Affinity analysis is defined as a data mining and data analysis technique that discovers co-occurrence relationships among activities performed by specific individuals or groups.
In online commerce, this means using a user’s transaction data to discover what different items in a dataset have in common, whether in their format, metadata or other data points. Market basket analysis is a subset of association analysis.
THE APRIORI ALGORITHM
At the heart of market basket analysis is the apriori algorithm, which proceeds in two steps: item-set generation and rule generation. It scans the entire transactional database to find item-sets that frequently occur together. The key concepts of the algorithm include the following:
Itemset – An itemset, as its name suggests, is the set of items that a customer purchases in a single session. It can contain anywhere from zero items (null) to n items (all items). Rules over itemsets follow the If(*)=Then(*) form: if the first item is bought, then the second item is bought as well.
Support – Support is a critical element in market basket analysis: it is the frequency at which an item (or itemset) appears in the given set of transactions. It is often stated as a probability and is expressed as a numerical value where:
$$\text{Support Count} = \frac{\text{Frequency of Item}}{\text{Total Number of Transactions}}$$
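As a minimal sketch of the support calculation, the Python snippet below counts how many baskets contain a given itemset and divides by the total number of baskets. The transactions and the `support` helper are invented for illustration and do not come from any real dataset or library.

```python
# Toy transactions, made up purely for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


bread_support = support({"bread"}, transactions)            # 4 of 5 baskets
pair_support = support({"diapers", "beer"}, transactions)   # 3 of 5 baskets
```

Here bread appears in four of the five baskets, so its support is 0.8, while the pair diapers and beer appears together in three, giving 0.6.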
Confidence – Confidence in market basket analysis, in simple terms, is the ratio of the number of transactions that include both the first item and the consequent item (the conditional purchase; in this case, the item that is bought when the first item is purchased) to the number of transactions that include the first item. It measures how often the rule holds among the transactions where it could apply, and is also expressed as a numerical value where:
$$\text{Confidence Count} = \frac{\text{Frequency of the First Item and the Consequent Item Together}}{\text{Frequency of the First Item}}$$
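The confidence of a rule can be sketched in the same way: among the baskets that contain the first item, count how many also contain the consequent item. The data and the `confidence` helper below are hypothetical and only meant to make the ratio concrete.

```python
# Toy transactions, made up purely for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def confidence(antecedent, consequent, transactions):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [b for b in transactions if antecedent <= b]
    if not with_antecedent:
        return 0.0  # the rule never applies, so report zero rather than divide by zero
    with_both = sum(1 for b in with_antecedent if consequent <= b)
    return with_both / len(with_antecedent)


diapers_to_beer = confidence({"diapers"}, {"beer"}, transactions)  # 3 / 4
beer_to_diapers = confidence({"beer"}, {"diapers"}, transactions)  # 3 / 3
```

Note that confidence is directional: in this toy data every beer basket contains diapers (1.0), but only three of the four diaper baskets contain beer (0.75).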
Lift Ratio – The third important component of market basket analysis is the lift ratio. The lift ratio compares how often the items in a rule are actually bought together with how often they would be bought together if their purchases were independent of each other, as in a random set of transactions. As a rule of thumb, a lift ratio of more than 1 is considered useful, a lift ratio of exactly 1 indicates no association, and a lift ratio of less than 1 means the items are bought together less often than chance would suggest. The greater the lift ratio, the stronger the association between the items. Like support and confidence, lift ratio is also expressed as a numerical value where:
$$\text{Lift Ratio} = \frac{\text{Confidence of the Rule}}{\text{Support of the Consequent Item}}$$
The lift ratio is usually expressed as a decimal value.
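To illustrate, the sketch below computes lift as the confidence of a rule divided by the support of its consequent, written with raw counts so the arithmetic stays exact. The transactions and the `lift` helper are assumptions made for this example.

```python
# Toy transactions, made up purely for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def lift(antecedent, consequent, transactions):
    """Lift of the rule antecedent => consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for b in transactions if antecedent <= b)
    n_b = sum(1 for b in transactions if consequent <= b)
    n_ab = sum(1 for b in transactions if (antecedent | consequent) <= b)
    if n_a == 0 or n_b == 0:
        return 0.0
    # Same value as confidence(A => B) / support(B), expressed with counts.
    return (n_ab * n) / (n_a * n_b)


diapers_beer_lift = lift({"diapers"}, {"beer"}, transactions)  # 1.25
bread_milk_lift = lift({"bread"}, {"milk"}, transactions)      # 0.9375
```

In this toy data diapers and beer have a lift above 1, so they co-occur more than chance would predict, while bread and milk fall slightly below 1 even though both are individually popular.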
With these concepts in place, the next step is to implement the apriori algorithm.
The resulting rules can then be run across the entire transactional database for further tuning and accuracy.
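The item-set generation step of apriori can be sketched in plain Python as follows. This is a simplified, unoptimized illustration and not the implementation of any particular library: the function name, the toy transactions and the 0.6 support threshold are all assumptions. It relies on the property that every subset of a frequent itemset must itself be frequent, so larger candidates are only built from smaller itemsets that already passed the threshold.

```python
from itertools import combinations

# Toy transactions, made up purely for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def apriori_frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def keep_frequent(candidates):
        kept = {}
        for cand in candidates:
            s = sum(1 for b in baskets if cand <= b) / n
            if s >= min_support:
                kept[cand] = s
        return kept

    # Level 1: single items.
    items = {item for b in baskets for item in b}
    current = keep_frequent({frozenset([i]) for i in items})
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: merge frequent (k-1)-itemsets into candidates of size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates with any infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
        frequent.update(current)
        k += 1
    return frequent


frequent = apriori_frequent_itemsets(transactions, min_support=0.6)
```

With this threshold the toy data yields four frequent single items and four frequent pairs; no triple survives. Rules would then be generated from these itemsets and filtered by confidence and lift.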
The popular statistical programming language R has a set of libraries for running the entire market basket analysis on a transactional database. These are
tidyverse, readxl or readr (depending on the dataset), knitr, ggplot2, lubridate, arules, arulesViz and plyr.
Once loaded, they can be run against the transactions: arules runs apriori to find the association rules and reports the support, confidence and lift of each.
USE CASES – AMAZON AND NETFLIX
The association rules of market basket analysis are most commonly used in e-commerce recommendation engines to establish associations and preferences between items. Amazon is one of the most widely cited early implementations of market basket analysis: its recommendation engine has had a major impact on its cross-selling and upselling initiatives, accounting for one fourth of user clicks on the site.
Similarly, market basket analysis combined with cohort analysis is the secret ingredient behind Netflix’s highly successful “because you watched” recommendations. In this case, the recommendations are the consequents that their algorithms generate from the antecedent items, which are the first item-sets of the viewer’s transactions (your clicks on the media thumbnails).
This analysis is just one of the multitude of ways in which data science is changing the world, making businesses more efficient and improving the customer experience while delivering higher revenues and growth.
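The way consequents become recommendations can be sketched as a simple lookup: any rule whose antecedent is fully contained in what the user has already picked contributes its consequent as a suggestion. The rules, their confidence values and the `recommend` helper below are hypothetical and are not a description of how Amazon or Netflix actually implement their systems.

```python
# Hypothetical rules as (antecedent, consequent, confidence); values are invented.
rules = [
    (frozenset({"diapers"}), "beer", 0.75),
    (frozenset({"bread", "milk"}), "diapers", 0.67),
    (frozenset({"cola"}), "milk", 1.0),
]


def recommend(basket, rules, top_n=3):
    """Suggest consequents of rules whose antecedent is in the basket."""
    basket = set(basket)
    scores = {}
    for antecedent, consequent, conf in rules:
        # A rule applies only if all antecedent items are present,
        # and there is no point suggesting something already chosen.
        if antecedent <= basket and consequent not in basket:
            scores[consequent] = max(conf, scores.get(consequent, 0.0))
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [item for item, _ in ranked[:top_n]]


bread_milk_suggestions = recommend({"bread", "milk"}, rules)      # ["diapers"]
diapers_cola_suggestions = recommend({"diapers", "cola"}, rules)  # ["milk", "beer"]
```

Ranking by confidence is just one possible choice; lift or a blend of several measures could be used instead.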
(The DASCA NewsDesk. Research by CredRadar™ and CredForce)