With that said, let's start.
To download the dataset used in this article, click here.
You will understand more about the model as the article goes on.
Simple Step-by-Step Process
Follow this step-by-step process to get a high-level understanding of association rule learning and the Apriori algorithm, one of the important concepts in machine learning and computer science.
I will run the algorithm on a sample dataset, solving a typical association problem in simple steps, with results good enough for your understanding.
First we import libraries
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
Now let us understand the downloaded dataset
df = pd.read_csv('090.csv', header = None)
df.head()
Brief on the problem we are dealing with
The dataset has 7501 rows and 20 columns. Each row represents one customer's transaction in a grocery store, and the 20 columns hold the food items in that customer's basket.
This data tells us that customer 0 buys shrimp, almonds and avocado together.
Customer 1 buys burgers, meatballs and eggs together, and so on.
To visualize this data, picture a table with one row of purchased items per customer.
Each row is one customer's transaction. The model's role is to understand which items customers bought together and then tell us how one item is associated with another. For instance, it will tell us how, with respect to customer buying patterns, milk is related to eggs.
df.shape
Data Transformation:
Since apriori expects a list and not a DataFrame, we will transform the dataset into a list in this step.
print(type(df))
listt = []
for i in range(len(df)):  # loop over every transaction (7501 rows)
    listt.append([str(df.values[i, j]) for j in range(0, 20)])
Let's have a look at our data, which should now be in list form for the next step
listt[:1]
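One side effect of the loop above is that short baskets get padded with the string 'nan', because pandas fills the missing cells with NaN and str() turns them into text. Here is a minimal sketch of an alternative, assuming the file is a plain comma-separated list of items per row: it reads the rows with the standard csv module and skips empty cells. The sample text and the name load_transactions are made up for illustration.

```python
import csv
import io

def load_transactions(handle):
    """Return one list of item names per CSV row, skipping empty cells."""
    transactions = []
    for row in csv.reader(handle):
        items = [cell.strip() for cell in row if cell.strip()]
        if items:  # ignore completely blank lines
            transactions.append(items)
    return transactions

# Hypothetical sample in the same shape as the store file: rows of unequal length
sample = io.StringIO("shrimp,almonds,avocado\nburgers,meatballs,eggs\nchutney\n")
transactions = load_transactions(sample)
print(transactions)
```

With the real file, the same function could be called as load_transactions(open('090.csv', newline='')) to build the list without the placeholder strings.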
Model fitting and training on the dataset (list)
Apriori is an unsupervised learning method. We will download a file containing a ready-made implementation of the algorithm and import the apriori function from that file, which we will later use on our store dataset.
To download the Apriori implementation, click here.
The file we need is apyori.py. Ignore the other files and continue here.
Importing the model
from apyori import apriori
Now we call the apriori function on our list with the thresholds we want; we will read its output in a later step
rules = apriori(listt, min_support = 0.003 , min_confidence = 0.2, min_lift = 3, min_length = 2)
Brief into basic association concepts and the parameters
The Apriori call takes these parameters: the input list, minimum support, minimum confidence, minimum lift and minimum length
min_support
Let's start with support.
The support of a pair of items, say item 1 (e.g. milk) and item 2 (e.g. eggs), is the number of transactions in our data that contain both milk and eggs, divided by the total number of transactions, which is 7501 in this case.
The support argument we pass here is the minimum support you want the rules to have.
That means the items that appear in your rules will have at least this support, i.e. they occur together (milk and eggs in this example) in at least the min_support fraction of the dataset.
So the question to ask ourselves is what minimum support we want the items in our rules to have.
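The definition of support can be checked by hand on a few baskets. The sketch below is only an illustration on invented toy data; the helper name support is hypothetical and is not part of apyori.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

# Toy data, invented for illustration
baskets = [
    ["milk", "eggs", "bread"],
    ["milk", "eggs"],
    ["bread"],
    ["milk"],
    ["eggs", "bread"],
]

milk_eggs = support(["milk", "eggs"], baskets)
print(milk_eggs)           # 0.4, since 2 of the 5 baskets contain both items
print(milk_eggs >= 0.003)  # True, so this pair would pass a min_support of 0.003
```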
min_confidence
The minimum confidence for our rules is 0.2, i.e. 20 percent. Confidence is the share of the transactions containing one item that also contain the other. So if, for instance, 100 customers purchase mushroom cream sauce in the store, at least 20 of them must also have bought the second item for the rule to be kept.
min_length: This is the minimum number of items a rule should contain, which in our case is 2, so we only consider associations between at least two items in a customer's cart at the time of purchase. Note that some versions of apyori do not read this argument, so check the file you downloaded.
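The remaining two thresholds, confidence and lift, build on support. Confidence of "milk then eggs" is the support of both items divided by the support of milk; lift divides that confidence by the support of eggs, so a lift of 1 means no association and min_lift = 3 keeps only pairs bought together at least three times as often as chance would suggest. The sketch below works this out on invented toy baskets; the helper names are hypothetical and not taken from apyori.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(base, add, transactions):
    """Share of the baskets containing `base` that also contain `add`."""
    both = support(set(base) | set(add), transactions)
    return both / support(base, transactions)

def lift(base, add, transactions):
    """Confidence divided by how common `add` is overall (1.0 = no association)."""
    return confidence(base, add, transactions) / support(add, transactions)

# Toy data, invented for illustration: milk in 4 baskets, eggs in 4, both in 3
baskets = [
    ["milk", "eggs"],
    ["milk", "eggs", "bread"],
    ["milk", "eggs"],
    ["milk"],
    ["eggs", "bread"],
    ["bread"],
    ["bread", "jam"],
    ["jam"],
]

print(confidence(["milk"], ["eggs"], baskets))  # 0.75 (3 of the 4 milk baskets)
print(lift(["milk"], ["eggs"], baskets))        # 1.5  (0.75 / 0.5)
```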
Fitting the model
In this step we run the algorithm on the dataset and collect everything it finds in a list, which we store under the same name "rules"
rules = list(rules)
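The conversion to a list matters if, as in the published apyori package, apriori(...) produces its results lazily as a generator: a generator can be read only once, while a list can be sliced and revisited. The sketch below shows that behaviour with a hypothetical stand-in generator, not the real apriori function.

```python
def toy_rules():
    """Hypothetical stand-in for a generator-returning call such as apriori(...)."""
    yield "rule A"
    yield "rule B"

gen = toy_rules()
first_pass = list(gen)   # reads every result and stores it
second_pass = list(gen)  # the generator is now exhausted

print(first_pass)   # ['rule A', 'rule B']
print(second_pass)  # []
```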
See results
rules[:1] #getting the 1st item from the output
Here, as we can see, we have the first association in the output list of the Apriori algorithm. It relates "burgers" and "almonds" with a support of 55%, a confidence of only 26% and a lift of 3.
With such a low confidence, we can say that burgers and almonds are not a very strong pairing.
Let us see another output
rules[30:31] #getting the item at index 30 of the output
Here, as we can see, we have a second association, the item at index 30 of the output list of the Apriori algorithm. It relates "milk" and "bread" with a support of 96%, a confidence of 77% and a lift of 6.
We can say that milk and bread are very often bought together, so keeping them close to each other in the store may help increase sales.
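Reading the raw records one slice at a time gets tedious, so it can help to flatten them into one row per rule. The sketch below assumes each record exposes items, support and ordered_statistics, and each statistic exposes items_base, items_add, confidence and lift, which is how the published apyori package names them. To stay runnable without the library it uses stand-in namedtuples and invented numbers; the function name flatten is hypothetical.

```python
from collections import namedtuple

# Stand-ins with the field names assumed for apyori's records
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"])

def flatten(records):
    """Return one plain dict per rule, sorted by lift (strongest first)."""
    rows = []
    for record in records:
        for stat in record.ordered_statistics:
            rows.append({
                "if": sorted(stat.items_base),
                "then": sorted(stat.items_add),
                "support": record.support,
                "confidence": stat.confidence,
                "lift": stat.lift,
            })
    return sorted(rows, key=lambda r: r["lift"], reverse=True)

# Invented records, for illustration only
demo = [
    RelationRecord(frozenset({"x", "y"}), 0.01,
                   [OrderedStatistic(frozenset({"x"}), frozenset({"y"}), 0.30, 3.5)]),
    RelationRecord(frozenset({"p", "q"}), 0.02,
                   [OrderedStatistic(frozenset({"p"}), frozenset({"q"}), 0.40, 4.2)]),
]

table = flatten(demo)
for row in table:
    print(row)
```

On the real output, flatten(rules) would give the same kind of rows, and pd.DataFrame(flatten(rules)) would show them as a table.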