Apriori is an association rule learning algorithm, a type of unsupervised learning technique. It looks for relations between the entities in a dataset and reports the associations it finds interesting. An entity can be anything from a grocery item like milk to a clothing item like a shirt. Apriori uses a breadth-first search and a hash tree structure to count candidate itemsets efficiently, and finding the frequent itemsets in a massive dataset is an iterative process. The algorithm was introduced by R. Agrawal and R. Srikant in 1994. It is mainly used for market basket analysis, where it helps find products that are likely to be bought together. It can also be used in the medical domain to find potential adverse drug reactions in patients. One use case is inside malls, where a store owner can use this unsupervised machine learning algorithm to find associations between items like eggs, milk, bread or vegetables from customer purchase data. If the association is strong, the store owner can place those items together in the store, which may increase sales because customers are more likely to buy them together.
With that said, let's start.
To download the dataset I have used in this article, click here.

You will understand more about the model as the article goes on.
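
Before the walkthrough, here is a minimal sketch of the breadth-first, iterative search described above. It is not the code used later in this article and it does not use a hash tree; it counts support by plain scanning. The function name `frequent_itemsets` and the toy baskets are my own assumptions for illustration.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise (breadth-first) search for frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into candidates of size k.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

# Toy baskets, made up for this example.
baskets = [
    ["milk", "eggs", "bread"],
    ["milk", "eggs"],
    ["milk", "bread"],
    ["eggs"],
    ["bread", "butter"],
]
for itemset, sup in frequent_itemsets(baskets, min_support=0.4).items():
    print(sorted(itemset), sup)
```

Each pass of the loop handles itemsets of one size, which is why the search is breadth-first: all frequent pairs are known before any triple is considered.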

Simple Step by Step Process

Please follow this step by step process to get a high-level understanding of association rule learning and of Apriori, which is one of the important concepts in Machine Learning and Computer Science.
I will run the model on a sample dataset, solving a typical association problem in simple steps with workable results for your understanding.

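First we import libraries

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

Now understanding our downloaded dataset

# the file has no header row, so header=None keeps the first transaction as data
df = pd.read_csv('090.csv', header=None)

df.head()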

Brief on the problem we are dealing with

The dataset has 7501 rows and 20 columns. Each row represents one customer's transaction in a grocery store, and the columns hold the items in that transaction, up to 20 per customer. Unused columns are NaN.

This data shows that customer 0 buys shrimp, almonds and avocado together, along with several other items.
Customer 1 buys burgers, meatballs and eggs together, and so on.

To visualize this data, imagine a list of transactions, one per customer.

The model's role is to understand which items customers bought together and then tell us how one item is associated with another. For instance, it will tell us how milk is related to eggs with respect to customer buying patterns.

df.shape

(7501, 20)

Data Transformation:

Since apriori expects a list of transactions and not a DataFrame, we will transform the dataset into a list of lists in this step.

print(type(df))

<class 'pandas.core.frame.DataFrame'>

listt = []
# build one inner list per customer; 7501 rows and 20 columns, as df.shape shows
for i in range(0, 7501):
    listt.append([str(df.values[i, j]) for j in range(0, 20)])

listt[:3]

[['shrimp',
  'almonds',
  'avocado',
  'vegetables mix',
  'green grapes',
  'whole weat flour',
  'yams',
  'cottage cheese',
  'energy drink',
  'tomato juice',
  'low fat yogurt',
  'green tea',
  'honey',
  'salad',
  'mineral water',
  'salmon',
  'antioxydant juice',
  'frozen smoothie',
  'spinach',
  'olive oil'],
 [  # customer 1's purchase data
  'burgers',
  'meatballs',
  'eggs',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan'],
 [  # customer 2's purchase data
  'chutney',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan']]
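
As the output shows, empty cells become the string 'nan', and that string is then treated as if it were an item in the basket. A minimal sketch of one way to skip empty cells while building the list is below. The helper name `to_transactions` and the sample rows are assumptions for illustration, not part of the original code.

```python
import math

def to_transactions(rows):
    """Turn rows of item cells into lists of item names, skipping empty cells."""
    transactions = []
    for row in rows:
        items = [
            str(cell) for cell in row
            # pandas marks a missing cell as float NaN; None is skipped as well
            if cell is not None and not (isinstance(cell, float) and math.isnan(cell))
        ]
        transactions.append(items)
    return transactions

# Two short rows shaped like the DataFrame rows above.
rows = [
    ["burgers", "meatballs", "eggs", float("nan")],
    ["chutney", float("nan"), float("nan"), float("nan")],
]
print(to_transactions(rows))
```

With the DataFrame from this article, the same helper could be called as `to_transactions(df.values.tolist())`.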

Model fitting and training on the dataset (list)

Apriori is an unsupervised learning method, so there is nothing pretrained to load. We will download a Python file that implements the Apriori algorithm and import the apriori function from that file, which we will later run on our store dataset.
To download the Apriori implementation, click here.

The file we need is apyori.py. Ignore the other files and continue here.

Importing the model

from apyori import apriori

Brief on basic association concepts and the parameters

rules = apriori(listt, min_support=0.003, min_confidence=0.2, min_lift=3, min_length=2)

min_support

All right, so let's start with the support.
The support of a set of items, say item 1 (e.g. milk) and item 2 (e.g. eggs), is the number of transactions in our data that contain both milk and eggs, divided by the total number of transactions, which is 7501 in this case.
The min_support argument we pass here is the minimum support you want your rules to have.
That means the items that appear in your rules will have a support of at least this value, i.e. they occur together (milk and eggs in this case) at least that often in the dataset. With min_support = 0.003, an itemset must appear in at least 23 of the 7501 transactions, since 0.003 × 7501 ≈ 22.5.
So the question we must ask ourselves is: what minimum support do we want the items in our rules to have?
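
To make the support calculation concrete, here is a small sketch on made-up baskets. The function name `support` and the data are assumptions for illustration only.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

# Five made-up baskets.
baskets = [
    ["milk", "eggs", "bread"],
    ["milk", "eggs"],
    ["milk", "bread"],
    ["eggs"],
    ["bread", "butter"],
]
# Milk and eggs appear together in 2 of the 5 baskets.
print(support(baskets, ["milk", "eggs"]))  # 0.4
```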

min_confidence

The confidence of a rule such as milk → eggs is the fraction of the transactions containing milk that also contain eggs. The minimum confidence we set is 0.2, i.e. 20 percent. For instance, if 100 customers purchase mushroom cream sauce in the store, a rule from mushroom cream sauce to another item is kept only if at least 20 of those customers also bought that other item.
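
A short sketch of the confidence calculation on made-up baskets follows. The function name `confidence` and the data are assumptions for illustration.

```python
def confidence(transactions, base, add):
    """Confidence of the rule base -> add.

    Among the transactions that contain `base`, the fraction that
    also contain `add`.
    """
    base, add = set(base), set(add)
    with_base = [set(t) for t in transactions if base <= set(t)]
    if not with_base:
        return 0.0  # the base never occurs, so the rule has no evidence
    return sum(1 for t in with_base if add <= t) / len(with_base)

baskets = [
    ["milk", "eggs", "bread"],
    ["milk", "eggs"],
    ["milk", "bread"],
    ["eggs"],
    ["bread", "butter"],
]
# 3 baskets contain milk, and 2 of those also contain eggs.
print(confidence(baskets, ["milk"], ["eggs"]))
```

Note that confidence is directional: the value for milk → eggs need not equal the value for eggs → milk.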

min_lift

The lift of a rule such as milk → eggs is its confidence divided by the support of eggs. It tells us how much more likely a customer who buys milk is to buy eggs, compared with customers in general. A lift of 1 means the two items are bought independently of each other. The minimum lift we set is 3, so we keep only rules where buying the first item makes the second at least 3 times as likely as it is overall.
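
Lift can be computed directly from counts, as in this sketch on made-up baskets. The function name `lift` and the data are assumptions for illustration, and the function assumes both sides of the rule occur at least once.

```python
def lift(transactions, base, add):
    """Lift of the rule base -> add, i.e. confidence(base -> add) / support(add)."""
    base, add = set(base), set(add)
    n = len(transactions)
    n_base = sum(1 for t in transactions if base <= set(t))
    n_add = sum(1 for t in transactions if add <= set(t))
    n_both = sum(1 for t in transactions if (base | add) <= set(t))
    # (n_both / n_base) / (n_add / n), rearranged into a single division
    return (n_both * n) / (n_base * n_add)

baskets = [
    ["milk", "eggs", "bread"],
    ["milk", "eggs"],
    ["milk", "bread"],
    ["eggs"],
    ["bread", "butter"],
]
# Slightly above 1: milk buyers are a little more likely than average to buy eggs.
print(lift(baskets, ["milk"], ["eggs"]))
```

On this toy data the rule would not pass min_lift = 3, which shows how strict that threshold is.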

min_length

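min_length: This is meant to set the minimum number of items in a rule, which in our case is 2, so we only consider rules that relate at least 2 items. Note that the apyori version published on PyPI, as far as I can tell, reads only min_support, min_confidence, min_lift and max_length and silently ignores other keyword arguments, so this setting may have no effect there.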
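
In case the implementation you use does not enforce a minimum rule size, the same filter can be applied to the results afterwards. This is a minimal sketch: `Record` is a hypothetical stand-in for a result record, and `keep_min_items` is a name I made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a result record; only the `items` field matters here.
Record = namedtuple("Record", ["items", "support"])

def keep_min_items(records, min_items=2):
    """Keep only the records whose itemset has at least `min_items` items."""
    return [r for r in records if len(r.items) >= min_items]

records = [
    Record(frozenset({"milk"}), 0.6),
    Record(frozenset({"milk", "eggs"}), 0.4),
]
# Only the two-item record survives the filter.
print(keep_min_items(records))
```

The records printed in the results of this article expose an `items` field too, so the same function should work on them once they are in a list.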



Fitting the model

rules = list(rules)  # apriori yields its results lazily, so collect them into a list

See results

rules[:1]  # getting the 1st item from the output

[RelationRecord(items=frozenset({'burgers', 'almonds'}), support=0.005531130336122536, ordered_statistics=[OrderedStatistic(items_base=frozenset({'almonds'}), items_add=frozenset({'burgers'}), confidence=0.2671232876712329, lift=3.102942835864684)])]

rules[30:31]  # getting the 31st item (index 30) from the output

[RelationRecord(items=frozenset({'milk', 'bread'}), support=0.009592235556175531, ordered_statistics=[OrderedStatistic(items_base=frozenset({'milk'}), items_add=frozenset({'bread'}), confidence=0.7771235575511127, lift=6.000555746668677)])]
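
The raw records are hard to read, so here is a sketch of one way to flatten them into simple rows sorted by lift. The two namedtuples are stand-ins I define so the example runs on its own; their field names are copied from the printed output above. The function name `flatten` and the row keys are assumptions for illustration.

```python
from collections import namedtuple

# Stand-ins that mirror the field names visible in the printed output.
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"]
)

def flatten(records):
    """Turn each (record, statistic) pair into one readable row, highest lift first."""
    rows = []
    for record in records:
        for stat in record.ordered_statistics:
            rows.append({
                "if_bought": ", ".join(sorted(stat.items_base)),
                "then_also": ", ".join(sorted(stat.items_add)),
                "support": round(record.support, 4),
                "confidence": round(stat.confidence, 4),
                "lift": round(stat.lift, 4),
            })
    return sorted(rows, key=lambda row: row["lift"], reverse=True)

# The first record from the output above, rebuilt by hand.
sample = [
    RelationRecord(
        items=frozenset({"burgers", "almonds"}),
        support=0.005531130336122536,
        ordered_statistics=[
            OrderedStatistic(
                items_base=frozenset({"almonds"}),
                items_add=frozenset({"burgers"}),
                confidence=0.2671232876712329,
                lift=3.102942835864684,
            )
        ],
    )
]
for row in flatten(sample):
    print(row)
```

With the real results, `flatten(rules)` should give one row per rule, which is easier to scan than the nested records.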

Output of df.head(), the first five rows of the dataset:

	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19
0	shrimp	almonds	avocado	vegetables mix	green grapes	whole weat flour	yams	cottage cheese	energy drink	tomato juice	low fat yogurt	green tea	honey	salad	mineral water	salmon	antioxydant juice	frozen smoothie	spinach	olive oil
1	burgers	meatballs	eggs	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	chutney	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	turkey	avocado	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4	mineral water	milk	energy bar	whole wheat rice	green tea	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

Massivefile.com - Blog

Apriori Association (Implementation)
