Jeune has a fascination for building and solving things. His interests include Algorithms, Development Practices and Software Design. When not immersed in his passions, he spends time experimenting in the kitchen. Jose is a DZone MVB and is not an employee of DZone and has posted 10 posts at DZone. You can read more from them at their website. View Full User Profile

# The Apriori Algorithm

03.18.2012
| 6870 views |

Here are just notes from my data mining class which I began to consolidate here in my blog as a way to assimilate the lessons.

The Apriori algorithm is a basic method for finding frequent itemsets. The latter is used to generate association rules with high confidence and high interest.

Here is my summary of it along with a running example. The following set of baskets will be used:

D = [ ('milk', 'cheese', 'tuna'),
('cheese', 'mushroom'),
('cheese', 'eggs'),
('milk', 'cheese', 'mushroom'),
('milk', 'eggs'),
('cheese', 'eggs'),
('milk', 'cheese', 'mushroom', 'tuna'),
('milk', 'cheese', 'eggs') ]

Some definitions:

$I$ – is the universal set of items. In the example above, the universal set would be {milk, cheese, tuna, mushroom, eggs}.

$C_{k}$ – is like a k-combination of $I$. Like because items in this set should have frequent (k-1, k-2,…1)-itemsets.

Now, the apriori algorithm.

1. Generate $C_{k}$ by cross joining the itemsets of $L_{k-1}$ among themselves.

Cross joining two sets

A cross join between two k-item sets is a union of those two sets which results in a k+1 itemset. However, the join only happens if and only if the first k-1 items of both sets are equal.

For example:

$[1,2] |x| [1,3] = [1,2,3]$
$[1,3] |x| [1,4] = [1,3,4]$
$[2,3] |x| [1,5] =$ no join

If $k=1$, simply list all 1 itemsets.

one_itemset = ['milk', 'cheese', 'eggs', 'mushroom', 'tuna']

2. Generate $L_{k}$, the frequent itsemsets by counting the number of times each element in $C_{k}$ occurs in $D$. If an element in $C_{k} \geq S$ or the support threshold, it is qualified to be a member of $L_{k}$. For $k=1$ and our example above for $L_{1}$ is

c1 = {'cheese': 7,
'tuna': 2,
'eggs': 6,
'mushroom': 2,
'milk': 6}

3. Repeat the process for $k \leq |I|$ until no frequent itemsets are found.

Published at DZone with permission of Jose Asuncion, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

"Starting from scratch" is seductive but disease ridden