Association Rule Mining - Apriori Algorithm

1. Association rule

$$X \rightarrow Y$$

where X (the antecedent) and Y (the consequent) are disjoint itemsets.

2. Support

Support of an itemset X (the fraction of transactions T in the database D that contain X):

$$\text{support}(X) = \frac{|\{T \in D \mid X \subseteq T\}|}{|D|}$$

Support of the rule X → Y:

$$\text{support}(X \rightarrow Y) = \text{support}(X \cup Y)$$

3. Confidence

$$\text{confidence}(X \rightarrow Y) = \frac{\text{support}(X \cup Y)}{\text{support}(X)}$$
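
As a quick check of the two definitions above, the following minimal sketch computes support and confidence for the rule {Chips} → {Coke} on the six transactions of Example 1. The helper names `support` and `confidence` are illustrative only and do not come from any library.

```python
# Transactions copied from Example 1 of these notes.
transactions = [
    {"Hot Dogs", "Buns", "Ketchup"},
    {"Hot Dogs", "Buns"},
    {"Hot Dogs", "Coke", "Chips"},
    {"Chips", "Coke"},
    {"Chips", "Ketchup"},
    {"Hot Dogs", "Coke", "Chips"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(x, y, transactions):
    """support(X ∪ Y) / support(X) for the rule X -> Y."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


# Rule {Chips} -> {Coke}: both items appear together in 3 of 6 transactions,
# and Chips appears in 4 of 6.
sup_rule = support({"Chips", "Coke"}, transactions)
conf_rule = confidence({"Chips"}, {"Coke"}, transactions)
print(f"support = {sup_rule:.3f}, confidence = {conf_rule:.3f}")
```

Here the support is 3/6 and the confidence is (3/6)/(4/6) = 3/4.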

4. Lift

$$\text{lift}(X \rightarrow Y) = \frac{\text{confidence}(X \rightarrow Y)}{\text{support}(Y)} = \frac{\text{support}(X \cup Y)}{\text{support}(X)\,\text{support}(Y)}$$

5. Leverage

$$\text{leverage}(X \rightarrow Y) = \text{support}(X \cup Y) - \text{support}(X)\,\text{support}(Y)$$

6. Conviction

$$\text{conviction}(X \rightarrow Y) = \frac{1 - \text{support}(Y)}{1 - \text{confidence}(X \rightarrow Y)}$$

7. Cosine

$$\text{cosine}(X \rightarrow Y) = \frac{\text{support}(X \cup Y)}{\sqrt{\text{support}(X)\,\text{support}(Y)}}$$
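
The measures in sections 2 to 7 all depend only on support(X), support(Y) and support(X ∪ Y), so they can be evaluated together. The sketch below is one possible way to do that; the function name `rule_metrics` is hypothetical, and returning infinity for conviction when the confidence equals 1 is a common convention assumed here, not something stated in these notes. The input values are the supports of {Chips}, {Coke} and {Chips, Coke} in Example 1.

```python
import math


def rule_metrics(sup_x, sup_y, sup_xy):
    """Evaluate the rule measures defined above from three support values."""
    conf = sup_xy / sup_x
    return {
        "support": sup_xy,
        "confidence": conf,
        "lift": conf / sup_y,
        "leverage": sup_xy - sup_x * sup_y,
        # Assumption: conviction is taken as infinite when confidence is 1,
        # because the denominator 1 - confidence would be zero.
        "conviction": math.inf if conf >= 1 else (1 - sup_y) / (1 - conf),
        "cosine": sup_xy / math.sqrt(sup_x * sup_y),
    }


# Rule {Chips} -> {Coke} in Example 1:
# Chips is in 4 of 6 transactions, Coke in 3 of 6, both together in 3 of 6.
metrics = rule_metrics(sup_x=4 / 6, sup_y=3 / 6, sup_xy=3 / 6)
for name, value in metrics.items():
    print(f"{name:>10}: {value:.4f}")
```

For this rule the lift is 3/2 and the leverage is 1/6. A lift above 1 (equivalently, a positive leverage) means X and Y occur together more often than they would if they were independent.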

Example 1

Min support = 0.3

Transaction ID	Items
T1	Hot Dogs, Buns, Ketchup
T2	Hot Dogs, Buns
T3	Hot Dogs, Coke, Chips
T4	Chips, Coke
T5	Chips, Ketchup
T6	Hot Dogs, Coke, Chips

Distinct items: Buns, Chips, Coke, Hot Dogs, Ketchup
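
To show how the frequent itemsets of Example 1 can be found level by level, here is a minimal pure-Python sketch of the Apriori idea (count candidates, keep the frequent ones, join them into larger candidates, prune candidates that have an infrequent subset). The function name `apriori_sketch` is hypothetical and the code favours readability over speed; it is not the implementation used by any particular library.

```python
from itertools import combinations

# Transactions of Example 1.
transactions = [
    {"Hot Dogs", "Buns", "Ketchup"},
    {"Hot Dogs", "Buns"},
    {"Hot Dogs", "Coke", "Chips"},
    {"Chips", "Coke"},
    {"Chips", "Ketchup"},
    {"Hot Dogs", "Coke", "Chips"},
]


def apriori_sketch(transactions, min_support):
    """Return {frozenset: support} for every frequent itemset."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        # Count step: keep the candidates that reach the minimum support.
        level = {}
        for c in candidates:
            sup = sum(c <= t for t in transactions) / n
            if sup >= min_support:
                level[c] = sup
        frequent.update(level)
        # Join step: unions of two frequent k-itemsets that have k + 1 items.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k - 1)-subset of a candidate must be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


frequent = apriori_sketch(transactions, min_support=0.3)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(sup, 3))
```

With 6 transactions, a minimum support of 0.3 means an itemset must occur in at least 2 transactions. All five single items qualify, four pairs qualify ({Buns, Hot Dogs}, {Chips, Coke}, {Chips, Hot Dogs}, {Coke, Hot Dogs}), and the only frequent triple is {Chips, Coke, Hot Dogs}.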

Example 2

Min support = 0.3

Transaction ID	Items Purchased
T1	Rock, Jazz
T2	Jazz, Pop, Classical
T3	Rock, Pop
T4	Jazz, Rock, Pop, Classical
T5	Pop, Classical
T6	Rock, Jazz, Classical
T7	Jazz, Pop, Classical
T8	Rock, Pop, Jazz

Distinct items: Classical, Jazz, Pop, Rock
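
For a dataset as small as Example 2, the result can be cross-checked by brute force: count every subset of every transaction and keep those that reach the minimum support. Note that this is plain enumeration, not Apriori (there is no candidate pruning), and it grows exponentially with transaction length, so it is only a sketch for verifying small examples by hand.

```python
from collections import Counter
from itertools import combinations

# Transactions of Example 2.
transactions = [
    {"Rock", "Jazz"},
    {"Jazz", "Pop", "Classical"},
    {"Rock", "Pop"},
    {"Jazz", "Rock", "Pop", "Classical"},
    {"Pop", "Classical"},
    {"Rock", "Jazz", "Classical"},
    {"Jazz", "Pop", "Classical"},
    {"Rock", "Pop", "Jazz"},
]
min_support = 0.3

# Count every non-empty subset of every transaction.
# Subsets are stored as sorted tuples so that they can be used as keys.
counts = Counter()
for t in transactions:
    items = sorted(t)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            counts[combo] += 1

n = len(transactions)
frequent = {combo: c / n for combo, c in counts.items() if c / n >= min_support}

for combo, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(combo, sup)
```

With 8 transactions, a minimum support of 0.3 requires at least 3 occurrences. All four single items pass, every pair except {Classical, Rock} passes, and {Classical, Jazz, Pop} (3 of 8 transactions) is the only frequent triple.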

Example 3

Min support = 0.25

Transaction ID	Items
T1	egg, bread
T2	juice, egg, butter
T3	juice, egg, bread
T4	juice, bread
T5	juice, egg
T6	juice, bread, butter
T7	juice, egg, butter
T8	bread, butter
T9	juice, bread
T10	egg, butter
T11	juice, egg, butter

Distinct items: bread, butter, egg, juice
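
Once a frequent itemset is known, candidate rules are obtained by splitting it into an antecedent and a consequent and computing the confidence of each split. The sketch below does this for {butter, egg, juice}, which occurs in 3 of the 11 transactions of Example 3. The confidence threshold of 0.7 is a hypothetical value chosen for illustration; the example itself specifies only a minimum support.

```python
from itertools import combinations

# Transactions of Example 3.
transactions = [
    {"egg", "bread"},
    {"juice", "egg", "butter"},
    {"juice", "egg", "bread"},
    {"juice", "bread"},
    {"juice", "egg"},
    {"juice", "bread", "butter"},
    {"juice", "egg", "butter"},
    {"bread", "butter"},
    {"juice", "bread"},
    {"egg", "butter"},
    {"juice", "egg", "butter"},
]


def count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(set(itemset) <= t for t in transactions)


itemset = ("butter", "egg", "juice")
min_confidence = 0.7  # hypothetical threshold, not given in the example

rules = []
for k in range(1, len(itemset)):
    for antecedent in combinations(itemset, k):
        consequent = tuple(i for i in itemset if i not in antecedent)
        # confidence(X -> Y) = count(X ∪ Y) / count(X)
        conf = count(itemset) / count(antecedent)
        if conf >= min_confidence:
            rules.append((antecedent, consequent, conf))

for antecedent, consequent, conf in rules:
    print(antecedent, "->", consequent, f"confidence = {conf:.2f}")
```

Under this assumed threshold two rules remain, {butter, egg} → {juice} and {butter, juice} → {egg}, each with confidence 3/4. The rule {egg, juice} → {butter} has confidence 3/5 and is dropped.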

Implementation

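import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

# Transactions from Example 1
data = pd.DataFrame([
    ['T1', ['Hot Dogs', 'Buns', 'Ketchup']],
    ['T2', ['Hot Dogs', 'Buns']],
    ['T3', ['Hot Dogs', 'Coke', 'Chips']],
    ['T4', ['Chips', 'Coke']],
    ['T5', ['Chips', 'Ketchup']],
    ['T6', ['Hot Dogs', 'Coke', 'Chips']]
], columns=['Transaction', 'itemset'])

# One-hot encode the transactions: one boolean column per distinct item
encoder = TransactionEncoder()
encoded = encoder.fit(data['itemset']).transform(data['itemset'])
df = pd.DataFrame(encoded, columns=encoder.columns_)

# Mine every itemset whose support is at least 0.3
frequent_itemsets = apriori(df, min_support=0.3, use_colnames=True)
print(frequent_itemsets)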