AI

Association Rule Mining - Apriori Algorithm

1. Association Rule

$$X \rightarrow Y$$

2. Support

Support of an itemset X:

$$\text{support}(X) = \frac{|\{T \in D \mid X \subseteq T\}|}{|D|}$$

Support of the rule X → Y:

$$\text{support}(X \rightarrow Y) = \text{support}(X \cup Y)$$
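The two support definitions above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names `support` and `rule_support` are hypothetical, and the data are the six transactions of Example 1 in these notes.

```python
# Transactions from Example 1, each stored as a set of items.
transactions = [
    {"Hot Dogs", "Buns", "Ketchup"},
    {"Hot Dogs", "Buns"},
    {"Hot Dogs", "Coke", "Chips"},
    {"Chips", "Coke"},
    {"Chips", "Ketchup"},
    {"Hot Dogs", "Coke", "Chips"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def rule_support(x, y, transactions):
    """support(X -> Y) is defined as the support of the union of X and Y."""
    return support(set(x) | set(y), transactions)


print(support({"Hot Dogs"}, transactions))               # 4 of 6 transactions
print(rule_support({"Chips"}, {"Coke"}, transactions))   # 3 of 6 transactions
```

Storing each transaction as a set makes the subset test `itemset <= t` a direct translation of X ⊆ T.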

3. Confidence

$$\text{confidence}(X \rightarrow Y) = \frac{\text{support}(X \cup Y)}{\text{support}(X)}$$
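As a worked instance, take the rule {Hot Dogs} → {Buns} on the six transactions of Example 1: Hot Dogs appears in 4 transactions, and Hot Dogs together with Buns in 2.

$$\text{confidence}(\{\text{Hot Dogs}\} \rightarrow \{\text{Buns}\}) = \frac{\text{support}(\{\text{Hot Dogs}, \text{Buns}\})}{\text{support}(\{\text{Hot Dogs}\})} = \frac{2/6}{4/6} = 0.5$$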

4. Lift

$$\text{lift}(X \rightarrow Y) = \frac{\text{confidence}(X \rightarrow Y)}{\text{support}(Y)} = \frac{\text{support}(X \cup Y)}{\text{support}(X)\,\text{support}(Y)}$$

5. Leverage

$$\text{leverage}(X \rightarrow Y) = \text{support}(X \cup Y) - \text{support}(X)\,\text{support}(Y)$$

6. Conviction

$$\text{conviction}(X \rightarrow Y) = \frac{1 - \text{support}(Y)}{1 - \text{confidence}(X \rightarrow Y)}$$

7. Cosine

$$\text{cosine}(X \rightarrow Y) = \frac{\text{support}(X \cup Y)}{\sqrt{\text{support}(X)\,\text{support}(Y)}}$$
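The rule metrics in sections 3 to 7 depend only on three numbers: support(X), support(Y), and support(X ∪ Y). A minimal sketch follows; the function name `rule_metrics` and its parameter names are hypothetical, and returning infinity for conviction when confidence equals 1 is an assumption made here to avoid dividing by zero.

```python
import math


def rule_metrics(sup_x, sup_y, sup_xy):
    """Compute the metrics of sections 3-7 for a rule X -> Y.

    sup_x, sup_y: supports of X and Y; sup_xy: support of X ∪ Y.
    """
    confidence = sup_xy / sup_x
    lift = confidence / sup_y
    leverage = sup_xy - sup_x * sup_y
    # Assumption: report infinite conviction when the rule never fails.
    if confidence == 1:
        conviction = math.inf
    else:
        conviction = (1 - sup_y) / (1 - confidence)
    cosine = sup_xy / math.sqrt(sup_x * sup_y)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
        "cosine": cosine,
    }


# Rule {Chips} -> {Coke} in Example 1:
# support(Chips) = 4/6, support(Coke) = 3/6, support(Chips ∪ Coke) = 3/6.
metrics = rule_metrics(sup_x=4 / 6, sup_y=3 / 6, sup_xy=3 / 6)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For this rule the values come out as confidence 0.75, lift 1.5, leverage ≈ 0.167, conviction 2.0, and cosine ≈ 0.866.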

Example 1

Min support = 0.3

Transaction ID Items
T1 Hot Dogs, Buns, Ketchup
T2 Hot Dogs, Buns
T3 Hot Dogs, Coke, Chips
T4 Chips, Coke
T5 Chips, Ketchup
T6 Hot Dogs, Coke, Chips

Items: Buns, Chips, Coke, Hot Dogs, Ketchup
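The level-wise search that Apriori performs on Example 1 can be sketched without any library. This is a simplified illustration under stated assumptions, not an optimized implementation: the function name `apriori_frequent_itemsets` is hypothetical, and candidates are generated by taking unions of frequent (k-1)-itemsets and then pruning any candidate that has an infrequent (k-1)-subset.

```python
from itertools import combinations

# Transactions from Example 1.
transactions = [
    {"Hot Dogs", "Buns", "Ketchup"},
    {"Hot Dogs", "Buns"},
    {"Hot Dogs", "Coke", "Chips"},
    {"Chips", "Coke"},
    {"Chips", "Ketchup"},
    {"Hot Dogs", "Coke", "Chips"},
]


def apriori_frequent_itemsets(transactions, min_support):
    """Return {frozenset: support} for every frequent itemset."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]): support(frozenset([i])) for i in items}
    current = {c: s for c, s in current.items() if s >= min_support}

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count step: keep candidates that reach the minimum support.
        current = {c: support(c) for c in candidates}
        current = {c: s for c, s in current.items() if s >= min_support}
    return frequent


result = apriori_frequent_itemsets(transactions, min_support=0.3)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(sup, 3))
```

With Min support = 0.3 an itemset needs at least 2 of the 6 transactions. The sketch finds 10 frequent itemsets: all five single items, the pairs {Buns, Hot Dogs}, {Chips, Coke}, {Chips, Hot Dogs}, {Coke, Hot Dogs}, and the triple {Chips, Coke, Hot Dogs}.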

Example 2

Min support = 0.3

Transaction ID Items Purchased
T1 Rock, Jazz
T2 Jazz, Pop, Classical
T3 Rock, Pop
T4 Jazz, Rock, Pop, Classical
T5 Pop, Classical
T6 Rock, Jazz, Classical
T7 Jazz, Pop, Classical
T8 Rock, Pop, Jazz

Items: Classical, Jazz, Pop, Rock
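Because Example 2 has only four distinct items, the frequent itemsets can be cross-checked by brute force: enumerate every non-empty subset of the items and count it. The sketch below does exactly that. It is not Apriori (nothing is pruned), and the variable names are hypothetical.

```python
from itertools import combinations

# Transactions from Example 2.
transactions = [
    {"Rock", "Jazz"},
    {"Jazz", "Pop", "Classical"},
    {"Rock", "Pop"},
    {"Jazz", "Rock", "Pop", "Classical"},
    {"Pop", "Classical"},
    {"Rock", "Jazz", "Classical"},
    {"Jazz", "Pop", "Classical"},
    {"Rock", "Pop", "Jazz"},
]
MIN_SUPPORT = 0.3

# All distinct items, sorted so itemsets are reported in a stable order.
items = sorted(set().union(*transactions))

# Map each frequent itemset (as a sorted tuple) to its transaction count.
frequent_counts = {}
for size in range(1, len(items) + 1):
    for combo in combinations(items, size):
        count = sum(1 for t in transactions if set(combo) <= t)
        if count / len(transactions) >= MIN_SUPPORT:
            frequent_counts[combo] = count

for itemset, count in frequent_counts.items():
    print(itemset, f"{count}/{len(transactions)}")
```

With 8 transactions, Min support = 0.3 means an itemset must occur at least 3 times. The check yields 10 frequent itemsets: the four single items, every pair except {Classical, Rock}, and the triple {Classical, Jazz, Pop}.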

Example 3

Min support = 0.25

Transaction ID Items
T1 egg, bread
T2 juice, egg, butter
T3 juice, egg, bread
T4 juice, bread
T5 juice, egg
T6 juice, bread, butter
T7 juice, egg, butter
T8 bread, butter
T9 juice, bread
T10 egg, butter
T11 juice, egg, butter

Items: bread, butter, egg, juice
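Once a frequent itemset is known, candidate rules come from splitting it into an antecedent X and a consequent Y and keeping the splits whose confidence is high enough. The sketch below does this for {butter, egg, juice}, which occurs in 3 of the 11 transactions of Example 3 and therefore meets Min support = 0.25. The function name `rules_from_itemset` and the 0.7 confidence threshold are assumptions chosen for illustration; the notes do not specify a minimum confidence.

```python
from itertools import combinations

# Transactions from Example 3.
transactions = [
    {"egg", "bread"},
    {"juice", "egg", "butter"},
    {"juice", "egg", "bread"},
    {"juice", "bread"},
    {"juice", "egg"},
    {"juice", "bread", "butter"},
    {"juice", "egg", "butter"},
    {"bread", "butter"},
    {"juice", "bread"},
    {"egg", "butter"},
    {"juice", "egg", "butter"},
]


def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


def rules_from_itemset(itemset, min_confidence):
    """List (X, Y, confidence) for each split of `itemset` into X -> Y."""
    itemset = frozenset(itemset)
    rules = []
    for r in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), r):
            x = frozenset(antecedent)
            y = itemset - x
            # confidence = support(X ∪ Y) / support(X); |D| cancels out.
            conf = count(itemset) / count(x)
            if conf >= min_confidence:
                rules.append((tuple(sorted(x)), tuple(sorted(y)), conf))
    return rules


# Hypothetical threshold: 0.7 is an assumption, not taken from the notes.
rules = rules_from_itemset({"butter", "egg", "juice"}, min_confidence=0.7)
for x, y, conf in rules:
    print(x, "->", y, conf)
```

Under that assumed threshold two rules survive, both with confidence 3/4: {butter, egg} → {juice} and {butter, juice} → {egg}. The split {egg, juice} → {butter} reaches only 3/5, and the single-item antecedents stay at or below 1/2.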

Implementation

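import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

# Transactions from Example 1: one row per transaction ID with its list of items.
data = pd.DataFrame([
    ['T1', ['Hot Dogs', 'Buns', 'Ketchup']],
    ['T2', ['Hot Dogs', 'Buns']],
    ['T3', ['Hot Dogs', 'Coke', 'Chips']],
    ['T4', ['Chips', 'Coke']],
    ['T5', ['Chips', 'Ketchup']],
    ['T6', ['Hot Dogs', 'Coke', 'Chips']]
], columns=['Transaction', 'itemset'])

# One-hot encode the item lists: one boolean column per distinct item.
encoder = TransactionEncoder()
encoder.fit(data['itemset'])
df = pd.DataFrame(encoder.transform(data['itemset']), columns=encoder.columns_)

# Frequent itemsets with support >= 0.3 (the Min support of Example 1).
frequent_itemsets = apriori(df, min_support=0.3, use_colnames=True)
print(frequent_itemsets)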