AI

K-Means Clustering

1. Euclidean Distance

$$d(\mathbf{A}, \mathbf{B}) = \sqrt{\sum_{i=1}^{n} (A_i - B_i)^2}$$

$$d\big((x_1, x_2), (y_1, y_2)\big) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}$$
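A minimal Python sketch of the distance formula above, using only the standard library. The function name `euclidean_distance` is an illustrative choice and does not come from these notes.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two points of the same dimension."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of coordinates")
    # Sum the squared coordinate differences, then take the square root.
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```

The same function covers the two-dimensional case and the general n-dimensional case, because it iterates over however many coordinates the points have.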

2. K-Means Algorithm

Step 1: Objective function

$$\min_{\{C_k\}_{k=1}^K} \sum_{k=1}^{K} \sum_{x_i \in C_k} \left\| x_i - \mu_k \right\|^2$$

Step 2: Assignment step

$$c_i = \arg\min_{k \in \{1,\ldots,K\}}\left\| x_i - \mu_k \right\|^2$$
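The assignment step can be sketched in Python as follows. The sample points and centroids are hypothetical values chosen for illustration, and cluster indices here start at 0 rather than 1 as in the formula.

```python
def squared_distance(x, mu):
    """Squared Euclidean distance ||x - mu||^2."""
    return sum((xi - mi) ** 2 for xi, mi in zip(x, mu))

def assign_clusters(points, centroids):
    """For each point, return the index k of the nearest centroid (0-based)."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(x, centroids[k]))
        for x in points
    ]

# Hypothetical sample data, for illustration only.
points = [(1, 2), (2, 5), (8, 4), (7, 5)]
centroids = [(1, 2), (8, 4)]
print(assign_clusters(points, centroids))  # [0, 0, 1, 1]
```

Comparing squared distances gives the same nearest centroid as comparing distances, so the square root is not needed here.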

Step 3: Update step

$$\mu_k = \frac{1}{|C_k|}\sum_{x_i \in C_k} x_i$$
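Alternating the assignment and update steps until the assignments stop changing gives the usual K-Means loop. The sketch below is one possible arrangement, not a definitive implementation. It assumes the initial centroids are supplied by the caller, and it keeps the old centroid when a cluster becomes empty, which is a design choice these notes do not specify. The sample data is hypothetical.

```python
def squared_distance(x, mu):
    return sum((xi - mi) ** 2 for xi, mi in zip(x, mu))

def assign_clusters(points, centroids):
    # Assignment step: index of the nearest centroid for each point.
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(x, centroids[k]))
        for x in points
    ]

def update_centroids(points, labels, centroids):
    # Update step: each centroid becomes the mean of its assigned points.
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [x for x, c in zip(points, labels) if c == k]
        if not members:
            new_centroids.append(old)  # assumption: keep an empty cluster's centroid
            continue
        new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
    return new_centroids

def k_means(points, initial_centroids, max_iter=100):
    centroids = [tuple(float(v) for v in c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        new_labels = assign_clusters(points, centroids)
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        centroids = update_centroids(points, labels, centroids)
    return labels, centroids

# Hypothetical sample data, for illustration only.
points = [(1, 2), (2, 5), (8, 4), (7, 5)]
labels, centroids = k_means(points, initial_centroids=[(1, 2), (8, 4)])
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [(1.5, 3.5), (7.5, 4.5)]
```

The result depends on the initial centroids, so different starting points can lead to different final clusters.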

3. Validation

Inertia

$$\text{Inertia} = \sum_{k=1}^{K} \sum_{\mathbf{x}_i \in C_k} \left\lVert \mathbf{x}_i - \boldsymbol{\mu}_k \right\rVert^2$$
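Inertia is the objective function evaluated at the final assignments and centroids, so it can be computed directly from them. A small sketch with hypothetical points, labels, and centroids:

```python
def squared_distance(x, mu):
    return sum((xi - mi) ** 2 for xi, mi in zip(x, mu))

def inertia(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(squared_distance(x, centroids[c]) for x, c in zip(points, labels))

# Hypothetical clustering result, for illustration only.
points = [(1, 2), (2, 5), (8, 4), (7, 5)]
labels = [0, 0, 1, 1]
centroids = [(1.5, 3.5), (7.5, 4.5)]
print(inertia(points, labels, centroids))  # 6.0
```

Lower inertia means points lie closer to their centroids. Inertia generally decreases as K grows, so it is most useful for comparing runs with the same K or for spotting where the decrease levels off.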

Example

Which cluster is each of the following points predicted to belong to?

$$x_1 = 2, x_2 = 4$$

$$x_1 = 5, x_2 = 7$$

| Index | x1 | x2 |
|-------|----|----|
| 0     | 2  | 10 |
| 1     | 2  | 5  |
| 2     | 8  | 4  |
| 3     | 5  | 8  |
| 4     | 7  | 5  |
| 5     | 6  | 4  |
| 6     | 1  | 2  |
| 7     | 4  | 9  |
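The notes do not state the number of clusters or the initial centroids for this example, so the sketch below makes two explicit assumptions: K = 2, and the initial centroids are the points at index 0 and index 1. Under other choices the answer may differ. It fits K-Means on the eight data points, then assigns the query points (2, 4) and (5, 7) to the nearest final centroid.

```python
def squared_distance(x, mu):
    return sum((xi - mi) ** 2 for xi, mi in zip(x, mu))

def nearest(x, centroids):
    # Index (0-based) of the centroid closest to x.
    return min(range(len(centroids)), key=lambda k: squared_distance(x, centroids[k]))

def k_means(points, initial_centroids, max_iter=100):
    centroids = [tuple(float(v) for v in c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(x, centroids) for x in points]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [x for x, c in zip(points, labels) if c == k]
            if members:  # leave the centroid unchanged if its cluster is empty
                centroids[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Data points from the table (x1, x2), in index order 0..7.
data = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]

# ASSUMPTION: K = 2, initialised with the points at index 0 and 1.
labels, centroids = k_means(data, initial_centroids=[data[0], data[1]])
print(labels)  # [0, 1, 1, 0, 1, 1, 1, 0]

for query in [(2, 4), (5, 7)]:
    print(query, "-> cluster", nearest(query, centroids))
# (2, 4) -> cluster 1
# (5, 7) -> cluster 0
```

With these assumptions the final centroids are approximately (3.67, 9.0) and (4.8, 4.0), so (2, 4) falls in the lower cluster and (5, 7) in the upper one.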