Decision Tree
Gini
$$\text{Gini}(S) = 1 - \sum_{i=1}^{K} p_i^2$$
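A minimal Python sketch of the Gini formula above. The function name `gini` and the sample label lists are illustrative assumptions, not part of the notes.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions p_i."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: treat an empty set as pure
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini(["yes", "yes", "no", "no"]))  # 0.5 (two equally likely classes)
print(gini(["yes", "yes", "yes"]))       # 0.0 (pure set)
```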
Gini Split
$$\text{Gini}_{\text{split}} =\sum_{j=1}^{m}\frac{|S_j|}{|S|}\text{Gini}(S_j)$$
Best split
$$\text{Best Split}=\arg\min \sum_{j=1}^{m} \frac{|S_j|}{|S|} \text{Gini}(S_j)$$
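The weighted Gini and the arg min above can be sketched as follows. The helper names (`gini_split`, `best_split`) and the four toy rows are hypothetical and only serve to illustrate the formulas; each attribute value is assumed to form one subset $S_j$.

```python
from collections import Counter, defaultdict

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(rows, attribute, target):
    """Weighted Gini of the subsets S_j produced by splitting on `attribute`."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[attribute]].append(row[target])
    total = len(rows)
    return sum(len(s) / total * gini(s) for s in subsets.values())

def best_split(rows, attributes, target):
    """Attribute with the smallest weighted Gini (the arg min)."""
    return min(attributes, key=lambda a: gini_split(rows, a, target))

# Hypothetical toy rows, not taken from the datasets in these notes.
rows = [
    {"color": "red", "size": "big", "label": "yes"},
    {"color": "red", "size": "small", "label": "yes"},
    {"color": "blue", "size": "big", "label": "no"},
    {"color": "blue", "size": "small", "label": "no"},
]

print(gini_split(rows, "color", "label"))            # 0.0
print(gini_split(rows, "size", "label"))             # 0.5
print(best_split(rows, ["color", "size"], "label"))  # color
```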
Entropy
$$H(S) = - \sum_{i=1}^{K} p_i \log_2 p_i$$
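A small Python sketch of the entropy formula. It uses the equivalent form $\sum_i p_i \log_2 (1/p_i)$ so that a pure set yields `0.0` rather than `-0.0`; the function name and sample labels are assumptions for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits: sum of p_i * log2(1 / p_i)."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

print(entropy(["yes", "yes", "no", "no"]))             # 1.0
print(round(entropy(["yes"] * 9 + ["no"] * 5), 3))     # 0.94
```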
Conditional Entropy
$$H(S \mid X) = \sum_{j=1}^{m} \frac{|S_j|}{|S|}H(S_j)$$
Information Gain
$$\text{Gain}(S, X) = H(S) - \sum_{j=1}^{m} \frac{|S_j|}{|S|} H(S_j)$$
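Information gain is the entropy of $S$ minus the conditional entropy after splitting on $X$. A sketch under the assumption that rows are dictionaries and each distinct value of $X$ forms one subset $S_j$; the toy rows and the name `information_gain` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Gain(S, X) = H(S) - sum_j |S_j|/|S| * H(S_j)."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[attribute]].append(row[target])
    total = len(rows)
    conditional = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy([row[target] for row in rows]) - conditional

# Hypothetical toy rows, not taken from the datasets in these notes.
rows = [
    {"color": "red", "size": "big", "label": "yes"},
    {"color": "red", "size": "small", "label": "yes"},
    {"color": "blue", "size": "big", "label": "no"},
    {"color": "blue", "size": "small", "label": "no"},
]

print(information_gain(rows, "color", "label"))  # 1.0 (perfect split)
print(information_gain(rows, "size", "label"))   # 0.0 (uninformative split)
```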
| Name | Hair | Height | Weight | Uses lotion? | Result |
| Sarah | Blonde | Average | Light | No | Burned |
| Dana | Blonde | Tall | Average | Yes | Not burned |
| Alex | Brown | Short | Average | Yes | Not burned |
| Annie | Blonde | Short | Average | No | Burned |
| Emilie | Red | Average | Heavy | No | Burned |
| Peter | Brown | Tall | Heavy | No | Not burned |
| John | Brown | Average | Heavy | No | Not burned |
| Kartie | Blonde | Short | Light | Yes | Not burned |
| Predict 1 | Blonde | Tall | Light | No | ? |
| Predict 2 | Brown | Short | Heavy | Yes | ? |
| Predict 3 | Red | Average | Light | No | ? |
Buy Cars
| ID | Age | Car Type | Class |
| 1 | 23 | Family | High |
| 2 | 17 | Sports | High |
| 3 | 43 | Sports | High |
| 4 | 68 | Family | Low |
| 5 | 32 | Truck | Low |
| 6 | 20 | Family | High |
Prediction
| Example | Age | Car Type | Predicted Class |
| 1 | 25 | Family | ? |
| 2 | 70 | Family | ? |
| 3 | 30 | Truck | ? |
| 4 | 40 | Sports | ? |
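Age is numeric, so it needs a threshold before the Gini split formula applies. The notes do not say how to choose one; the sketch below assumes the common CART-style approach of testing the midpoints between consecutive sorted ages. The `(Age, Class)` pairs are copied from the Buy Cars training table; `best_threshold` is an illustrative name.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

# (Age, Class) pairs from the Buy Cars training table.
data = [(23, "High"), (17, "High"), (43, "High"),
        (68, "Low"), (32, "Low"), (20, "High")]

def best_threshold(pairs):
    """Return (threshold, weighted Gini) for the best binary split Age <= t."""
    values = sorted({age for age, _ in pairs})
    best = None
    for low, high in zip(values, values[1:]):
        t = (low + high) / 2  # assumption: candidate thresholds are midpoints
        left = [cls for age, cls in pairs if age <= t]
        right = [cls for age, cls in pairs if age > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if best is None or score < best[1]:
            best = (t, score)
    return best

threshold, score = best_threshold(data)
print(threshold, round(score, 4))  # 27.5 0.2222
```

On this data the multiway split on Car Type has the same weighted Gini (about 0.2222), so the choice of root attribute depends on the tie-breaking rule.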
Buy Computer
| RID | age | income | student | credit_rating | Class: buys_computer |
| 1 | youth | high | no | fair | no |
| 2 | youth | high | no | excellent | no |
| 3 | middle | high | no | fair | yes |
| 4 | senior | medium | no | fair | yes |
| 5 | senior | low | yes | fair | yes |
| 6 | senior | low | yes | excellent | no |
| 7 | middle | low | yes | excellent | yes |
| 8 | youth | medium | no | fair | no |
| 9 | youth | low | yes | fair | yes |
| 10 | senior | medium | yes | fair | yes |
| 11 | youth | medium | yes | excellent | yes |
| 12 | middle | medium | no | excellent | yes |
| 13 | middle | high | yes | fair | yes |
| 14 | senior | medium | no | excellent | no |
Predict
| Example | age | income | student | credit_rating | Prediction (buys_computer) |
| 1 | youth | high | yes | fair | ? |
| 2 | youth | low | no | excellent | ? |
| 3 | middle | medium | no | fair | ? |
| 4 | senior | low | yes | excellent | ? |
Predict the risk class of a car driver based on the following attributes:
| Attribute | Description | Values |
| time | time since obtaining a driver's license, in years | {1-2, 2-7, >7} |
| gender | gender | {male, female} |
| area | residential area | {urban, rural} |
| risk | the risk class | {low, high} |
Manually classified training examples:
| ID | time | gender | area | risk |
| 1 | 1-2 | m | urban | low |
| 2 | 2-7 | m | rural | high |
| 3 | >7 | f | rural | low |
| 4 | 1-2 | f | rural | high |
| 5 | >7 | m | rural | high |
| 6 | 1-2 | m | rural | high |
| 7 | 2-7 | f | urban | low |
| 8 | 2-7 | m | urban | low |
Loan Approval
| Age | Job | House | Credit | Loan Approved |
| Young | False | No | Fair | No |
| Young | False | No | Good | No |
| Young | True | No | Good | Yes |
| Young | True | Yes | Fair | Yes |
| Young | False | No | Fair | No |
| Middle | False | No | Fair | No |
| Middle | False | No | Good | No |
| Middle | True | Yes | Good | Yes |
| Middle | False | Yes | Excellent | Yes |
| Middle | False | Yes | Excellent | Yes |
| Old | False | Yes | Excellent | Yes |
| Old | False | Yes | Good | Yes |
| Old | True | No | Good | Yes |
| Old | True | No | Excellent | Yes |
| Old | False | No | Fair | No |
Predict
| Age | Job | House | Credit | Loan Approved |
| Young | False | No | Good | ? |
Play Tennis
| Outlook | Temp | Humidity | Windy | Play |
| Sunny | Hot | High | False | No |
| Sunny | Hot | High | True | No |
| Overcast | Hot | High | False | Yes |
| Rainy | Mild | High | False | Yes |
| Rainy | Cool | Normal | False | Yes |
| Rainy | Cool | Normal | True | No |
| Overcast | Cool | Normal | True | Yes |
| Sunny | Mild | High | False | No |
| Sunny | Cool | Normal | False | Yes |
| Rainy | Mild | Normal | False | Yes |
| Sunny | Mild | Normal | True | Yes |
| Overcast | Mild | High | True | Yes |
| Overcast | Hot | Normal | False | Yes |
| Rainy | Mild | High | True | No |
Predict
| Outlook | Temp | Humidity | Windy | Play |
| Sunny | Hot | Normal | True | ? |
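One way to work this exercise is an ID3-style tree: at each node, split on the attribute with the highest information gain, and stop when a node is pure. The sketch below is one possible implementation, not a prescribed method; the tuple-based tree representation, the helper names, and the majority-class fallback for unseen values are assumptions. The rows are copied from the Play Tennis table.

```python
import math
from collections import Counter, defaultdict

COLUMNS = ["Outlook", "Temp", "Humidity", "Windy", "Play"]
RAW = """
Sunny Hot High False No
Sunny Hot High True No
Overcast Hot High False Yes
Rainy Mild High False Yes
Rainy Cool Normal False Yes
Rainy Cool Normal True No
Overcast Cool Normal True Yes
Sunny Mild High False No
Sunny Cool Normal False Yes
Rainy Mild Normal False Yes
Sunny Mild Normal True Yes
Overcast Mild High True Yes
Overcast Hot Normal False Yes
Rainy Mild High True No
"""
rows = [dict(zip(COLUMNS, line.split())) for line in RAW.strip().splitlines()]

def entropy(labels):
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def partition(rows, attr):
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr]].append(r)
    return groups

def gain(rows, attr, target="Play"):
    cond = sum(len(g) / len(rows) * entropy([r[target] for r in g])
               for g in partition(rows, attr).values())
    return entropy([r[target] for r in rows]) - cond

def build(rows, attrs, target="Play"):
    """Leaf = class label (str); internal node = (attribute, children, majority)."""
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    children = {v: build(g, rest, target) for v, g in partition(rows, best).items()}
    return (best, children, majority)

def predict(tree, row):
    while isinstance(tree, tuple):
        attr, children, majority = tree
        tree = children.get(row[attr], majority)  # unseen value: majority class
    return tree

tree = build(rows, COLUMNS[:-1])
query = {"Outlook": "Sunny", "Temp": "Hot", "Humidity": "Normal", "Windy": "True"}
print(tree[0])                            # Outlook
print(round(gain(rows, "Outlook"), 3))    # 0.247
print(predict(tree, query))               # Yes
```

With this data the root is Outlook, the Sunny branch splits on Humidity, and the query row (Sunny, Normal humidity) lands in a pure "Yes" leaf.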
Buy House
| City Size | Avg. Income | Local Investors | LOHAS Awareness | Decision |
| Big | High | Yes | High | Yes |
| Med | Med | No | Med | No |
| Small | Low | Yes | Low | No |
| Big | High | No | High | Yes |
| Small | Med | Yes | High | No |
| Med | High | Yes | Med | Yes |
| Med | Med | Yes | Med | No |
| Big | Med | No | Med | No |
| Med | High | Yes | Low | No |
| Small | High | No | High | Yes |
| Small | Med | No | High | No |
| Med | High | No | Med | No |
Predict
| City Size | Avg. Income | Local Investors | LOHAS Awareness | Decision |
| Med | Med | No | Med | ? |