Multiple Linear Regression

Multiple linear regression model

$$ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i $$

Vector-matrix form

$$ \mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

OLS estimate

$$ \hat{\boldsymbol{\beta}} = (X^\top X)^{-1} X^\top \mathbf{y} $$

1. Model

$$ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i $$

2. Vector Form

$$ \mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon} $$

3. Matrix X

$$ X = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1p} \\ 1 & x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix} $$

4. OLS estimate

$$ \hat{\boldsymbol{\beta}} = (X^\top X)^{-1} X^\top \mathbf{y} $$

5. Prediction

$$ \hat{\mathbf{y}} = X \hat{\boldsymbol{\beta}} $$

6. Residuals

$$ \mathbf{e} = \mathbf{y} - \hat{\mathbf{y}} $$
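
Steps 3 to 6 can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: it assumes NumPy is available, uses the Example 1 data that appears later in these notes, and solves the normal equations with `np.linalg.solve` instead of forming the inverse explicitly. The variable names (`data`, `beta_hat`, `y_hat`, `e`) are chosen here for illustration.

```python
import numpy as np

# Example 1 data from these notes: columns are x1, x2, y
data = np.array([
    [32, 10, 37.9], [19, 9, 42.2], [13, 5, 47.3], [13, 5, 47.5], [5, 5, 51.5],
    [7, 3, 48.2], [34, 7, 40.3], [20, 6, 46.7], [30, 1, 18.8], [17, 3, 25.8],
])
y = data[:, 2]

# Step 3: design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones(len(data)), data[:, :2]])

# Step 4: solve the normal equations (X^T X) beta = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Step 5: fitted values
y_hat = X @ beta_hat

# Step 6: residuals
e = y - y_hat

print("beta_hat:", beta_hat)
print("residuals:", e)
```

A quick sanity check on any OLS fit with an intercept is that the residuals are orthogonal to every column of X, so `X.T @ e` should be numerically zero.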

7. Error variance

$$ \hat{\sigma}^2 = \frac{1}{n-p-1} \, \mathbf{e}^\top \mathbf{e} $$

8. Variance of the estimated coefficients

$$ \mathrm{Var}(\hat{\boldsymbol{\beta}}) = \hat{\sigma}^2 (X^\top X)^{-1} $$

9. Standard error

$$ SE(\hat{\beta}_j) = \sqrt{ \left[ \hat{\sigma}^2 (X^\top X)^{-1} \right]_{jj} } $$

10. t-test

$$ t_j = \frac{\hat{\beta}_j}{SE(\hat{\beta}_j)} $$

11. Degrees of freedom

$$ df = n - p - 1$$

12. 95% CI

$$ \hat{\beta}_j \pm t_{0.975,\, n-p-1} \cdot SE(\hat{\beta}_j) $$
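
Steps 7 to 12 chain together naturally, so a short sketch may help. It assumes NumPy and SciPy are installed, reuses the Example 1 data from these notes, and takes the critical value from `scipy.stats.t.ppf`. The variable names are illustrative only.

```python
import numpy as np
from scipy import stats

# Example 1 data from these notes: columns are x1, x2, y
data = np.array([
    [32, 10, 37.9], [19, 9, 42.2], [13, 5, 47.3], [13, 5, 47.5], [5, 5, 51.5],
    [7, 3, 48.2], [34, 7, 40.3], [20, 6, 46.7], [30, 1, 18.8], [17, 3, 25.8],
])
y = data[:, 2]
X = np.column_stack([np.ones(len(data)), data[:, :2]])
n, p = X.shape[0], X.shape[1] - 1  # p counts predictors, excluding the intercept

# OLS fit and residuals (steps 4 to 6)
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
e = y - X @ beta_hat

dof = n - p - 1                       # step 11: degrees of freedom
sigma2_hat = (e @ e) / dof            # step 7: error variance estimate
cov_beta = sigma2_hat * XtX_inv       # step 8: covariance matrix of beta_hat
se = np.sqrt(np.diag(cov_beta))       # step 9: standard errors
t_stat = beta_hat / se                # step 10: t statistics for H0: beta_j = 0

# Step 12: 95% confidence intervals
t_crit = stats.t.ppf(0.975, dof)
ci_lower = beta_hat - t_crit * se
ci_upper = beta_hat + t_crit * se

for j in range(p + 1):
    print(f"beta_{j}: {beta_hat[j]:.4f}  SE={se[j]:.4f}  t={t_stat[j]:.3f}  "
          f"CI=[{ci_lower[j]:.4f}, {ci_upper[j]:.4f}]")
```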

13. R-squared

$$ R^2 = 1 - \frac{\sum_{i=1}^n (y_i - \hat{y}_i)^2} {\sum_{i=1}^n (y_i - \bar{y})^2} $$

14. Adjusted R-squared

$$ R_{\mathrm{adj}}^2 = 1 - \left( \frac{n-1}{n-p-1} \right)(1 - R^2)$$
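
The two formulas above translate directly into a pair of small functions. This is a plain-Python sketch; the function names `r_squared` and `adjusted_r_squared` and the sample numbers in the usage lines are hypothetical, chosen only to illustrate the calls.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SSE / SST for observed values y and fitted values y_hat."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    return 1 - sse / sst


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    return 1 - (n - 1) / (n - p - 1) * (1 - r2)


# Hypothetical inputs, for illustration only
y_obs = [3.0, 5.0, 7.0, 9.0, 12.0]
y_fit = [3.2, 4.9, 7.1, 9.3, 11.5]
r2 = r_squared(y_obs, y_fit)
print("R^2:", r2)
print("Adjusted R^2:", adjusted_r_squared(r2, n=len(y_obs), p=2))
```

Because the factor (n-1)/(n-p-1) is greater than 1 whenever p is at least 1, the adjusted value never exceeds R-squared, and it penalizes adding predictors that do not reduce the residual sum of squares enough.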

Residual standard error of the model (RSE, also called SER)

$$ RSE = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - p - 1}}$$

$$ SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

$$ \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_p x_{ip}$$
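
It may help to connect this quantity to the error variance in step 7. Since the residual vector satisfies $\mathbf{e}^\top \mathbf{e} = SSE$, the RSE is simply the square root of the estimated error variance:

$$ RSE = \sqrt{\frac{SSE}{n-p-1}} = \sqrt{\frac{\mathbf{e}^\top \mathbf{e}}{n-p-1}} = \sqrt{\hat{\sigma}^2} $$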

Example 1

No. X1 X2 Y
0 32 10 37.9
1 19 9 42.2
2 13 5 47.3
3 13 5 47.5
4 5 5 51.5
5 7 3 48.2
6 34 7 40.3
7 20 6 46.7
8 30 1 18.8
9 17 3 25.8

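import pandas as pd
from sklearn.linear_model import LinearRegression

# Example 1 data: two predictors (x1, x2) and the response y
df = pd.DataFrame([
    [32, 10, 37.9],
    [19, 9, 42.2],
    [13, 5, 47.3],
    [13, 5, 47.5],
    [5, 5, 51.5],
    [7, 3, 48.2],
    [34, 7, 40.3],
    [20, 6, 46.7],
    [30, 1, 18.8],
    [17, 3, 25.8],
], columns=['x1', 'x2', 'y'])

X = df[['x1', 'x2']]  # predictors; no column of ones is needed here
y = df['y']

# Fit by ordinary least squares; LinearRegression adds the intercept by default
lg = LinearRegression()
lg.fit(X, y)

# Fitted values on the training data, and R^2 on the same data
y_pred = lg.predict(X)
print('Predict', y_pred)
print('R^2 score', lg.score(X, y))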

Example 2

y X1 X2
140 60 22
155 62 25
159 67 24
179 70 20
192 71 15
200 72 14
212 75 14
215 78 11
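
The notes give no code for Example 2, so here is one possible way to fit it with the formulas above. It is a sketch that assumes NumPy is available and that the table columns are y, X1, X2 in that order. It uses `np.linalg.lstsq` in place of the explicit inverse, which is a substitution made here for numerical convenience. Variable names are illustrative.

```python
import numpy as np

# Example 2 data from the table above: columns are y, X1, X2
data = np.array([
    [140, 60, 22], [155, 62, 25], [159, 67, 24], [179, 70, 20],
    [192, 71, 15], [200, 72, 14], [212, 75, 14], [215, 78, 11],
], dtype=float)
y = data[:, 0]

# Design matrix with an intercept column
X = np.column_stack([np.ones(len(y)), data[:, 1:]])
n, p = X.shape[0], X.shape[1] - 1

# Least-squares coefficients (same solution as the normal equations)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

# Goodness-of-fit summaries from steps 13 and 14, and the RSE
sse = np.sum((y - y_hat) ** 2)
sst = np.sum((y - y.mean()) ** 2)
r2 = 1 - sse / sst
r2_adj = 1 - (n - 1) / (n - p - 1) * (1 - r2)
rse = np.sqrt(sse / (n - p - 1))

print("beta_hat:", beta_hat)
print("R^2:", r2)
print("Adjusted R^2:", r2_adj)
print("RSE:", rse)
```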