$$\hat{y}_i = \mathbf{x}_i^\top \boldsymbol{\beta}$$
$$\ell_i(\boldsymbol{\beta}) = \frac{1}{2}\left(y_i - \mathbf{x}_i^\top \boldsymbol{\beta}\right)^2$$
$$\nabla_{\boldsymbol{\beta}} \ell_i(\boldsymbol{\beta}) = - \mathbf{x}_i \left( y_i - \mathbf{x}_i^\top \boldsymbol{\beta} \right)$$
$$\boldsymbol{\beta}^{(t+1)} = \boldsymbol{\beta}^{(t)} + \eta \, \mathbf{x}_i \left(y_i - \mathbf{x}_i^\top \boldsymbol{\beta}^{(t)}\right)$$
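The plus sign in the update comes from substituting the per-sample gradient into a standard gradient-descent step, where the two minus signs cancel. Written out (assuming the usual rule of stepping against the gradient, which the source does not state explicitly):

$$\boldsymbol{\beta}^{(t+1)} = \boldsymbol{\beta}^{(t)} - \eta \, \nabla_{\boldsymbol{\beta}} \ell_i\left(\boldsymbol{\beta}^{(t)}\right) = \boldsymbol{\beta}^{(t)} - \eta \left[ - \mathbf{x}_i \left( y_i - \mathbf{x}_i^\top \boldsymbol{\beta}^{(t)} \right) \right] = \boldsymbol{\beta}^{(t)} + \eta \, \mathbf{x}_i \left( y_i - \mathbf{x}_i^\top \boldsymbol{\beta}^{(t)} \right)$$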
With learning rate $\eta = 0.1$
$$\beta_0 = 1, \beta_1 = 0.5, \beta_2 = -0.5$$
| $x_1$ | $x_2$ | $y$ |
|---|---|---|
| 1 | 1 | 5 |
| 2 | 0 | 4 |
| 0 | 2 | 6 |
| 3 | 1 | 10 |
| 1 | 3 | 11 |
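A minimal sketch of how the update rule could be applied to the table above, one sample at a time. It rests on assumptions the source does not spell out: $\beta_0$ is treated as an intercept (so each input is augmented to $(1, x_1, x_2)$), the listed $\beta$ values are taken as the starting point, and the rows are visited once in table order. The name `sgd_step` is hypothetical.

```python
# Assumption: beta_0 is an intercept, so each input becomes (1, x1, x2).
# Assumption: the given beta values are the starting point, and the
# samples are visited once, in the order they appear in the table.

def sgd_step(beta, x, y, eta):
    """One SGD update for squared loss: beta + eta * x * (y - x^T beta)."""
    residual = y - sum(b * xj for b, xj in zip(beta, x))
    return [b + eta * xj * residual for b, xj in zip(beta, x)]

data = [(1, 1, 5), (2, 0, 4), (0, 2, 6), (3, 1, 10), (1, 3, 11)]  # (x1, x2, y)
eta = 0.1
beta = [1.0, 0.5, -0.5]

history = [beta]  # history[t] holds the parameters after t updates
for x1, x2, y in data:
    beta = sgd_step(beta, [1.0, x1, x2], y, eta)
    history.append(beta)

for t, b in enumerate(history):
    print(t, [round(v, 5) for v in b])
```

Under these assumptions, the first row gives $\hat{y} = 1 + 0.5 - 0.5 = 1$ and a residual of $4$, so the first update moves the parameters to $(1.4, 0.9, -0.1)$.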