@jemoka / Jemoka Knowledge Base / raw/concept/kbhleast_squares_error.md
---
title: "least-squares error"
source: https://www.jemoka.com/posts/kbhleast_squares_error/
---

## requirements

- \(h_{\theta}\qty(x)\), the predictor function
- \(x, y\), the samples of data

## definition

\begin{equation}
J\qty(\theta) = \frac{1}{2} \sum_{i=1}^{n}\qty(h_{\theta }\qty(x^{(i)}) - y^{(i)})^{2}
\end{equation}

see also example: gradient descent for least-squares error.

## additional information

### “why the 1/2”?

Because when you take \(\nabla J\qty(\theta)\), the \(\frac{1}{2}\) cancels with the \(2\) that comes down from differentiating the square, leaving \(\nabla J\qty(\theta) = \sum_{i=1}^{n} \qty(h_{\theta}\qty(x^{(i)}) - y^{(i)}) \nabla h_{\theta}\qty(x^{(i)})\).

### probabilistic intuition for least-squares error in linear regression

Assume that our dataset \(\qty(x^{(i)}, y^{(i)}) \sim D\) has the following property: “the true \(y\) value is just our model’s output, plus some error.” Meaning:

\begin{equation}
y^{(i)} = \theta^{\top} x^{(i)} + \varepsilon^{(i)}
\end{equation}

Assume too that \(\varepsilon^{(i)} \sim \mathcal{N}\qty(0, \sigma^{2})\) for all \(i\); that is, the error is normally distributed. Recall the PDF of the normal distribution:

\begin{equation}
P\qty(\varepsilon^{(i)}) = \frac{1}{\sigma\sqrt{2\pi}} \exp \qty( \frac{- \qty(\varepsilon^{(i)})^{2}}{2\sigma^{2}})
\end{equation}

Plugging in \(\varepsilon^{(i)} = y^{(i)} - \theta^{\top} x^{(i)}\) from our assumption above:

\begin{equation}
P\qty(y^{(i)} | x^{(i)}, \theta) = \frac{1}{\sigma\sqrt{2\pi}} \exp \qty( \frac{- \qty(y^{(i)}- \theta^{\top}x^{(i)})^{2}}{2\sigma^{2}})
\end{equation}

If we now assume the entire dataset is IID, we can then write:

\begin{align}
P\qty(y | x, \theta) &= \prod_{i=1}^{n} P\qty(y^{(i)} | x^{(i)}, \theta) \\
&= \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \exp \qty( \frac{- \qty(y^{(i)}- \theta^{\top}x^{(i)})^{2}}{2\sigma^{2}})
\end{align}

To pick \(\theta\), we perform MLE: we want the model that maximizes the likelihood of seeing our real data \(y\). Meaning, we desire:

\begin{equation}
\theta = \arg\max_{\theta} P\qty(y | x,\theta)
\end{equation}

Let’s do it!

First, let’s write the thing we want to maximize as a function of \(\theta\):

\begin{equation}
L\qty(\theta) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \exp \qty( \frac{- \qty(y^{(i)}- \theta^{\top}x^{(i)})^{2}}{2\sigma^{2}})
\end{equation}

Recall that log is monotonic, so

\begin{align}
\arg\max_{\theta} L\qty(\theta) &= \arg\max_{\theta} \log \qty(L\qty(\theta)) \\
&= \arg\max_{\theta} \log \prod_{i=1}^{n}\frac{1}{\sigma\sqrt{2\pi}} \exp \qty(\dots) \\
&= \arg\max_{\theta} \qty[ n \log \frac{1}{\sigma\sqrt{2\pi}} + \sum_{i=1}^{n} \frac{-\qty(y^{(i)}- \theta^{\top}x^{(i)})^{2}}{2\sigma^{2}} ]
\end{align}

We can throw away the left term, since it’s just a constant in \(\theta\). The right term is the negative of the least-squares error formula with \(h_{\theta}\qty(x) = \theta^{\top}x\), scaled by \(\frac{1}{\sigma^{2}}\). Since that scale is a positive constant, it doesn’t change the arg max, so we may as well take \(\sigma = 1\). Maximizing the likelihood is therefore the same as minimizing \(J\qty(\theta)\). Yay!
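
As a numerical sanity check on the derivation above, here is a minimal sketch in Python with NumPy. Everything in it is an illustrative assumption rather than part of the original note: the synthetic data, the values of `n`, `d`, `sigma`, and `true_theta`, and the helper names `least_squares_error` and `gaussian_log_likelihood`. It checks that the Gaussian log-likelihood equals a constant minus \(J\qty(\theta) / \sigma^{2}\), and that the least-squares solution is not beaten on likelihood by nearby values of \(\theta\).

```python
import numpy as np


def least_squares_error(theta, X, y):
    """J(theta) = 1/2 * sum_i (theta^T x_i - y_i)^2 for a linear predictor."""
    residuals = X @ theta - y
    return 0.5 * np.sum(residuals ** 2)


def gaussian_log_likelihood(theta, X, y, sigma):
    """log P(y | x, theta) assuming y_i = theta^T x_i + eps_i, eps_i ~ N(0, sigma^2)."""
    n = len(y)
    residuals = y - X @ theta
    constant = n * np.log(1.0 / (sigma * np.sqrt(2.0 * np.pi)))
    return constant - np.sum(residuals ** 2) / (2.0 * sigma ** 2)


# Synthetic data (hypothetical values, chosen only for illustration).
rng = np.random.default_rng(0)
n, d, sigma = 200, 3, 0.5
X = rng.normal(size=(n, d))
true_theta = np.array([1.5, -2.0, 0.7])
y = X @ true_theta + rng.normal(scale=sigma, size=n)

# Minimizer of J(theta), found by ordinary least squares.
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Identity from the derivation: log L(theta) = const - J(theta) / sigma^2.
const = n * np.log(1.0 / (sigma * np.sqrt(2.0 * np.pi)))
for _ in range(5):
    theta = rng.normal(size=d)
    lhs = gaussian_log_likelihood(theta, X, y, sigma)
    rhs = const - least_squares_error(theta, X, y) / sigma ** 2
    assert np.isclose(lhs, rhs)

# Perturbing the least-squares solution never increases the likelihood.
best = gaussian_log_likelihood(theta_ls, X, y, sigma)
for _ in range(5):
    nearby = theta_ls + 0.1 * rng.normal(size=d)
    assert gaussian_log_likelihood(nearby, X, y, sigma) <= best
```

Changing `sigma` in the sketch rescales the log-likelihood but leaves `theta_ls` as its maximizer, which is the sense in which the choice of \(\sigma\) doesn’t matter.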