@jemoka / Jemoka Knowledge Base / wiki/concepts/deep_learning.md
---
title: "deep learning"
type: concept
related: [Stochastic Gradient Descent, Neural Network, Supervised learning]
source: https://www.jemoka.com/posts/kbhdeep_learning/
confidence: high
status: active
---

Supervised learning with non-linear models.

## Motivation

Previously, our learning methods were linear in the parameters \(\theta\). We could use non-linear features of \(x\), but the hypothesis was always linear in \(\theta\). With deep learning, the hypothesis can be non-linear in both \(\theta\) and \(x\).

## constituents

- The dataset: \(\qty{\qty(x^{(i)}, y^{(i)})}_{i=1}^{n}\)
- Our per-example loss: \(J^{(i)}\qty(\theta) = \qty(y^{(i)} - h_{\theta}\qty(x^{(i)}))^{2}\)
- Our overall cost: \(J\qty(\theta) = \frac{1}{n} \sum_{i=1}^{n} J^{(i)}\qty(\theta)\)
- Optimization: \(\min_{\theta} J\qty(\theta)\)
- Optimization step: \(\theta = \theta - \alpha \nabla_{\theta} J\qty(\theta)\)
- Hyperparameters:
  - learning rate: \(\alpha\)
  - batch size: \(B\)
  - iterations: \(n_{\text{iter}}\)
- Stochastic gradient descent, where each step uses a single randomly sampled data point. Alternatively, batch gradient descent, where we compute the gradient over a batch of \(B\) points and scale the learning rate by \(\frac{1}{B}\) (that is, we average the batch gradients).
- Neural network

## requirements

## additional information

### Background

Notation: \(x\) is the input, \(h\) denotes the hidden layer(s), and \(\hat{y}\) is the prediction. We write the weight from \(x_{i}\) to \(h_{j}\) at a given layer as \(\theta_{i,j}^{(h)}\).

At every neuron on each layer, we calculate the following, where \(\sigma\) is a non-linear activation function:

\begin{equation}
h_{j} = \sigma\qty[\sum_{i} x_{i} \theta_{i,j}^{(h)}]
\end{equation}

\begin{equation}
\hat{y} = \sigma\qty[\sum_{i} h_{i}\theta_{i}^{(y)}]
\end{equation}

note! we often