@jemoka / Jemoka Knowledge Base / raw/course/cs224n/kbhsu_cs224n_apr162024.md
---
title: "SU-CS224N APR162024"
source: https://www.jemoka.com/posts/kbhsu_cs224n_apr162024/
date: 2024-04-16
---

# Why do Neural Nets Work Suddenly?

## Regularization

see regularization

We want to be able to manipulate our parameters so that our models learn better; for instance, we want our weights to be low:

\begin{equation}
J_{L2}(\theta) = J_{reg}(\theta) + \lambda \sum_{k} \theta_{k}^{2}
\end{equation}

Here \(\lambda\) sets the strength of the penalty, and \(J_{reg}\) is read as the regular (unpenalized) loss.

or good ol' dropout: "feature-dependent regularization"

### Motivation

- classic view: regularization works to prevent overfitting when we have a lot of features
- NEW view with big models: regularization produces generalizable models when the parameter count is big enough

### Dropout

Dropout: prevents feature co-adaptation => results in good regularization

# Language Model

See Language Model
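To make the two regularizers from the Regularization section concrete, here is a minimal sketch in plain Python. It is not taken from the lecture: the function names `l2_penalized_loss` and `dropout` are hypothetical, the parameters are assumed to be a flat list of floats, and the dropout shown is the common "inverted" variant (survivors are rescaled at training time), which is an assumption about which variant is meant.

```python
import random


def l2_penalized_loss(base_loss, params, lam):
    """Return base_loss + lam * sum_k params[k]^2 (the L2 objective above).

    base_loss plays the role of the unpenalized loss; params is a flat
    list of parameter values (hypothetical layout, for illustration only).
    """
    return base_loss + lam * sum(p * p for p in params)


def dropout(activations, p_drop, rng, training=True):
    """Inverted dropout (assumed variant).

    During training, each unit is zeroed independently with probability
    p_drop, and surviving units are scaled by 1 / (1 - p_drop) so the
    expected activation is unchanged. At evaluation time the input is
    returned as-is.
    """
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


if __name__ == "__main__":
    theta = [0.5, -1.0, 2.0]
    # Penalty term is 0.1 * (0.25 + 1.0 + 4.0), added to the base loss of 1.0.
    print(l2_penalized_loss(1.0, theta, lam=0.1))

    rng = random.Random(0)  # seeded so the mask is reproducible
    print(dropout([1.0, 1.0, 1.0, 1.0], 0.5, rng))
```

Because a different random subset of units is dropped on every training step, no unit can rely on a specific partner being present, which is one way to read "prevents feature co-adaptation".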