@jemoka / Jemoka Knowledge Base / raw/course/cs229/kbhsu_cs229_sep242025.md
---
title: "SU-CS229 SEP242025"
source: https://www.jemoka.com/posts/kbhsu_cs229_sep242025/
date: 2025-09-24
---

Supervised learning!

## Some Notational Conventions

- \(n\): number of training examples
- \(m\): number of features
- \(x\): input feature(s)
- \(y\): output/target feature
- \(\theta\): parameters
- \(h_{\theta}\qty(x)\): the predictor function

A tuple \(\qty(x,y)\) is a particular training example. We use parenthesized superscripts to index samples, so \(\qty(x^{(i)}, y^{(i)})\) is the \(i\)th training example. We typically write \(h\qty(x)\) for the predictor, with parameters \(\theta_{j}\).

## New Concepts

- Linear Regression
  - least-squares error
- gradient descent
  - gradient descent for least-squares error
  - variants
    - summing over the dataset: batch gradient descent
    - pick one sample and run it: stochastic gradient descent
    - pick some samples and run them: mini-batch gradient descent
- a primer on Vector Calculus
  - trace
- Normal Equation
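
The three gradient descent variants listed above differ only in how many samples feed each parameter update. Below is a minimal sketch, assuming a linear predictor \(h_{\theta}\qty(x) = \sum_{j} \theta_{j} x_{j}\) with \(x_{0} = 1\) as an intercept feature and the least-squares cost \(J\qty(\theta) = \frac{1}{2} \sum_{i} \qty(h_{\theta}\qty(x^{(i)}) - y^{(i)})^{2}\). The function names, the toy data, the learning rate `alpha`, and the choice to average the gradient over each batch are illustrative assumptions, not taken from the lecture.

```python
import random


def predict(theta, x):
    """Linear predictor h_theta(x) = sum_j theta_j * x_j."""
    return sum(t * xj for t, xj in zip(theta, x))


def gradient(theta, batch):
    """Gradient of J(theta) = 1/2 * sum_i (h(x_i) - y_i)^2 over `batch`."""
    grad = [0.0] * len(theta)
    for x, y in batch:
        err = predict(theta, x) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj
    return grad


def gradient_descent(data, alpha=0.05, epochs=2000, batch_size=None, seed=0):
    """Fit theta by gradient descent on the least-squares cost.

    batch_size=None -> batch gradient descent (whole dataset per update)
    batch_size=1    -> stochastic gradient descent (one sample per update)
    batch_size=k    -> mini-batch gradient descent (k samples per update)
    """
    rng = random.Random(seed)
    theta = [0.0] * len(data[0][0])
    size = len(data) if batch_size is None else batch_size
    for _ in range(epochs):
        shuffled = data[:]
        rng.shuffle(shuffled)
        for start in range(0, len(shuffled), size):
            batch = shuffled[start:start + size]
            grad = gradient(theta, batch)
            # Assumption: average over the batch so one alpha suits all variants.
            theta = [t - alpha * g / len(batch) for t, g in zip(theta, grad)]
    return theta


# Hypothetical toy data drawn exactly from y = 1 + 2x, with x_0 = 1.
data = [([1.0, x], 1.0 + 2.0 * x) for x in [0.0, 1.0, 2.0, 3.0]]

theta_batch = gradient_descent(data)                # batch
theta_sgd = gradient_descent(data, batch_size=1)    # stochastic
theta_mini = gradient_descent(data, batch_size=2)   # mini-batch
```

Because the toy data is noise-free, all three variants should settle near \(\theta = \qty(1, 2)\). On noisy data, the stochastic and mini-batch variants would typically hover around the minimum unless the learning rate is decayed.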