Suggest edit — Newton's Method

Title

Name

Note

---
title: "Newton's Method"
source: https://www.jemoka.com/posts/kbhnewton_s_method/
---

constituents requirements A Newton step is:
\begin{align} \Delta x_{nt} = -\nabla^{2}f\qty(x)^{-1}\nabla f\qty(x) \end{align}
additional information Newton&rsquo;s method is affine invariant!
convergence Then number of steps until convergence for Newton&rsquo;s method relates to the third derivative because its step changes as a function of how the second derivative changes.
Number of iterations until \(f\qty(x) - p^{*} \leq \epsilon\) is bounded above by:
\begin{align} \frac{f\qty(x^{(0)}) - p^{*}}{\gamma} + \log \log \qty(\frac{\epsilon_{0}}{\epsilon}) \end{align}
\(\gamma\), \(\epsilon_{0}\) that depends on the strong convexity constant and the Lipschitz Constant etc. You basically never knows the constants so give up. oh also you need to know \(p^{*}\).
Newton&rsquo;s decrement \begin{align} \lambda \qty(x) = \qty(\nabla f\qty(x)^{T} \nabla^{2}f\qty(x)^{-1} \nabla f\qty(x))^{\frac{1}{2}} \end{align}
which is a measurement between the distance between \(x\) and \(x^{*}\).
implementing Newton&rsquo;s Method Two basic steps:
form a Hessian solve it. We can evaluate derivatives and solve the Newtonian system:
\begin{align} H\Delta x = -g \end{align}
where \(H = \nabla^{2} f\qty(x), g = \nabla f\qty(x)\). We naively would solve with Cholesky factorization:
\(H = LL\) \(\Delta x_{nt} = -L^{-T} L^{-1} g\) \(\lambda \qty(x) = \norm{L^{-1}g}_{2}\) we can thus exploit the structures as needed.
205L Notation \begin{equation} f(x) \approx f(x_{t-1}) + (x-x_{t-1}) f&rsquo;(x_{t-1}) + \frac{(x-x_{t-1})^{2}}{2} f&rsquo;&rsquo;(x_{t-1}) \end{equation}
Taking a derivative with respect to this, we obtain:
\begin{equation} f&rsquo;(x_{t-1}) + (x-x_{t-1}) f&rsquo;&rsquo;(x_{t-1}) \end{equation}
Solving the update equation for zero, we obtain that:
\begin{equation} x = x_{t-1} - \frac{f&rsquo;(x_{t-1})}{f&rsquo;&rsquo;(x_{t-1})} \end{equation}
This converges quadratically!! to solve for minima/maxima. For solving minima/maxima for gardients:
\begin{equation} x_{t} = x_{t-1} - \qty(\bold{H}_{g})^{-1}\nabla J \end{equation}
for Hessian \(H\) and gradient \(g\).
For finding ZEROS instead of minima:
\begin{equation} x = x_{t-1} - \frac{f(x_{t-1})}{f&rsquo;(x_{t-1})} \end{equation}
graphical intuition We want to drop down a tangent line from the standing point until we hit \(0\), and then go to that point on the function and draw a tangent line again.
Namely, we write:
\begin{equation} f&rsquo;\qty(x^{(j)}) = \frac{f\qty(\qty(x^{(j)}))}{\delta} \end{equation}
for small \(\delta\), Meaning, \(\delta = \frac{f\qty(x^{(j)})}{f&rsquo;\qty(x^{(j)})}\), which is the gap between each step. Then we update just by this ratio \(x = x - \delta\)
Failure Case If the function is near an inflection point (i.e. with bad quadratic approximation), you may converge at a point which doesn&rsquo;t satisfy SONC (i.e. you will get an inflection but not a minima).