# Newton's method

{{< include ../_common_code.qmd >}}

This section uses these add-on packages:

```{julia}
using CalculusWithJulia
using Plots
using SymPy
using Roots
```

---

The Babylonian method is an algorithm to find an approximate value for $\sqrt{k}$. It was described by the first-century Greek mathematician Hero of [Alexandria](http://en.wikipedia.org/wiki/Babylonian_method).

The method starts with some initial guess, called $x_0$. It then applies a formula to produce an improved guess. This is repeated until the improved guess is accurate enough or it is clear the algorithm fails to work.

For the Babylonian method, the next guess, $x_{i+1}$, is derived from the current guess, $x_i$. In mathematical notation, this is the updating step:

$$
x_{i+1} = \frac{1}{2}(x_i + \frac{k}{x_i})
$$

We use this algorithm to approximate the square root of $2$, a value known to the Babylonians.

Start with $x$, then form $x/2 + 1/x$, from this again form $x/2 + 1/x$, repeat.

We represent this step using a function

```{julia}
babylon(x) = x/2 + 1/x
```

Let's start with $x = 2$ as a rational number:

```{julia}
#| hold: true
x₁ = babylon(2//1)
x₁, x₁^2.0
```

Our estimate improved from something which squares to $4$ down to something which squares to $2.25$. A big improvement, but there is still more to come. Had we done one more step:

```{julia}
x₂ = (babylon ∘ babylon)(2//1)
x₂, x₂^2.0
```

We now see accuracy to the third decimal point.

```{julia}
x₃ = (babylon ∘ babylon ∘ babylon)(2//1)
x₃, x₃^2.0
```

This is now accurate to the sixth decimal point. That is about as far as we, or the Babylonians, would want to go by hand. The rational numbers involved quickly grow out of hand; the next step shows the explosion.

```{julia}
reduce((x,step) -> babylon(x), 1:4, init=2//1)
```

(In the above, we used `reduce` to repeat a function call $4$ times, as an alternative to the composition operation. In this section we show a few styles to do this repetition before introducing a packaged function.)

However, with the advent of floating point numbers, the method stays quite manageable:

```{julia}
#| hold: true
xₙ = reduce((x, step) -> babylon(x), 1:6, init=2.0)
xₙ, xₙ^2
```

We can see that the algorithm - to the precision offered by floating point numbers - has resulted in the answer `1.414213562373095`. This answer is an *approximation* to the actual answer. Approximation is necessary, as $\sqrt{2}$ is an irrational number and so can never be exactly represented in floating point. That being said, we can see that the approximation's square agrees with $2$ to nearly the last decimal place, so our approximation is very close and is achieved in just a few steps.
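
To get a sense of how quickly the estimates close in on the answer, we can track the error at each step. The following is a minimal sketch of our own (not part of the original algorithm), comparing the first few iterates of `babylon`, started at `2.0`, against `sqrt(2)`:

```{julia}
#| hold: true
# Track the error of successive Babylonian iterates (a quick check, not
# part of the method itself).
x₁ = babylon(2.0)
x₂ = babylon(x₁)
x₃ = babylon(x₂)
x₄ = babylon(x₃)
abs.((x₁, x₂, x₃, x₄) .- sqrt(2))
```

The number of correct digits roughly doubles with each step, a pattern we will see again with Newton's method.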

## Newton's generalization

Let $f(x) = x^3 - 2x -5$. The value of $2$ is almost a zero, but not quite, as $f(2) = -1$. We can check that there are no *rational* roots. Though there is a formula to solve a cubic exactly, it may be difficult to compute and is not as generally applicable as an algorithm, like the Babylonian method, that produces an approximate answer.

Is there some generalization to the Babylonian method?

We know that the tangent line is a good approximation to the function near the point of tangency. Looking at this graph gives a hint as to an algorithm:

```{julia}
#| hold: true
#| echo: false
f(x) = x^3 - 2x - 5
fp(x) = 3x^2 - 2
c = 2
p = plot(f, 1.75, 2.25, legend=false)
plot!(x->f(2) + fp(2)*(x-2))
plot!(zero)
scatter!(p, [c], [f(c)], color=:orange, markersize=3)
p
```

The tangent line and the function nearly agree near $2$. So much so that the intersection point of the tangent line with the $x$ axis nearly hides the actual zero of $f(x)$ that is near $2.1$.

That is, it seems that the intersection of the tangent line and the $x$ axis should be an improved approximation for the zero of the function.

Let $x_0$ be $2$, and $x_1$ be the intersection point of the tangent line at $(x_0, f(x_0))$ with the $x$ axis. Then by the definition of the tangent line:

$$
f'(x_0) = \frac{\Delta y }{\Delta x} = \frac{f(x_0)}{x_0 - x_1}.
$$

This can be solved for $x_1$ to give $x_1 = x_0 - f(x_0)/f'(x_0)$. In general, if we had $x_i$ and used the intersection point of the tangent line to produce $x_{i+1}$ we would have Newton's method:

$$
x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}.
$$

Using automatic derivatives, as brought in with the `CalculusWithJulia` package, we can implement this algorithm.

The algorithm above, starting at $2$, becomes:

```{julia}
f(x) = x^3 - 2x - 5
x0 = 2.0
x1 = x0 - f(x0) / f'(x0)
```

We can see we are closer to a zero:

```{julia}
f(x0), f(x1)
```

Trying again, we have

```{julia}
x2 = x1 - f(x1)/ f'(x1)
x2, f(x2), f(x1)
```

And again:

```{julia}
x3 = x2 - f(x2)/ f'(x2)
x3, f(x3), f(x2)
```

```{julia}
x4 = x3 - f(x3)/ f'(x3)
x4, f(x4), f(x3)
```

We see now that $f(x_4)$ is within machine tolerance of $0$, so we call $x_4$ an *approximate zero* of $f(x)$.

> **Newton's method:** Let $x_0$ be an initial guess for a zero of $f(x)$. Iteratively define $x_{i+1}$ in terms of the just generated $x_i$ by:
>
> $$
> x_{i+1} = x_i - f(x_i) / f'(x_i).
> $$
>
> Then for reasonable functions and reasonable initial guesses, the sequence of points converges to a zero of $f$.

On the computer, we know that actual convergence will likely never occur, but accuracy to a certain tolerance can often be achieved.

In the example above, we kept track of the previous values. This is unnecessary if only the answer is sought. In that case, the update step could use the same variable. Here we use `reduce`:

```{julia}
#| hold: true
xₙ = reduce((x, step) -> x - f(x)/f'(x), 1:4, init=2)
xₙ, f(xₙ)
```

In practice, the algorithm is implemented not by repeating the update step a fixed number of times, but rather by repeating the step until either we converge or it is clear we won't converge. For good guesses and most functions, convergence happens quickly.

:::{.callout-note}
## Note
Newton looked at this same example in 1699 (B.T. Polyak, *Newton's method and its use in optimization*, European Journal of Operational Research. 02/2007; 181(3):1086-1096.) though his technique was slightly different, as he did not use the derivative *per se*, but rather an approximation based on the fact that his function was a polynomial (though identical to the derivative). Raphson (1690) proposed the general form, hence the usual name of the Newton-Raphson method.
:::

#### Examples

##### Example: visualizing convergence

This graphic demonstrates the method and the rapid convergence:

```{julia}
#| echo: false
function newtons_method_graph(n, f, a, b, c)
    xstars = [c]
    xs = [c]
    ys = [0.0]

    plt = plot(f, a, b, legend=false, size=fig_size)
    plot!(plt, [a, b], [0,0], color=:black)

    ts = range(a, stop=b, length=50)
    for i in 1:n
        x0 = xs[end]
        x1 = x0 - f(x0)/D(f)(x0)
        push!(xstars, x1)
        append!(xs, [x0, x1])
        append!(ys, [f(x0), 0])
    end
    plot!(plt, xs, ys, color=:orange)
    scatter!(plt, xstars, 0*xstars, color=:orange, markersize=5)
    plt
end
nothing
```

```{julia}
#| hold: true
#| echo: false
#| cache: true
### {{{newtons_method_example}}}
caption = """
Illustration of Newton's Method converging to a zero of a function.
"""
n = 6

fn, a, b, c = x->log(x), .15, 2, .2

anim = @animate for i=1:n
    newtons_method_graph(i-1, fn, a, b, c)
end

imgfile = tempname() * ".gif"
gif(anim, imgfile, fps = 1)

ImageFile(imgfile, caption)
```

---

This interactive graphic (built using [JSXGraph](https://jsxgraph.uni-bayreuth.de/wp/index.html)) allows the adjustment of the point `x0`, initially at $0.85$. Five iterations of Newton's method are illustrated. Some positions of `x0` clearly lead to convergence; others will not.

```{=html}
<div id="jsxgraph" style="width: 500px; height: 500px;"></div>
```

```{ojs}
//| echo: false
//| output: false

JXG = require("jsxgraph");

// newton's method

b = JXG.JSXGraph.initBoard('jsxgraph', {
    boundingbox: [-3,5,3,-5], axis:true
});

f = function(x) {return x*x*x*x*x - x - 1};
fp = function(x) { return 5*x*x*x*x - 1};
x0 = 0.85;

nm = function(x) { return x - f(x)/fp(x);};

l = b.create('point', [-1.5,0], {name:'', size:0});
r = b.create('point', [1.5,0], {name:'', size:0});
xaxis = b.create('line', [l,r])

P0 = b.create('glider', [x0,0,xaxis], {name:'x0'});
P0a = b.create('point', [function() {return P0.X();},
                         function() {return f(P0.X());}], {name:''});

P1 = b.create('point', [function() {return nm(P0.X());},
                        0], {name:''});
P1a = b.create('point', [function() {return P1.X();},
                         function() {return f(P1.X());}], {name:''});

P2 = b.create('point', [function() {return nm(P1.X());},
                        0], {name:''});
P2a = b.create('point', [function() {return P2.X();},
                         function() {return f(P2.X());}], {name:''});

P3 = b.create('point', [function() {return nm(P2.X());},
                        0], {name:''});
P3a = b.create('point', [function() {return P3.X();},
                         function() {return f(P3.X());}], {name:''});

P4 = b.create('point', [function() {return nm(P3.X());},
                        0], {name:''});
P4a = b.create('point', [function() {return P4.X();},
                         function() {return f(P4.X());}], {name:''});
P5 = b.create('point', [function() {return nm(P4.X());},
                        0], {name:'x5', strokeColor:'black'});

P0a.setAttribute({fixed:true});
P1.setAttribute({fixed:true});
P1a.setAttribute({fixed:true});
P2.setAttribute({fixed:true});
P2a.setAttribute({fixed:true});
P3.setAttribute({fixed:true});
P3a.setAttribute({fixed:true});
P4.setAttribute({fixed:true});
P4a.setAttribute({fixed:true});
P5.setAttribute({fixed:true});

sc = '#000000';
b.create('segment', [P0,P0a], {strokeColor:sc, strokeWidth:1});
b.create('segment', [P0a, P1], {strokeColor:sc, strokeWidth:1});
b.create('segment', [P1,P1a], {strokeColor:sc, strokeWidth:1});
b.create('segment', [P1a, P2], {strokeColor:sc, strokeWidth:1});
b.create('segment', [P2,P2a], {strokeColor:sc, strokeWidth:1});
b.create('segment', [P2a, P3], {strokeColor:sc, strokeWidth:1});
b.create('segment', [P3,P3a], {strokeColor:sc, strokeWidth:1});
b.create('segment', [P3a, P4], {strokeColor:sc, strokeWidth:1});
b.create('segment', [P4,P4a], {strokeColor:sc, strokeWidth:1});
b.create('segment', [P4a, P5], {strokeColor:sc, strokeWidth:1});

b.create('functiongraph', [f, -1.5, 1.5])
```

##### Example: numeric not algebraic

For the function $f(x) = \cos(x) - x$, we see that SymPy cannot solve symbolically for a zero:

```{julia}
@syms x::real
solve(cos(x) - x, x)
```

We can find a numeric solution, even though there is no closed-form answer. Here we try Newton's method:

```{julia}
#| hold: true
f(x) = cos(x) - x
x = .5
x = x - f(x)/f'(x) # 0.7552224171056364
x = x - f(x)/f'(x) # 0.7391416661498792
x = x - f(x)/f'(x) # 0.7390851339208068
x = x - f(x)/f'(x) # 0.7390851332151607
x = x - f(x)/f'(x)
x, f(x)
```

To machine tolerance the answer is a zero, even though the exact answer is irrational and every finite floating point value is a rational number, so the exact answer can never be reached.

##### Example

Use Newton's method to find the *largest* real solution to $e^x = x^6$.

A plot shows us roughly where the value lies:

```{julia}
#| hold: true
f(x) = exp(x)
g(x) = x^6
plot(f, 0, 25; label="f")
plot!(g; label="g")
```

Clearly by $x = 20$ the two graphs have crossed for the last time: we know exponentials eventually grow faster than powers, and this is seen in the graph.

We need to turn the equation into a function whose zero is sought. Just moving the terms to one side of the equals sign gives $e^x - x^6 = 0$, so the $x$ we seek is a solution to $h(x) = 0$ with $h(x) = e^x - x^6$. We then use Newton's method to find the intersection point, stopping when the increment $h(x)/h'(x)$ is smaller than `1e-4`.

```{julia}
#| hold: true
#| term: true
h(x) = exp(x) - x^6
x = 20
for step in 1:10
    delta = h(x)/h'(x)
    x = x - delta
    @show step, x, delta
end
```

So it takes $8$ steps to get an increment that small and about `10` steps to get to full convergence.

##### Example: division as multiplication

[Newton-Raphson Division](http://tinyurl.com/kjj9w92) is a means to divide by multiplying.

Why would you want to do that? Well, even for computers division is harder (read: slower) than multiplying. The trick is that $p/q$ is simply $p \cdot (1/q)$, so finding a means to compute a reciprocal by multiplying will reduce division to multiplication.

Suppose we have $q$. We could try to use Newton's method to find $1/q$, as it is a solution to $f(x) = x - 1/q$. The Newton update step simplifies to:

$$
x - f(x) / f'(x) \quad\text{or}\quad x - (x - 1/q)/ 1 = 1/q
$$

That doesn't really help, as Newton's method is just $x_{i+1} = 1/q$. That is, it just jumps to the answer, the one we want to compute by some other means!

Trying again, we simplify the update step for a related function: $f(x) = 1/x - q$ with $f'(x) = -1/x^2$ and then one step of the process is:

$$
x_{i+1} = x_i - (1/x_i - q)/(-1/x_i^2) = -qx^2_i + 2x_i.
$$

Now, for $q$ in the interval $[1/2, 1]$ we want a *good* initial guess. Here is a claim: we can use $x_0 = 48/17 - 32/17 \cdot q$. Let's check graphically that this is a reasonable initial approximation to $1/q$:

```{julia}
#| hold: true
plot(q -> 1/q, 1/2, 1, label="1/q")
plot!(q -> 1/17 * (48 - 32q), label="linear approximation")
```

It can be shown that for any $q$ in $[1/2, 1]$, with initial guess $x_0 = 48/17 - 32/17\cdot q$, Newton's method will converge to $16$ digits in no more than this many steps:

$$
\log_2\left(\frac{53 + 1}{\log_2(17)}\right).
$$

```{julia}
a = log2((53 + 1)/log2(17))
ceil(Integer, a)
```

That is, $4$ steps suffice.

For $q = 0.80$, to find $1/q$ using the above we have:

```{julia}
#| hold: true
q = 0.80
x = (48/17) - (32/17)*q
x = -q*x*x + 2*x
x = -q*x*x + 2*x
x = -q*x*x + 2*x
x = -q*x*x + 2*x
```

This method has basically $18$ multiplication and addition operations for one division, so naively it would seem slower, but timing this shows the method is competitive with a regular division.
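
As a quick sanity check on the claim that four steps suffice for every $q$ in $[1/2, 1]$, the following sketch (our own check, with a helper name `nr_reciprocal` introduced just for this illustration) runs the four updates over a fine grid of $q$ values and reports the largest error against `1/q`; it should be on the order of machine precision.

```{julia}
#| hold: true
# A sketch verifying the 4-step claim over a grid of q values in [1/2, 1].
# `nr_reciprocal` is a helper defined only for this check.
function nr_reciprocal(q)
    x = 48/17 - 32/17 * q      # the initial guess from above
    for _ in 1:4               # four Newton updates for f(x) = 1/x - q
        x = -q*x*x + 2*x
    end
    x
end
maximum(abs(nr_reciprocal(q) - 1/q) for q in range(0.5, 1, length=1001))
```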

## Wrapping in a function

In the previous examples, we saw fast convergence, guaranteed convergence in $4$ steps, and an example where $8$ steps were needed to get the requested level of approximation. Newton's method usually converges quickly, but may converge slowly, and may not converge at all. Automating the task to avoid repeatedly running the update step is a task best done by the computer.

The `while` loop is a good way to repeat commands until some condition is met. With this, we present a simple function implementing Newton's method. We iterate until the update step gets really small (the `atol`) or the convergence takes more than $50$ steps. (There are other, better choices that could be used to determine when the algorithm should stop; these are just easy to understand.)

```{julia}
function nm(f, fp, x0)
    atol = 1e-14
    ctr = 0
    delta = Inf
    while (abs(delta) > atol) && (ctr < 50)
        delta = f(x0)/fp(x0)
        x0 = x0 - delta
        ctr = ctr + 1
    end

    ctr < 50 ? x0 : NaN
end
```

##### Examples

* Find a zero of $\sin(x)$ starting at $x_0=3$:

```{julia}
nm(sin, cos, 3)
```

This is an approximation for $\pi$ that historically found use, as the convergence is fast.
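
Indeed, the difference from the built-in `pi` is essentially zero (a quick check of our own):

```{julia}
nm(sin, cos, 3) - pi
```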

* Find a solution to $x^5 = 5^x$ near $2$:

Writing a function to handle this, we have:

```{julia}
k(x) = x^5 - 5^x
```

We could find the derivative by hand, but use the automatic one instead:

```{julia}
alpha = nm(k, k', 2)
alpha, k(alpha)
```

### Functions in the Roots package

Typing in the `nm` function might be okay once, but would be tedious if it were needed each time. Besides, it isn't as robust to different inputs as it could be. The `Roots` package provides a `Newton` method for `find_zero`.

To use a different method with `find_zero`, the calling pattern is `find_zero(f, x, M)` where `f` represents the function(s), `x` the initial point(s), and `M` the method. Here we have:

```{julia}
find_zero((sin, cos), 3, Roots.Newton())
```

Or, rather than specify the derivative by hand, it can be computed using automatic differentiation:

```{julia}
#| hold: true
f(x) = sin(x)
find_zero((f, f'), 2, Roots.Newton())
```

The argument `verbose=true` will force a print out of a message summarizing the convergence and showing each step.

```{julia}
#| hold: true
f(x) = exp(x) - x^4
find_zero((f,f'), 8, Roots.Newton(); verbose=true)
```

##### Example: intersection of two graphs

Find the intersection point between $f(x) = \cos(x)$ and $g(x) = 5x$ near $0$.

We have Newton's method to solve for zeros of $f(x)$, i.e. when $f(x) = 0$. Here we want to solve for $x$ with $f(x) = g(x)$. To do so, we make a new function $h(x) = f(x) - g(x)$, that is $0$ when $f(x)$ equals $g(x)$:

```{julia}
#| hold: true
f(x) = cos(x)
g(x) = 5x
h(x) = f(x) - g(x)
x0 = find_zero((h,h'), 0, Roots.Newton())
x0, h(x0), f(x0), g(x0)
```

---

We redo the above using a *parameter* for the $5$, as there are a few options for how this can be done. We let `f(x,p) = cos(x) - p*x`. Then we can use `Roots.Newton` by also defining a derivative:

```{julia}
#| hold: true
f(x,p) = cos(x) - p*x
fp(x,p) = -sin(x) - p
xn = find_zero((f,fp), pi/4, Roots.Newton(); p=5)
xn, f(xn, 5)
```

Using automatic differentiation here is not as straightforward, as we must hold the `p` fixed. For this, we introduce a closure that fixes `p` and differentiates in the `x` variable (called `u` below):

```{julia}
#| hold: true
f(x,p) = cos(x) - p*x
fp(x,p) = (u -> f(u,p))'(x)
xn = find_zero((f,fp), pi/4, Roots.Newton(); p=5)
```

##### Example: Finding $c$ in Rolle's Theorem

The function $r(x) = \sqrt{1 - \cos(x^2)^2}$ has a zero at $0$ and one at $a$ near $1.77$.

```{julia}
r(x) = sqrt(1 - cos(x^2)^2)
plot(r, 0, 1.77)
```

As $r(x)$ is differentiable between $0$ and $a$ and $r(0) = r(a) = 0$, Rolle's theorem says there will be a value where the derivative is $0$. Find that value.

This value will be a zero of the derivative. A graph shows it should be near $1.2$, so we use that as a starting value to get the answer:

```{julia}
find_zero((r',r''), 1.2, Roots.Newton())
```

## Convergence rates

Newton's method is famously known to have "quadratic convergence." What does this mean? Let the error in the $i$th step be called $e_i = x_i - \alpha$. Then Newton's method satisfies a bound of the type:

$$
\lvert e_{i+1} \rvert \leq M_i \cdot e_i^2.
$$

If $M$ were just a constant and we suppose $e_0 = 10^{-1}$, then $e_1$ would be less than $M 10^{-2}$ and $e_2$ less than $M^2 10^{-4}$, $e_3$ less than $M^3 10^{-8}$ and $e_4$ less than $M^4 10^{-16}$, which for $M=1$ is basically the machine precision when values are near $1$. That is, for some problems, with a good initial guess, it will take around $4$ or so steps to converge.
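
To see this doubling of accuracy numerically, here is a small sketch of our own (using the cubic from above under the fresh name `g`, so other cells are not disturbed) recording $\lvert x_i - \alpha\rvert$ for the first few iterates started at $2$:

```{julia}
#| hold: true
# Errors of successive Newton iterates for x^3 - 2x - 5, started at 2.
# Each error is roughly the square of the previous one.
g(x) = x^3 - 2x - 5
α = find_zero(g, 2)                              # accurate reference zero
xs = accumulate((x, _) -> x - g(x)/g'(x), 1:4, init=2.0)
abs.(xs .- α)
```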

To identify $M$, let $\alpha$ be the zero of $f$ to be approximated. Assume

* The function $f$ has a continuous second derivative in a neighborhood of $\alpha$.
* The derivative $f'(x)$ is *non-zero* in this neighborhood of $\alpha$.

Then this linearization holds at each $x_i$ in the above neighborhood:

$$
f(x) = f(x_i) + f'(x_i) \cdot (x - x_i) + \frac{1}{2} f''(\xi) \cdot (x-x_i)^2.
$$

The value $\xi$ is from the mean value theorem and is between $x$ and $x_i$.

Dividing by $f'(x_i)$ and setting $x=\alpha$ (as $f(\alpha)=0$) leaves

$$
0 = \frac{f(x_i)}{f'(x_i)} + (\alpha-x_i) + \frac{1}{2}\cdot \frac{f''(\xi)}{f'(x_i)} \cdot (\alpha-x_i)^2.
$$

For this value, we have

\begin{align*}
x_{i+1} - \alpha
&= \left(x_i - \frac{f(x_i)}{f'(x_i)}\right) - \alpha\\
&= \left(x_i - \alpha \right) - \frac{f(x_i)}{f'(x_i)}\\
&= (x_i - \alpha) + \left(
(\alpha - x_i) + \frac{1}{2}\frac{f''(\xi) \cdot(\alpha - x_i)^2}{f'(x_i)}
\right)\\
&= \frac{1}{2}\frac{f''(\xi)}{f'(x_i)} \cdot(x_i - \alpha)^2.
\end{align*}

That is

$$
e_{i+1} = \frac{1}{2}\frac{f''(\xi)}{f'(x_i)} e_i^2.
$$

This convergence to $\alpha$ will be quadratic *if*:

* The initial guess $x_0$ is not too far from $\alpha$, so $e_0$ is managed.
* The derivative at $\alpha$ is not too close to $0$, hence, by continuity, $f'(x_i)$ is not too close to $0$ (as it appears in the denominator). That is, the function can't be too flat, which should make sense, as then the tangent line is nearly parallel to the $x$ axis and would intersect far away.
* The function $f$ has a continuous second derivative at $\alpha$.
* The second derivative is not too big (in absolute value) near $\alpha$. A large second derivative means the function is highly curved, which means it is "turning" a lot. In this case, the function turns away from the tangent line quickly, so the tangent line's zero is not necessarily a good approximation to the actual zero, $\alpha$.

:::{.callout-note}
## Note
The basic tradeoff: methods like Newton's are faster than the bisection method in terms of function calls, but are not guaranteed to converge, as the bisection method is.
:::

What can go wrong when one of these isn't the case is illustrated next:

### Poor initial step

```{julia}
#| hold: true
#| echo: false
#| cache: true
### {{{newtons_method_poor_x0}}}
caption = """
Illustration of Newton's Method converging to a zero of a function,
but slowly, as the initial guess is very poor and not close to the
zero. The algorithm does converge in this illustration, but not quickly and not to the nearest root from
the initial guess.
"""

fn, a, b, c = x -> sin(x) - x/4, -15, 20, 2pi

n = 20
anim = @animate for i=1:n
    newtons_method_graph(i-1, fn, a, b, c)
end

imgfile = tempname() * ".gif"
gif(anim, imgfile, fps = 2)

ImageFile(imgfile, caption)
```

```{julia}
#| hold: true
#| echo: false
#| cache: true
# {{{newtons_method_flat}}}
caption = L"""
Illustration of Newton's method failing to converge, as for some $x_i$,
$f'(x_i)$ is too close to ``0``. In this instance, after a few steps the
algorithm just cycles around the local minimum near $0.66$. The values
of $x_i$ repeat in the pattern: $1.0002, 0.7503, -0.0833, 1.0002,
\dots$. This is also an illustration of a poor initial guess. If there
is a local minimum or maximum between the guess and the zero, such
cycles can occur.
"""

fn, a, b, c = x -> x^5 - x + 1, -1.5, 1.4, 0.0

n = 7
anim = @animate for i=1:n
    newtons_method_graph(i-1, fn, a, b, c)
end
imgfile = tempname() * ".gif"
gif(anim, imgfile, fps = 1)

ImageFile(imgfile, caption)
```

### The second derivative is too big

```{julia}
#| hold: true
#| echo: false
#| cache: true
# {{{newtons_method_cycle}}}

fn, a, b, c = x -> abs(x)^(0.49), -2, 2, 1.0
caption = L"""
Illustration of Newton's Method not converging. Here the second
derivative is too big near the zero - it blows up near $0$ - and the
convergence does not occur. Rather the iterates increase in their
distance from the zero.
"""

n = 10
anim = @animate for i=1:n
    newtons_method_graph(i-1, fn, a, b, c)
end

imgfile = tempname() * ".gif"
gif(anim, imgfile, fps = 2)

ImageFile(imgfile, caption)
```

### The tangent line at some xᵢ is flat

```{julia}
#| hold: true
#| echo: false
#| cache: true
# {{{newtons_method_wilkinson}}}

caption = L"""
The function $f(x) = x^{20} - 1$ has two bad behaviours for Newton's
method: for $x < 1$ the derivative is nearly $0$ and for $x>1$ the
second derivative is very big. In this illustration, we have an
initial guess of $x_0=8/9$. As the tangent line is fairly flat, the
next approximation is far away, $x_1 = 1.313\dots$. As this guess is
much bigger than $1$, the ratio $f(x)/f'(x) \approx
x^{20}/(20x^{19}) = x/20$, so $x_i - f(x_i)/f'(x_i) \approx (19/20)x_i$
yielding slow, linear convergence until $f''(x_i)$ is moderate. For
this function, starting at $x_0=8/9$ takes 11 steps, at $x_0=7/8$
takes 13 steps, at $x_0=3/4$ takes ``55`` steps, and at $x_0=1/2$ it takes
$204$ steps.
"""

fn, a, b, c = x -> x^20 - 1, .7, 1.4, 8/9
n = 10

anim = @animate for i=1:n
    newtons_method_graph(i-1, fn, a, b, c)
end
imgfile = tempname() * ".gif"
gif(anim, imgfile, fps = 1)

ImageFile(imgfile, caption)
```

###### Example

Suppose $\alpha$ is a simple zero for $f(x)$. (The value $\alpha$ is a zero of multiplicity $k$ if $f(x) = (x-\alpha)^k g(x)$ where $g(\alpha)$ is not zero. A simple zero has multiplicity $1$. If $f'(\alpha) \neq 0$ and the second derivative exists, then a zero $\alpha$ will be simple.) Around $\alpha$, quadratic convergence should apply. However, consider the function $g(x) = f(x)^k$ for some integer $k \geq 2$. Then $\alpha$ is still a zero, but the derivative of $g$ at $\alpha$ is zero, so the tangent line is basically flat. This will slow the convergence. We can see that the update step $g(x)/g'(x)$ becomes $(1/k) f(x)/f'(x)$, so an extra factor is introduced.

The calculation that produces the quadratic convergence now becomes:

\begin{align*}
x_{i+1} - \alpha &= (x_i - \alpha) - \frac{1}{k}\left(x_i-\alpha - \frac{f''(\xi)}{2f'(x_i)}(x_i-\alpha)^2\right)\\
&= \frac{k-1}{k} (x_i-\alpha) + \frac{f''(\xi)}{2kf'(x_i)}(x_i-\alpha)^2.
\end{align*}

As $k > 1$, the $(x_i - \alpha)$ term dominates, and we see the convergence is linear with $\lvert e_{i+1}\rvert \approx \frac{k-1}{k} \lvert e_i\rvert$.
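
To see this slowdown numerically, here is a small sketch of our own comparing the number of steps `find_zero` with `Roots.Newton()` needs for $\cos(x) - x$ (a simple zero) and for its square (the same zero, with multiplicity $2$), both started at $1$. The step counts are recorded with a `Roots.Tracks` object:

```{julia}
#| hold: true
# Compare step counts: simple zero vs. the same zero with multiplicity 2.
f₁(x) = cos(x) - x
f₂(x) = (cos(x) - x)^2
t₁, t₂ = Roots.Tracks(), Roots.Tracks()
find_zero((f₁, f₁'), 1, Roots.Newton(); tracks=t₁)
find_zero((f₂, f₂'), 1, Roots.Newton(); tracks=t₂)
t₁.steps, t₂.steps
```

The doubled zero takes many more steps, reflecting linear, rather than quadratic, convergence.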

## Questions

###### Question

Look at this graph with $x_0$ marked with a point:

```{julia}
#| hold: true
#| echo: false
import SpecialFunctions: airyai
p = plot(airyai, -3.3, 0, legend=false);
plot!(p, zero, -3.3, 0);
scatter!(p, [-2.8], [0], color=:orange, markersize=5);
annotate!(p, [(-2.8, 0.2, "x₀")])
p
```

If one step of Newton's method was used, what would be the value of $x_1$?

```{julia}
#| hold: true
#| echo: false
choices = ["``-2.224``", "``-2.80``", "``-0.020``", "``0.355``"]
answ = 1
radioq(choices, answ, keep_order=true)
```

###### Question

Look at this graph of some increasing, concave up $f(x)$ with initial point $x_0$ marked. Let $\alpha$ be the zero.

```{julia}
#| hold: true
#| echo: false
p = plot(x -> x^2 - 2, .75, 2.2, legend=false);
plot!(p, zero, color=:green);
scatter!(p, [1],[0], color=:orange, markersize=5);
annotate!(p, [(1,.25, "x₀"), (sqrt(2), .2, "α")]);
p
```

What can be said about $x_1$?

```{julia}
#| hold: true
#| echo: false
choices = [
L"It must be $x_1 > \alpha$",
L"It must be $x_1 < x_0$",
L"It must be $x_0 < x_1 < \alpha$"
]
answ = 1
radioq(choices, answ)
```

---

Look at this graph of some increasing, concave up $f(x)$ with initial point $x_0$ marked. Let $\alpha$ be the zero.

```{julia}
#| hold: true
#| echo: false
p = plot(x -> x^2 - 2, .75, 2.2, legend=false);
plot!(p, zero, .75, 2.2, color=:green);
scatter!(p, [2],[0], color=:orange, markersize=5);
annotate!(p, [(2,.25, "x₀"), (sqrt(2), .2, "α")]);
p
```

What can be said about $x_1$?

```{julia}
#| hold: true
#| echo: false
choices = [
L"It must be $x_1 < \alpha$",
L"It must be $x_1 > x_0$",
L"It must be $\alpha < x_1 < x_0$"
]
answ = 3
radioq(choices, answ)
```

---

Suppose $f(x)$ is increasing and concave up. From the tangent line representation: $f(x) = f(c) + f'(c)\cdot(x-c) + f''(\xi)/2 \cdot(x-c)^2$, explain why it must be that the graph of $f(x)$ lies on or *above* the tangent line.

```{julia}
#| hold: true
#| echo: false
choices = [
L"As $f''(\xi)/2 \cdot(x-c)^2$ is non-negative, we must have $f(x) - (f(c) + f'(c)\cdot(x-c)) \geq 0$.",
L"As $f''(\xi) < 0$ it must be that $f(x) - (f(c) + f'(c)\cdot(x-c)) \geq 0$.",
L"This isn't true. The function $f(x) = x^3$ at $x=0$ provides a counterexample"
]
answ = 1
radioq(choices, answ)
```

This question can be used to give a proof for the previous two questions, which can also be answered by considering the graphs alone. Combined, they say that if a function is increasing and concave up with a zero $\alpha$, then if $x_0 < \alpha$ it follows that $x_1 > \alpha$, and for any $x_i > \alpha$ we have $\alpha \le x_{i+1} \le x_i$. That is, the sequence produced by Newton's method is decreasing and bounded below - conditions for which convergence is mathematically guaranteed.
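
A quick numerical illustration of this pattern (our own check, using $f(x) = x^2 - 2$ with $\alpha = \sqrt{2}$ and the starting point $x_0 = 1 < \alpha$):

```{julia}
#| hold: true
# Starting below the zero, the first step overshoots; afterwards the
# iterates decrease monotonically toward α = √2.
h(x) = x^2 - 2
xs = accumulate((x, _) -> x - h(x)/h'(x), 1:4, init=1.0)
xs .- sqrt(2)     # entries are positive and decreasing toward 0
```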

###### Question

Let $f(x) = x^2 - 3^x$. This has derivative $2x - 3^x \cdot \log(3)$. Starting with $x_0=0$, what does Newton's method converge on?

```{julia}
#| hold: true
#| echo: false
f(x) = x^2 - 3^x;
fp(x) = 2x - 3^x*log(3);
val = Roots.newton(f, fp, 0);
numericq(val, 1e-14)
```

###### Question

Let $f(x) = \exp(x) - x^4$. There are 3 zeros for this function. Which one does Newton's method converge to when $x_0=2$?

```{julia}
#| hold: true
#| echo: false
f(x) = exp(x) - x^4;
fp(x) = exp(x) - 4x^3;
xstar = Roots.newton(f, fp, 2);
numericq(xstar, 1e-1)
```

###### Question

Let $f(x) = \exp(x) - x^4$. As mentioned, there are 3 zeros for this function. Which one does Newton's method converge to when $x_0=8$?

```{julia}
#| hold: true
#| echo: false
f(x) = exp(x) - x^4;
fp(x) = exp(x) - 4x^3;
xstar = Roots.newton(f, fp, 8);
numericq(xstar, 1e-1)
```

###### Question

Let $f(x) = \sin(x) - \cos(4\cdot x)$.

Starting at $\pi/8$, solve for the root returned by Newton's method.

```{julia}
#| hold: true
#| echo: false
k1 = 4
f(x) = sin(x) - cos(k1*x);
fp(x) = cos(x) + k1*sin(k1*x);
val = Roots.newton(f, fp, pi/(2k1));
numericq(val)
```

###### Question

Using Newton's method find a root to $f(x) = \cos(x) - x^3$ starting at $x_0 = 1/2$.

```{julia}
#| hold: true
#| echo: false
f(x) = cos(x) - x^3
val = Roots.newton(f, f', 1/2)
numericq(val)
```

###### Question

Use Newton's method to find a root of $f(x) = x^5 + x -1$. Make a quick graph to find a reasonable starting point.

```{julia}
#| hold: true
#| echo: false
f(x) = x^5 + x - 1
val = Roots.newton(f, f', -1)
numericq(val)
```

###### Question

```{julia}
#| hold: true
#| echo: false
## Consider the following illustration of Newton's method:
caption = """
Illustration of Newton's method. Moving the point ``x_0`` shows different behaviours of the algorithm.
"""
## JSXGraph(:derivatives, "newtons-method.js", caption)
nothing
```

For the following graph, graphically consider the algorithm for a few different starting points.

```{julia}
#| hold: true
#| echo: false
# placeholder until CWJ bumps up a version?
plot(x -> x^5 - x - 1, -1, 2)
```

If $x_0$ is $1$ what occurs?

```{julia}
#| echo: false
nm_choices = [
"The algorithm converges very quickly. A good initial point was chosen.",
"The algorithm converges, but slowly. The initial point is close enough to the answer to ensure decreasing errors.",
"The algorithm fails to converge, as it cycles about"
]
radioq(nm_choices, 1, keep_order=true)
```

When $x_0 = 1.0$ the following values are true for $f$:

```{julia}
#| echo: false
ff(x) = x^5 - x - 1
α = find_zero(ff, 1)
function error_terms(x)
    (e₀ = x - α, f₀′ = ff'(x), f̄₀′′ = ff''(α), ē₁ = 1/2*ff''(α)/ff'(x)*(x-α)^2)
end
error_terms(1.0)
```

Here the values `f̄₀′′` and `ē₁` are worst-case estimates, with $\xi$ taken between $x_0$ and the zero.

Does the magnitude of the error increase or decrease in the first step?

```{julia}
#| hold: true
#| echo: false
radioq(["Appears to increase", "It decreases"], 2, keep_order=true)
```

If $x_0$ is set near $0.50$ what happens?

```{julia}
#| hold: true
#| echo: false
radioq(nm_choices, 3, keep_order=true)
```

When $x_0 = 0.5$ the following values are true for $f$:

```{julia}
#| hold: true
#| echo: false
error_terms(0.5)
```

Here the values `f̄₀′′` and `ē₁` are worst-case estimates, with $\xi$ taken between $x_0$ and the zero.

Does the magnitude of the error increase or decrease in the first step?

```{julia}
#| hold: true
#| echo: false
radioq(["Appears to increase", "It decreases"], 1, keep_order=true)
```

If $x_0$ is set near $0.75$ what happens?

```{julia}
#| hold: true
#| echo: false
radioq(nm_choices, 2, keep_order=true)
```

###### Question

Will Newton's method converge for the function $f(x) = x^5 - x + 1$ starting at $x=1$?

```{julia}
#| hold: true
#| echo: false
choices = [
"Yes",
"No. The initial guess is not close enough",
"No. The second derivative is too big",
L"No. The first derivative gets too close to $0$ for one of the $x_i$"]
answ = 2
radioq(choices, answ, keep_order=true)
```

###### Question

Will Newton's method converge for the function $f(x) = 4x^5 - x + 1$ starting at $x=1$?

```{julia}
#| hold: true
#| echo: false
choices = [
"Yes",
"No. The initial guess is not close enough",
"No. The second derivative is too big, or does not exist",
L"No. The first derivative gets too close to $0$ for one of the $x_i$"]
answ = 2
radioq(choices, answ, keep_order=true)
```

###### Question

Will Newton's method converge for the function $f(x) = x^{10} - 2x^3 - x + 1$ starting from $0.25$?

```{julia}
#| hold: true
#| echo: false
choices = [
"Yes",
"No. The initial guess is not close enough",
"No. The second derivative is too big, or does not exist",
L"No. The first derivative gets too close to $0$ for one of the $x_i$"]
answ = 1
radioq(choices, answ, keep_order=true)
```

###### Question

Will Newton's method converge for $f(x) = 20x/(100 x^2 + 1)$ starting at $0.1$?

```{julia}
#| hold: true
#| echo: false
choices = [
"Yes",
"No. The initial guess is not close enough",
"No. The second derivative is too big, or does not exist",
L"No. The first derivative gets too close to $0$ for one of the $x_i$"]
answ = 4
radioq(choices, answ, keep_order=true)
```

###### Question

Will Newton's method converge to a zero for $f(x) = \sqrt{(1 - x^2)^2}$ starting at $1.0$?

```{julia}
#| hold: true
#| echo: false
choices = [
"Yes",
"No. The initial guess is not close enough",
"No. The second derivative is too big, or does not exist",
L"No. The first derivative gets too close to $0$ for one of the $x_i$"]
answ = 3
radioq(choices, answ, keep_order=true)
```

###### Question

Use Newton's method to find a root of $f(x) = 4x^4 - 5x^3 + 4x^2 -20x -6$ starting at $x_0 = 0$.

```{julia}
#| hold: true
#| echo: false
f(x) = 4x^4 - 5x^3 + 4x^2 -20x -6
val = find_zero((f, f'), 0, Roots.Newton())
numericq(val)
```

###### Question

Use Newton's method to find a zero of $f(x) = \sin(x) - x/2$ that is *bigger* than $0$.

```{julia}
#| hold: true
#| echo: false
f(x) = sin(x) - x/2
val = find_zero((f, f'), 2, Roots.Newton())
numericq(val)
```

###### Question

The Newton baffler (defined below) is so named, as Newton's method will fail to find the root for most starting points.

```{julia}
function newton_baffler(x)
    if ( x - 0.0 ) < -0.25
        0.75 * ( x - 0 ) - 0.3125
    elseif ( x - 0 ) < 0.25
        2.0 * ( x - 0 )
    else
        0.75 * ( x - 0 ) + 0.3125
    end
end
```

Will Newton's method find the zero at $0.0$ starting at $1$?

```{julia}
#| hold: true
#| echo: false
yesnoq("no")
```

Considering this plot:

```{julia}
#| hold: true
plot(newton_baffler, -1.1, 1.1)
```

Starting with $x_0=1$, you can see why Newton's method will fail. Why?

```{julia}
#| hold: true
#| echo: false
choices = [
L"It doesn't fail, it converges to $0$",
L"The tangent lines for $|x| > 0.25$ intersect the $x$ axis at values with $|x| > 0.25$",
L"The first derivative is $0$ at $1$"
]
answ = 2
radioq(choices, answ)
```

This function does not have a small first derivative or a large second derivative, and the bump up can be made as close to the origin as desired, so the starting point can be very close to the zero. However, even though these conditions of the error term seem satisfied, the error term does not apply, as $f$ is not continuously differentiable.
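
A quick sketch of our own shows the cycling: starting at $x_0 = 1$, the iterates bounce between two values outside $(-0.25, 0.25)$ and never approach the zero at $0$ (the derivative here is computed automatically):

```{julia}
#| hold: true
# The iterates from x₀ = 1 cycle and never reach the zero at 0.
xs = accumulate((x, _) -> x - newton_baffler(x)/newton_baffler'(x), 1:6, init=1.0)
```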

###### Question

Let $f(x) = \sin(x) - x/4$. Starting at $x_0 = 2\pi$ Newton's method will converge to a value, but it will take many steps. Using the argument `verbose=true` for `find_zero`, how many steps does it take?

```{julia}
#| hold: true
#| echo: false
f(x) = sin(x) - x/4
x₀ = 2π
tracks = Roots.Tracks()
find_zero((f, f'), x₀, Roots.Newton(); tracks=tracks)
val = tracks.steps
numericq(val, 2)
```

What is the zero that is found?

```{julia}
#| hold: true
#| echo: false
f(x) = sin(x) - x/4
val = Roots.newton(f, f', 2pi)
numericq(val)
```

Is this the closest zero to the starting point, $x_0$?

```{julia}
#| hold: true
#| echo: false
yesnoq("no")
```

###### Question

Quadratic convergence of Newton's method only applies to *simple* roots. For example, we can see (using the `verbose=true` argument to the `Roots` package's `newton` method) that it only takes $4$ steps to find a zero to $f(x) = \cos(x) - x$ starting at $x_0 = 1$. But it takes many more steps to find the same zero for $f(x) = (\cos(x) - x)^2$.

How many?

```{julia}
#| hold: true
#| echo: false
val = 24
numericq(val, 2)
```

###### Question: Implicit equations

The equation $x^2 + x\cdot y + y^2 = 1$ is a rotated ellipse.

```{julia}
#| hold: true
#| echo: false
f(x,y) = x^2 + x * y + y^2 - 1
implicit_plot(f, xlims=(-2,2), ylims=(-2,2), legend=false)
```

Can we find which point on its graph has the largest $y$ value?

This would be straightforward *if* we could write $y(x) = \dots$, for then we would simply find the critical points and investigate. But we can't so easily solve for $y$ in terms of $x$. However, we can use Newton's method to do so:

```{julia}
function findy(x)
    fn = y -> (x^2 + x*y + y^2) - 1
    fp = y -> (x + 2y)
    find_zero((fn, fp), sqrt(1 - x^2), Roots.Newton())
end
```

For a *fixed* $x$, this solves for $y$ in the equation $F(y) = x^2 + x \cdot y + y^2 - 1 = 0$. It should be that $(x,y)$ is a solution:

```{julia}
#| hold: true
x = .75
y = findy(x)
x^2 + x*y + y^2 ## is this 1?
```

So we have a means to find $y(x)$, but it is implicit.

Using `find_zero`, find the value $x$ which maximizes `y` by finding a zero of `y'`. Use this to find the point $(x,y)$ with largest $y$ value.

```{julia}
#| hold: true
#| echo: false
xstar = find_zero(findy', 0.5)
ystar = findy(xstar)
choices = ["``(-0.57735, 1.15470)``",
"``(0,0)``",
"``(0, -0.57735)``",
"``(0.57735, 0.57735)``"]
answ = 1
radioq(choices, answ)
```

(Using automatic derivatives works for values identified with `find_zero` *as long as* the initial point has its type the same as that of `x`.)

###### Question

In the last problem we used automatic differentiation to get a derivative; another approach is to replace the derivative with an *approximation*, such as a forward difference. Would Newton's method still converge if the derivative in the algorithm were replaced with an approximate derivative? In general, this can often be done *but* the convergence can be *slower* and the sensitivity to a poor initial guess even greater.

Three common approximations are given by the difference quotient for a fixed $h$: $f'(x_i) \approx (f(x_i+h)-f(x_i))/h$; the secant line approximation: $f'(x_i) \approx (f(x_i) - f(x_{i-1})) / (x_i - x_{i-1})$; and the Steffensen approximation $f'(x_i) \approx (f(x_i + f(x_i)) - f(x_i)) / f(x_i)$ (using $h=f(x_i)$).

Let's revisit the $4$-step convergence of Newton's method to the root of $f(x) = 1/x - q$ when $q=0.8$. Will these methods be as fast?

Let's define the above approximations for a given `f`:

```{julia}
q₀ = 0.8
fq(x) = 1/x - q₀
secant_approx(x0,x1) = (fq(x1) - fq(x0)) / (x1 - x0)
diffq_approx(x0, h) = secant_approx(x0, x0+h)
steff_approx(x0) = diffq_approx(x0, fq(x0))
```

Then using the difference quotient would look like:

```{julia}
#| hold: true
Δ = 1e-6
x1 = 48/17 - 32/17 * q₀
x1 = x1 - fq(x1) / diffq_approx(x1, Δ) # |x1 - xstar| = 0.003660953777242959
x1 = x1 - fq(x1) / diffq_approx(x1, Δ) # |x1 - xstar| = 1.0719137523373945e-5; etc
```

The Steffensen method would look like:

```{julia}
#| hold: true
x1 = 48/17 - 32/17 * q₀
x1 = x1 - fq(x1) / steff_approx(x1) # |x1 - xstar| = 0.0014382105783488086
x1 = x1 - fq(x1) / steff_approx(x1) # |x1 - xstar| = 5.944935954627084e-7; etc.
```

And the secant method like:

```{julia}
#| hold: true
Δ = 1e-6
x1 = 48/17 - 32/17 * q₀
x0 = x1 - Δ # we need two initial values
x0, x1 = x1, x1 - fq(x1) / secant_approx(x0, x1) # |x1 - xstar| = 0.00366084553494872
x0, x1 = x1, x1 - fq(x1) / secant_approx(x0, x1) # |x1 - xstar| = 0.00019811634659716582; etc.
```

Repeat each of the above algorithms until `abs(x1 - 1.25)` is `0` (which will happen for this problem, though not in general). Record the steps.

* Does the difference quotient need *more* than $4$ steps?

```{julia}
#| hold: true
#| echo: false
yesnoq(false)
```

* Does the secant method need *more* than $4$ steps?

```{julia}
#| hold: true
#| echo: false
yesnoq(true)
```

* Does the Steffensen method need *more* than $4$ steps?

```{julia}
#| hold: true
#| echo: false
yesnoq(false)
```

All methods work quickly with this well-behaved problem. In general the convergence rates are slightly different for each, with the Steffensen method matching Newton's method and the difference-quotient method being somewhat slower. All can be more sensitive to the initial guess.