## Conditioning and stability

In Part III of @doi:10.1137/1.9781611977165 we find the language of numerical analysis useful to formally describe the zero-finding problem. The key concepts are errors, conditioning, and stability.

Abstractly, a *problem* is a mapping, or function, from a domain $X$ of data to a range $Y$ of solutions. Both $X$ and $Y$ have a sense of distance given by a *norm*. A norm is a generalization of the absolute value.

> A *well-conditioned* problem is one with the property that all small perturbations of $x$ lead to only small changes in $f(x)$.

This sense of "small" is quantified through a *condition number*.

If we let $\delta_x$ be a small perturbation of $x$, then $\delta_f = f(x + \delta_x) - f(x)$.

The *forward error* is $\lVert\delta_f\rVert = \lVert f(x+\delta_x) - f(x)\rVert$; the *relative forward error* is $\lVert\delta_f\rVert/\lVert f(x)\rVert = \lVert f(x+\delta_x) - f(x)\rVert/ \lVert f(x)\rVert$.

The *backward error* is $\lVert\delta_x\rVert$; the *relative backward error* is $\lVert\delta_x\rVert / \lVert x\rVert$.

The *absolute condition number* $\hat{\kappa}$ is the worst case of the ratio $\lVert\delta_f\rVert/ \lVert\delta_x\rVert$ as the perturbation size shrinks to $0$.

The *relative condition number* $\kappa$ divides $\lVert\delta_f\rVert$ by $\lVert f(x)\rVert$ and $\lVert\delta_x\rVert$ by $\lVert x\rVert$ before taking the ratio.

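For example, the following minimal sketch (not from the references; the helper `relative_condition`, the choice $f(x)=\sin(x)$, and the perturbation sizes are illustrative assumptions) estimates the relative condition number numerically. For evaluating a smooth $f$ at $x$ it should approach $|x f'(x)/f(x)|$:

```julia
# Sketch: estimate the relative condition number of evaluating f at x by
# trying several small relative perturbations and keeping the worst case.
f(x) = sin(x)
x = 22/7                       # close to π, where evaluating sin is poorly conditioned

function relative_condition(f, x; deltas = abs(x) .* 10.0 .^ (-10:-6))
    worst = 0.0
    for δx in deltas
        δf = f(x + δx) - f(x)
        worst = max(worst, (abs(δf) / abs(f(x))) / (abs(δx) / abs(x)))
    end
    worst
end

relative_condition(f, x)       # ≈ 2485
abs(x * cos(x) / sin(x))       # the theoretical value |x f′(x)/f(x)|, for comparison
```

Near $x \approx \pi$ this value is large: small relative changes in $x$ produce much larger relative changes in $\sin(x)$.
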
A *problem* is a mathematical concept; an *algorithm* is its computational version. The two may differ for many reasons, such as floating point errors, tolerances, etc. We use the notation $\tilde{f}$ to indicate the algorithm.

The absolute error in the algorithm is $\lVert\tilde{f}(x) - f(x)\rVert$; the relative error divides by $\lVert f(x)\rVert$. A good algorithm has a small relative error.

An algorithm is called *stable* if

$$
\frac{\lVert\tilde{f}(x) - f(\tilde{x})\rVert}{\lVert f(\tilde{x})\rVert}
$$

is *small* for *some* $\tilde{x}$ relatively near $x$, that is, with $\lVert\tilde{x}-x\rVert/\lVert x\rVert$ small.

> "A *stable* algorithm gives nearly the right answer to nearly the right question."

(The answer it gives is $\tilde{f}(x)$; the nearly right question is $f(\tilde{x})$.)

A related concept: an algorithm $\tilde{f}$ for a problem $f$ is *backward stable* if, for each $x \in X$,

$$
\tilde{f}(x) = f(\tilde{x})
$$

for some $\tilde{x}$ with $\lVert\tilde{x} - x\rVert/\lVert x\rVert$ small.

> "A backward stable algorithm gives exactly the right answer to nearly the right question."

The concepts are related by Trefethen and Bau's Theorem 15.1, which says that for a backward stable algorithm the relative error $\lVert\tilde{f}(x) - f(x)\rVert/\lVert f(x)\rVert$ is small, in a manner proportional to the relative condition number.

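In symbols (paraphrasing Theorem 15.1 of @doi:10.1137/1.9781611977165, where $\kappa$ is the relative condition number of the problem and $\epsilon_{\text{machine}}$ the unit roundoff), a backward stable algorithm satisfies

$$
\frac{\lVert\tilde{f}(x) - f(x)\rVert}{\lVert f(x)\rVert} = O\left(\kappa(x)\,\epsilon_{\text{machine}}\right).
$$
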
Applying this to zero-finding, we follow @doi:10.1137/1.9781611975086.

To be specific, the problem is finding a zero of $f$ starting at an initial point $x_0$. The data is $(f, x_0)$; the solution is $r$, a zero of $f$.

Take the algorithm to be Newton's method. Any implementation must incorporate tolerances, so this is a computational approximation to the problem. The data is the same, but technically we use $\tilde{f}$ for the function, as any computation depends on machine implementations. The output is $\tilde{r}$, an *approximate* zero.

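For concreteness, here is a minimal sketch of such an algorithm (illustrative only, not the implementation of any particular package; the name `newton`, the tolerance names, and their default values are assumptions). It already shows the two kinds of tolerances discussed further below:

```julia
# Minimal Newton's method with two stopping criteria:
# a residual check on f(xₙ) and a step-size check on xₙ - xₙ₋₁.
function newton(f, fp, x0; xatol=1e-12, xrtol=1e-12, atol=1e-12, rtol=1e-12, maxiters=100)
    xn = float(x0)
    for _ in 1:maxiters
        fx = f(xn)
        # residual check: is f(xₙ) ≈ 0 in a mixed absolute/relative sense?
        abs(fx) <= atol + abs(xn) * rtol && return xn
        Δ = fx / fp(xn)                   # Newton step
        xn -= Δ
        # step-size check: has xₙ essentially stopped moving?
        abs(Δ) <= xatol + abs(xn) * xrtol && return xn
    end
    error("no convergence in $maxiters iterations")
end

newton(sin, cos, 3.0)    # ≈ π
```
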
Suppose, for the sake of argument, that $\tilde{f}(x) = f(x) + \epsilon$, that $r$ is a root of $f$, and that $\tilde{r} = r + \delta$ is a root of $\tilde{f}$. Then by linearization:

\begin{align*}
0 &= \tilde{f}(\tilde r) \\
  &= f(r + \delta) + \epsilon\\
  &\approx f(r) + f'(r)\delta + \epsilon\\
  &= 0 + f'(r)\delta + \epsilon
\end{align*}

Rearranging gives $|\delta/\epsilon| \approx 1/|f'(r)|$, leading to:

> The absolute condition number is $\hat{\kappa}_r = |f'(r)|^{-1}$.

The error formula for Newton's method includes the derivative in the denominator, so we see that large condition numbers are tied to larger errors.

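A quick numerical check of this estimate (a sketch, not from the references; the function $f(x) = x^2 - 2$ and the perturbation size are arbitrary choices): perturbing $f$ by a constant $\epsilon$ moves a simple root by roughly $\epsilon/|f'(r)|$.

```julia
# Sketch: perturb f by ε and compare the shift in the root to ε / |f′(r)|.
f(x)  = x^2 - 2
fp(x) = 2x
r = sqrt(2)                 # the exact root

ε = 1e-8
rtilde = sqrt(2 - ε)        # exact root of the perturbed problem f(x) + ε
abs(rtilde - r)             # ≈ 3.5e-9
ε / abs(fp(r))              # ≈ 3.5e-9, matching δ ≈ ε/|f′(r)|
```
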
Now consider $g(x) = f(x) - f(\tilde{r})$, and call $f(\tilde{r})$ the *residual*. The function $g$ is near $f$ when the residual is small, and $\tilde{r}$ is an exact zero of $g$. That is, the algorithm solves the nearby problem $(g, x_0)$ exactly with the answer $\tilde{r}$, so with a small residual an exact solution to an approximate question has been found. Driscoll and Braun state

> The backward error in a root estimate is equal to the residual.

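As a small illustration of this statement (a sketch; the function $f(x) = \cos(x) - x$ and the approximate root are arbitrary): an approximate zero of $f$ is an exact zero of the nearby function $g$, and the residual measures how far the question was moved.

```julia
# Sketch: an approximate root of f is an exact root of the nearby problem g.
f(x) = cos(x) - x
rtilde = 0.739085              # approximates the true zero 0.7390851332…
residual = f(rtilde)           # the backward error, ≈ 2.2e-7
g(x) = f(x) - residual         # the nearby problem actually solved
g(rtilde)                      # exactly 0.0
```
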
Practically, these two observations lead to:

* If there is a large condition number, it may not be possible to find an approximate root near the true root.

* A tolerance in an algorithm should consider both the size of the step $x_{n} - x_{n-1}$ and the residual $f(x_n)$.

For the first observation, the example of Wilkinson's polynomial is often used, where $f(x) = (x-1)\cdot(x-2)\cdot \cdots \cdot(x-20)$. When expanded, the coefficients of this polynomial cannot all be represented exactly as typical floating point values, the condition number of the roots is large, and some of the roots found are quite different from the mathematical values.

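A short sketch of this phenomenon in plain Julia (illustrative; the helpers `wilkinson_coeffs` and `horner` are not from the references, and a package such as Polynomials.jl could be used instead): expanding the product into Float64 coefficients already loses the roots.

```julia
# Sketch: expand (x-1)(x-2)⋯(x-20) into Float64 coefficients (lowest degree
# first) and evaluate at a root; the expanded form no longer sees the root.
function wilkinson_coeffs(n)
    coeffs = [1.0]                                      # the constant polynomial 1
    for k in 1:n
        coeffs = [0.0; coeffs] .- k .* [coeffs; 0.0]    # multiply by (x - k)
    end
    coeffs
end

horner(cs, x) = foldr((c, acc) -> c + x * acc, cs)      # evaluate Σ cᵢ xⁱ

cs = wilkinson_coeffs(20)
prod(20.0 - k for k in 1:20)    # 0.0 — the product form evaluates to zero at the root
horner(cs, 20.0)                # far from 0 once the coefficients are rounded
```

The largest coefficients of the expanded polynomial exceed $2^{53}$, so they are already rounded when stored, and evaluating the expanded form near a root involves massive cancellation.
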
The second observation helps explain why a problem like finding the zero of $f(x) = x \cdot \exp(x)$ using Newton's method starting at $2$ might return a value like $5.89\dots$. The residual is checked for being zero in a *relative* manner, essentially using a tolerance of `atol + abs(xn)*rtol`. Functions with a horizontal asymptote of $0$ will eventually have values smaller than this tolerance, so the iteration can stop far from an actual zero.

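To see such a check in isolation (a sketch; the helper `isapprox_zero`, the function $e^{-x}$, and the tolerance values are illustrative assumptions, chosen separately from the example above):

```julia
# Sketch: a relative residual test on a function with a horizontal asymptote
# at 0 — the test eventually passes even though there is no zero at all.
isapprox_zero(fx, xn; atol=1e-8, rtol=1e-8) = abs(fx) <= atol + abs(xn) * rtol

f(x) = exp(-x)                   # f has no zero, but f(x) → 0 as x → ∞
isapprox_zero(f(10.0), 10.0)     # false: 4.5e-5 is still above the threshold
isapprox_zero(f(25.0), 25.0)     # true: 1.4e-11 ≤ 1e-8 + 25 * 1e-8
```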