## Conditioning and stability

In Part III of @doi:10.1137/1.9781611977165 we find the language of numerical analysis useful to formally describe the zero-finding problem. The key concepts are errors, conditioning, and stability.

Abstractly, a *problem* is a mapping, or function, from a domain $X$ of data to a range $Y$ of solutions. Both $X$ and $Y$ have a sense of distance given by a *norm*. A norm is a generalization of the absolute value.

> A *well-conditioned* problem is one with the property that all small perturbations of $x$ lead to only small changes in $f(x)$.

This sense of "small" is quantified through a *condition number*.

If we let $\delta_x$ be a small perturbation of $x$, then $\delta_f = f(x + \delta_x) - f(x)$.

The *forward error* is $\lVert\delta_f\rVert = \lVert f(x+\delta_x) - f(x)\rVert$; the *relative forward error* is $\lVert\delta_f\rVert/\lVert f(x)\rVert = \lVert f(x+\delta_x) - f(x)\rVert/ \lVert f(x)\rVert$.

The *backward error* is $\lVert\delta_x\rVert$; the *relative backward error* is $\lVert\delta_x\rVert / \lVert x\rVert$.

The *absolute condition number* $\hat{\kappa}$ is the worst case of the ratio $\lVert\delta_f\rVert/ \lVert\delta_x\rVert$ as the perturbation size shrinks to $0$.

The *relative condition number* $\kappa$ divides $\lVert\delta_f\rVert$ by $\lVert f(x)\rVert$ and $\lVert\delta_x\rVert$ by $\lVert x\rVert$ before taking the ratio.

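For example, the following minimal sketch (not from the references; the helper `relative_condition`, the choice $f(x)=\sin(x)$, and the perturbation sizes are illustrative assumptions) estimates the relative condition number numerically. For evaluating a smooth $f$ at $x$ it should approach $|x f'(x)/f(x)|$:

```julia
# Sketch: estimate the relative condition number of evaluating f at x by
# trying several small relative perturbations and keeping the worst case.
f(x) = sin(x)
x = 22/7                       # close to π, where evaluating sin is poorly conditioned

function relative_condition(f, x; deltas = abs(x) .* 10.0 .^ (-10:-6))
    worst = 0.0
    for δx in deltas
        δf = f(x + δx) - f(x)
        worst = max(worst, (abs(δf) / abs(f(x))) / (abs(δx) / abs(x)))
    end
    worst
end

relative_condition(f, x)       # ≈ 2485
abs(x * cos(x) / sin(x))       # the theoretical value |x f′(x)/f(x)|, for comparison
```

Near $x \approx \pi$ this value is large: small relative changes in $x$ produce much larger relative changes in $\sin(x)$.
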
A *problem* is a mathematical concept; an *algorithm* is its computational version. The two may differ for many reasons, such as floating point errors, tolerances, etc. We use the notation $\tilde{f}$ to indicate the algorithm.

The absolute error in the algorithm is $\lVert\tilde{f}(x) - f(x)\rVert$; the relative error divides by $\lVert f(x)\rVert$. A good algorithm has a small relative error.

An algorithm is called *stable* if

$$
\frac{\lVert\tilde{f}(x) - f(\tilde{x})\rVert}{\lVert f(\tilde{x})\rVert}
$$

is *small* for *some* $\tilde{x}$ relatively near $x$, that is, with $\lVert\tilde{x}-x\rVert/\lVert x\rVert$ small.

> "A *stable* algorithm gives nearly the right answer to nearly the right question."

(The answer it gives is $\tilde{f}(x)$; the nearly right question is $f(\tilde{x})$.)

A related concept: an algorithm $\tilde{f}$ for a problem $f$ is *backward stable* if, for each $x \in X$,

$$
\tilde{f}(x) = f(\tilde{x})
$$

for some $\tilde{x}$ with $\lVert\tilde{x} - x\rVert/\lVert x\rVert$ small.

> "A backward stable algorithm gives exactly the right answer to nearly the right question."

The concepts are related by Trefethen and Bau's Theorem 15.1, which says that for a backward stable algorithm the relative error $\lVert\tilde{f}(x) - f(x)\rVert/\lVert f(x)\rVert$ is small, in a manner proportional to the relative condition number.

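In symbols (paraphrasing Theorem 15.1 of @doi:10.1137/1.9781611977165, where $\kappa$ is the relative condition number of the problem and $\epsilon_{\text{machine}}$ the unit roundoff), a backward stable algorithm satisfies

$$
\frac{\lVert\tilde{f}(x) - f(x)\rVert}{\lVert f(x)\rVert} = O\left(\kappa(x)\,\epsilon_{\text{machine}}\right).
$$
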
Applying this to zero-finding, we follow @doi:10.1137/1.9781611975086.

To be specific, the problem is finding a zero of $f$ starting at an initial point $x_0$. The data is $(f, x_0)$; the solution is $r$, a zero of $f$.

Take the algorithm to be Newton's method. Any implementation must incorporate tolerances, so this is a computational approximation to the problem. The data is the same, but technically we use $\tilde{f}$ for the function, as any computation depends on machine implementations. The output is $\tilde{r}$, an *approximate* zero.

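For concreteness, here is a minimal sketch of such an algorithm (illustrative only, not the implementation of any particular package; the name `newton`, the tolerance names, and their default values are assumptions). It already shows the two kinds of tolerances discussed further below:

```julia
# Minimal Newton's method with two stopping criteria:
# a residual check on f(xₙ) and a step-size check on xₙ - xₙ₋₁.
function newton(f, fp, x0; xatol=1e-12, xrtol=1e-12, atol=1e-12, rtol=1e-12, maxiters=100)
    xn = float(x0)
    for _ in 1:maxiters
        fx = f(xn)
        # residual check: is f(xₙ) ≈ 0 in a mixed absolute/relative sense?
        abs(fx) <= atol + abs(xn) * rtol && return xn
        Δ = fx / fp(xn)                   # Newton step
        xn -= Δ
        # step-size check: has xₙ essentially stopped moving?
        abs(Δ) <= xatol + abs(xn) * xrtol && return xn
    end
    error("no convergence in $maxiters iterations")
end

newton(sin, cos, 3.0)    # ≈ π
```
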
Suppose, for the sake of argument, that $\tilde{f}(x) = f(x) + \epsilon$, that $r$ is a root of $f$, and that $\tilde{r} = r + \delta$ is a root of $\tilde{f}$. Then by linearization:

\begin{align*}
0 &= \tilde{f}(\tilde r) \\
  &= f(r + \delta) + \epsilon\\
  &\approx f(r) + f'(r)\delta + \epsilon\\
  &= 0 + f'(r)\delta + \epsilon
\end{align*}

Rearranging gives $|\delta/\epsilon| \approx 1/|f'(r)|$, leading to:

> The absolute condition number is $\hat{\kappa}_r = |f'(r)|^{-1}$.

The error formula for Newton's method includes the derivative in the denominator, so we see that large condition numbers are tied to larger errors.

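A quick numerical check of this estimate (a sketch, not from the references; the function $f(x) = x^2 - 2$ and the perturbation size are arbitrary choices): perturbing $f$ by a constant $\epsilon$ moves a simple root by roughly $\epsilon/|f'(r)|$.

```julia
# Sketch: perturb f by ε and compare the shift in the root to ε / |f′(r)|.
f(x)  = x^2 - 2
fp(x) = 2x
r = sqrt(2)                 # the exact root

ε = 1e-8
rtilde = sqrt(2 - ε)        # exact root of the perturbed problem f(x) + ε
abs(rtilde - r)             # ≈ 3.5e-9
ε / abs(fp(r))              # ≈ 3.5e-9, matching δ ≈ ε/|f′(r)|
```
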
Now consider $g(x) = f(x) - f(\tilde{r})$, and call $f(\tilde{r})$ the *residual*. The function $g$ is near $f$ when the residual is small, and $\tilde{r}$ is an exact zero of $g$. That is, the algorithm solves the nearby problem $(g, x_0)$ exactly with the answer $\tilde{r}$, so with a small residual an exact solution to an approximate question has been found. Driscoll and Braun state

> The backward error in a root estimate is equal to the residual.

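As a small illustration of this statement (a sketch; the function $f(x) = \cos(x) - x$ and the approximate root are arbitrary): an approximate zero of $f$ is an exact zero of the nearby function $g$, and the residual measures how far the question was moved.

```julia
# Sketch: an approximate root of f is an exact root of the nearby problem g.
f(x) = cos(x) - x
rtilde = 0.739085              # approximates the true zero 0.7390851332…
residual = f(rtilde)           # the backward error, ≈ 2.2e-7
g(x) = f(x) - residual         # the nearby problem actually solved
g(rtilde)                      # exactly 0.0
```
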
Practically, these two observations lead to:

* If there is a large condition number, it may not be possible to find an approximate root near the true root.

* A tolerance in an algorithm should consider both the size of the step $x_{n} - x_{n-1}$ and the residual $f(x_n)$.

For the first observation, the example of Wilkinson's polynomial is often used, where $f(x) = (x-1)\cdot(x-2)\cdot \cdots \cdot(x-20)$. When expanded, the coefficients of this polynomial cannot all be represented exactly as typical floating point values, the condition number of the roots is large, and some of the roots found are quite different from the mathematical values.

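A short sketch of this phenomenon in plain Julia (illustrative; the helpers `wilkinson_coeffs` and `horner` are not from the references, and a package such as Polynomials.jl could be used instead): expanding the product into Float64 coefficients already loses the roots.

```julia
# Sketch: expand (x-1)(x-2)⋯(x-20) into Float64 coefficients (lowest degree
# first) and evaluate at a root; the expanded form no longer sees the root.
function wilkinson_coeffs(n)
    coeffs = [1.0]                                      # the constant polynomial 1
    for k in 1:n
        coeffs = [0.0; coeffs] .- k .* [coeffs; 0.0]    # multiply by (x - k)
    end
    coeffs
end

horner(cs, x) = foldr((c, acc) -> c + x * acc, cs)      # evaluate Σ cᵢ xⁱ

cs = wilkinson_coeffs(20)
prod(20.0 - k for k in 1:20)    # 0.0 — the product form evaluates to zero at the root
horner(cs, 20.0)                # far from 0 once the coefficients are rounded
```

The largest coefficients of the expanded polynomial exceed $2^{53}$, so they are already rounded when stored, and evaluating the expanded form near a root involves massive cancellation.
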
The second observation helps explain why a problem like finding the zero of $f(x) = x \cdot \exp(x)$ using Newton's method starting at $2$ might return a value like $5.89\dots$. The residual is checked for being zero in a *relative* manner, essentially using a tolerance of `atol + abs(xn)*rtol`. Functions with a horizontal asymptote of $0$ will eventually have values smaller than this tolerance, so the iteration can stop far from an actual zero.

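To see such a check in isolation (a sketch; the helper `isapprox_zero`, the function $e^{-x}$, and the tolerance values are illustrative assumptions, chosen separately from the example above):

```julia
# Sketch: a relative residual test on a function with a horizontal asymptote
# at 0 — the test eventually passes even though there is no zero at all.
isapprox_zero(fx, xn; atol=1e-8, rtol=1e-8) = abs(fx) <= atol + abs(xn) * rtol

f(x) = exp(-x)                   # f has no zero, but f(x) → 0 as x → ∞
isapprox_zero(f(10.0), 10.0)     # false: 4.5e-5 is still above the threshold
isapprox_zero(f(25.0), 25.0)     # true: 1.4e-11 ≤ 1e-8 + 25 * 1e-8
```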