## Conditioning and stability

In Part III of @doi:10.1137/1.9781611977165 we find the language of numerical analysis useful for formally describing the zero-finding problem. Key concepts are errors, conditioning, and stability.

Abstractly, a *problem* is a mapping, or function, from a domain $X$ of data to a range $Y$ of solutions. Both $X$ and $Y$ have a sense of distance given by a *norm*. A norm is a generalization of the absolute value.

> A *well-conditioned* problem is one with the property that all small perturbations of $x$ lead to only small changes in $f(x)$.

This sense of "small" is quantified through a *condition number*.

If we let $\delta_x$ be a small perturbation of $x$, then $\delta_f = f(x + \delta_x) - f(x)$.

The *forward error* is $||f(x+\delta_x) - f(x)||$; the *relative forward error* is $||f(x+\delta_x) - f(x)|| / ||f(x)||$.

The *backward error* is $||\delta_x||$; the *relative backward error* is $||\delta_x|| / ||x||$.

The *absolute condition number* $\hat{\kappa}$ is the worst case of the ratio $||\delta_f|| / ||\delta_x||$ as the perturbation size shrinks to $0$.

The *relative condition number* $\kappa$ divides $||\delta_f||$ by $||f(x)||$ and $||\delta_x||$ by $||x||$ before taking the ratio.
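
The following is a minimal Julia sketch of these definitions (the function $f(x) = x^2 - 2$, the point $x$, and the perturbation size are arbitrary choices for illustration): it perturbs the data, computes the relative forward and backward errors, and uses their ratio to estimate the relative condition number.

```julia
# Estimate the relative condition number of f at x by comparing the relative
# change in f(x) with a small relative perturbation of x.
f(x) = x^2 - 2
x = 10.0
δx = 1e-8 * abs(x)                    # small perturbation of the data

δf = f(x + δx) - f(x)                 # change in the solution
forward_rel  = abs(δf) / abs(f(x))    # relative forward error
backward_rel = abs(δx) / abs(x)       # relative backward error

κ = forward_rel / backward_rel        # estimate of the relative condition number
@show κ                               # ≈ |x * f'(x) / f(x)| = 200/98 ≈ 2.04
```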

A *problem* is a mathematical concept; an *algorithm* is the computational version. An algorithm may differ from the underlying problem for many reasons, such as floating point errors, tolerances, etc. We use the notation $\tilde{f}$ to indicate the algorithm.

The absolute error in the algorithm is $||\tilde{f}(x) - f(x)||$; the relative error divides by $||f(x)||$. A good algorithm has small relative errors.

An algorithm is called *stable* if

$$
\frac{||\tilde{f}(x) - f(\tilde{x})||}{||f(\tilde{x})||}
$$

is *small* for *some* $\tilde{x}$ relatively near $x$, that is, with $||\tilde{x}-x||/||x||$ small.

> "A *stable* algorithm gives nearly the right answer to nearly the right question."

(The answer it gives is $\tilde{f}(x)$; the nearly right question is $f(\tilde{x})$.)

A related concept: an algorithm $\tilde{f}$ for a problem $f$ is *backward stable* if, for each $x \in X$,

$$
\tilde{f}(x) = f(\tilde{x})
$$

for some $\tilde{x}$ with $||\tilde{x} - x||/||x||$ small.

> "A backward stable algorithm gives exactly the right answer to nearly the right question."

The concepts are related by Trefethen and Bau's Theorem 15.1, which says that for a backward stable algorithm the relative error $||\tilde{f}(x) - f(x)||/||f(x)||$ is small in a manner proportional to the relative condition number.
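
Stated as a formula (in the notation above, with $\epsilon_{\text{machine}}$ denoting the unit roundoff of the floating point arithmetic), the theorem asserts that for a backward stable algorithm

$$
\frac{||\tilde{f}(x) - f(x)||}{||f(x)||} = \mathcal{O}\left(\kappa(x)\, \epsilon_{\text{machine}}\right).
$$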

Applying this to zero-finding, we follow @doi:10.1137/1.9781611975086.

To be specific, the problem is finding a zero of $f$ starting at an initial point $x_0$. The data is $(f, x_0)$; the solution is $r$, a zero of $f$.

Take the algorithm to be Newton's method. Any implementation must incorporate tolerances, so this is a computational approximation to the problem. The data is the same, but technically we use $\tilde{f}$ for the function, as any computation depends on machine implementations. The output is $\tilde{r}$, an *approximate* zero.
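
As a concrete sketch of such an algorithm, here is a bare-bones Newton's method in Julia. The function, derivative, tolerance names, and defaults are illustrative choices, not any package's interface; the point is only that tolerances on the step and the residual are unavoidable.

```julia
# Basic Newton's method: iterate x - f(x)/f'(x) until the step or the
# residual falls below a tolerance, or the iteration budget is exhausted.
function newton(f, fp, x0; atol=1e-12, rtol=1e-12, maxiters=100)
    x = float(x0)
    for _ in 1:maxiters
        fx = f(x)
        Δ = fx / fp(x)                        # Newton step
        x -= Δ
        if abs(Δ) <= atol + abs(x) * rtol || abs(fx) <= atol
            return x                          # tolerance met: return the approximate zero
        end
    end
    return x                                  # budget exhausted; return last iterate
end

newton(x -> x^2 - 2, x -> 2x, 1.0)            # ≈ 1.4142135623730951
```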

Suppose, for the sake of argument, that $\tilde{f}(x) = f(x) + \epsilon$, that $r$ is a root of $f$, and that $\tilde{r} = r + \delta$ is a root of $\tilde{f}$. Then by linearization:

\begin{align*}
0 &= \tilde{f}(\tilde{r}) \\
  &= f(r + \delta) + \epsilon\\
  &\approx f(r) + f'(r)\delta + \epsilon\\
  &= 0 + f'(r)\delta + \epsilon
\end{align*}

Rearranging gives $|\delta/\epsilon| \approx 1/|f'(r)|$, leading to:

> The absolute condition number is $\hat{\kappa}_r = |f'(r)|^{-1}$.

The error formula for Newton's method includes the derivative in the denominator, so we see that large condition numbers are tied to larger errors.
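
A small Julia sketch of this effect (the linear function and the perturbation size $\epsilon = 10^{-10}$ are chosen only for illustration): an additive perturbation of size $\epsilon$ moves a simple zero $r$ by roughly $\epsilon / |f'(r)|$, so the smaller $|f'(r)|$ is, the further the computed zero can sit from the true one.

```julia
# For f(x) = c*(x - 1) the zero is r = 1 and f'(r) = c.  The perturbed
# function c*(x - 1) + ε has its zero at 1 - ε/c, a shift of ε/|f'(r)|.
ε = 1e-10
for c in (1.0, 1e-4, 1e-8)
    r = 1.0                      # zero of f(x) = c*(x - 1)
    r_perturbed = 1 - ε / c      # zero of c*(x - 1) + ε
    println("|f'(r)| = $c   root shift = $(abs(r_perturbed - r))")
end
```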

Now consider $g(x) = f(x) - f(\tilde{r})$, and call $f(\tilde{r})$ the *residual*. The function $g$ is near $f$ whenever the residual is small. Since $g(\tilde{r}) = 0$, the returned value $\tilde{r}$ solves the problem $(g, x_0)$ exactly, so with a small residual an exact solution to a nearby question has been found. Driscoll and Braun state

> The backward error in a root estimate is equal to the residual.
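
A short Julia sketch of this identification (the function and the approximate root are arbitrary): the residual at an approximate root $\tilde{r}$ is the amount by which $f$ must be shifted so that $\tilde{r}$ becomes an exact zero.

```julia
# The residual f(xr) at an approximate root xr is the backward error:
# xr is an exact zero of the nearby function g(x) = f(x) - f(xr).
f(x) = x^2 - 2
xr = 1.41421356                  # an approximate zero of f
residual = f(xr)                 # the residual, i.e. the backward error
g(x) = f(x) - residual           # the nearby problem that xr solves exactly
@show residual g(xr)             # g(xr) == 0.0 by construction
```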

Practically, these two observations lead to:

* If the condition number is large, it may not be possible to find an approximate root near the actual root.

* A tolerance in an algorithm should consider both the size of $x_{n} - x_{n-1}$ and the residual $f(x_n)$.

For the first observation, the example of Wilkinson's polynomial is often used, where $f(x) = (x-1)\cdot(x-2)\cdots(x-20)$. When expanded, the coefficients of this polynomial cannot all be represented exactly as floating point values, the condition number is large, and some of the roots found are quite different from the mathematical values.
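
A Julia sketch of the exactness issue (the evaluation point $x = 19.5$ is an arbitrary choice between the two largest roots): the expansion is computed exactly with `BigInt` arithmetic, its coefficients are rounded to `Float64`, and then the expanded form is compared with the factored form.

```julia
# Wilkinson's polynomial f(x) = (x-1)(x-2)⋯(x-20).  Its expanded integer
# coefficients exceed 2^53, so they are not all exactly representable in
# Float64, and evaluating the expanded form also loses accuracy to cancellation.
factored(x) = prod(x - k for k in 1:20)

function wilkinson_coeffs(n)                 # exact expansion, highest degree first
    c = BigInt[1]
    for k in 1:n
        c = [c; 0] .- [0; k .* c]            # multiply the polynomial by (x - k)
    end
    return c
end

horner(c, x) = foldl((s, ci) -> s * x + ci, c)   # evaluate the expanded form

c64 = Float64.(wilkinson_coeffs(20))         # coefficients rounded to Float64
x = 19.5
@show factored(x) horner(c64, x)             # the values differ well beyond roundoff
```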

The second observation helps explain why a problem like finding the zero of $f(x) = x e^{x}$ using Newton's method starting at $2$ might return a value like $5.89\dots$. The residual is checked against zero in a *relative* manner, using a tolerance essentially of the form `atol + abs(xn)*rtol`. Functions with a horizontal asymptote at $0$ will eventually fall below this value.
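
A Julia sketch of that relative residual check (the tolerance names follow the text; the default values and sample points are illustrative only): for $f(x) = x e^x$, which tends to $0$ as $x \to -\infty$ but has its only zero at $x = 0$, points far from the zero are eventually "accepted" by the test.

```julia
# Accept an iterate xn when its residual is small relative to atol + |xn|*rtol.
accept(fx, xn; atol=eps(), rtol=eps()) = abs(fx) <= atol + abs(xn) * rtol

f(x) = x * exp(x)                     # only zero is x = 0; f(x) → 0 as x → -∞
for xn in (-5.0, -20.0, -40.0)
    println("xn = $xn   f(xn) = $(f(xn))   accepted: $(accept(f(xn), xn))")
end
```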