Illustration of the Taylor polynomial of degree $k$, $T_k(x)$, at $c=0$ and its graph overlaid on that of the function $1 - \cos(x)$.
"""
ImageFile(imgfile, caption)
```
## The secant line and the tangent line
We approach this general problem **much** more indirectly than is needed. We introduce notation attributed to Newton and proceed from there. By leveraging `SymPy` we avoid tedious computations and *hopefully* gain some insight.
Suppose ``f(x)`` is a function which is defined in a neighborhood of
$c$ and has as many continuous derivatives as we care to take at $c$.
We have two related formulas:
* The *secant line* connecting $(c, f(c))$ and $(c+h, f(c+h))$ for a given value of $h > 0$.
That is, ``f(x) = f(c) + f'(c)(x-c) + f''(\xi)/2\cdot(x-c)^2``, or ``f(x)-tl(x)`` is as described.)
The secant line also has an interpretation that will generalize - it is the lowest-degree polynomial that goes through, or *interpolates*, the points $(c,f(c))$ and $(c+h, f(c+h))$. This is obvious from the construction - that is how the slope is derived - but verifying it from the formula itself requires showing $tl(c) = f(c)$ and $tl(c+h) = f(c+h)$. The former is straightforward, as $(c-c) = 0$, so clearly $tl(c) = f(c)$. The latter requires a bit of algebra.
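(For the latter: writing the secant line as $f(c) + \frac{f(c+h)-f(c)}{h}\cdot(x-c)$ and evaluating at $x = c+h$ gives $f(c) + \frac{f(c+h)-f(c)}{h}\cdot h = f(c+h)$.)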
We have:
> The best *linear* approximation at a point ``c`` is related to the *linear* polynomial interpolating the points ``c`` and ``c+h`` as ``h`` goes to ``0``.
This is the relationship we seek to generalize through our roundabout approach below:
> The best approximation at a point ``c`` by a polynomial of degree ``n`` or less is related to the polynomial interpolating through the points ``c, c+h, \dots, c+nh`` as ``h`` goes to ``0``.
As in the linear case, there is flexibility in the exact points chosen for the interpolation.
----
Now, we take a small detour to define some notation. Instead of
writing our two points as $c$ and $c+h,$ we use $x_0$ and
$x_1$. For any set of points $x_0, x_1, \dots, x_n$,
define the **divided differences** of $f$ inductively, as follows:
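```math
\begin{align*}
f[x_0] &= f(x_0),\\
f[x_0, x_1] &= \frac{f[x_1] - f[x_0]}{x_1 - x_0},\\
f[x_0, x_1, \dots, x_n] &= \frac{f[x_1, \dots, x_n] - f[x_0, \dots, x_{n-1}]}{x_n - x_0}.
\end{align*}
```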
In the following, by adding a `getindex` method, we enable the `[]` notation of Newton to work with symbolic functions, like `u()` defined below, which is used in place of ``f``:
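A minimal sketch of the helper and the `getindex` hook (the exact definitions are not shown here; this version assumes `SymPy` is loaded and that the points are passed in increasing order):

```julia;
divided_differences(f, x) = f(x)    # base case: f[x₀] = f(x₀)
function divided_differences(f, xs...)
    # recursive definition; the points are assumed to be given in increasing order
    (divided_differences(f, xs[2:end]...) - divided_differences(f, xs[1:end-1]...)) / (xs[end] - xs[1])
end
# enable Newton's bracket notation, u[x₀, x₁, …], for symbolic functions
Base.getindex(u::SymFunction, xs...) = divided_differences(u, xs...)
```

With, say, `@syms x c h u()`, the linear interpolating polynomial through ``c`` and ``c+h`` would then be `p₁ = u(c) + u[c, c+h]*(x-c)`, the form extended below.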
We add the next term to our previous polynomial and simplify
```julia;
p₂ = p₁ + u[c, c+h, c+2h] * (x-c) * (x-(c+h))
simplify(p₂)
```
We can check that this interpolates the three points. Notice that at
$x_0=c$ and $x_1=c+h$, the last term, $f[x_0, x_1,
x_2]\cdot(x-x_0)(x-x_1)$, vanishes, so we already have the polynomial
interpolating there. Only the
value $x_2=c+2h$ remains to be checked:
```julia;
p₂(x => c+2h) - u(c+2h)
```
Hmm, doesn't seem correct - that was supposed to be $0$. The issue isn't the math, it is that SymPy needs to be encouraged to simplify:
```julia;
simplify(p₂(x => c+2h) - u(c+2h))
```
By contrast, at the point $x=c+3h$ we have no guarantee of interpolation, and indeed there is none, as this expression is not always zero:
```julia;
simplify(p₂(x => c+3h) - u(c+3h))
```
Interpolating polynomials are of interest in their own right, but for now we use them as motivation for the best polynomial approximation of a given degree to a function. Just as the secant line leads to the tangent line, the coefficients of the quadratic interpolating polynomial above have limits as $h$ goes to $0$, leaving this polynomial:
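```math
f(c) + f'(c)\cdot(x-c) + \frac{f''(c)}{2}\cdot(x-c)^2.
```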
In this sense, the above quadratic polynomial, called the Taylor Polynomial of degree 2, is the best *quadratic* approximation to $f$, as the difference goes to $0$ at a rate of ``(x-c)^3``.
The graphs of the secant line and approximating parabola for $h=1/4$ are similar:
```julia; hold=true
f(x) = cos(x)
a, b = -pi/2, pi/2
c = 0
h = 1/4
x0, x1, x2 = c-h, c, c+h
f0 = divided_differences(f, x0)
fd = divided_differences(f, x0, x1)
fdd = divided_differences(f, x0, x1, x2)
p = plot(f, a, b, color=:blue, linewidth=5, legend=false)
plot!(p, x -> f0 + fd*(x-x0), a, b, color=:green, alpha=0.25, linewidth=5)
plot!(p, x -> f0 + fd*(x-x0) + fdd*(x-x0)*(x-x1), a, b, color=:red, alpha=0.25, linewidth=5)
```
##### Example: the pendulum
For a pendulum of length $R$ and mass $m$, displaced by an angle $\theta$ from vertical, the total energy (kinetic plus potential) is
```math
K = \frac{1}{2} m R^2 (\frac{d\theta}{dt})^2 + mgR \cdot (1 - \cos(\theta)).
```
The value of $1-\cos(\theta)$ inhibits further work, which would be possible were there a simpler formula in its place. In fact, we can use the excellent quadratic (Taylor) approximation $\cos(\theta) \approx 1 - \theta^2/2$, so that $1 - \cos(\theta) \approx \theta^2/2$. Then we have:
```math
K \approx \frac{1}{2} m R^2 (\frac{d\theta}{dt})^2 + mgR \cdot \frac{\theta^2}{2}.
```
Assuming equality and differentiating in $t$ gives by the chain rule:
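```math
0 = m R^2 \frac{d\theta}{dt} \frac{d^2\theta}{dt^2} + mgR \cdot \theta \frac{d\theta}{dt},
\quad\text{or}\quad
\frac{d^2\theta}{dt^2} = -\frac{g}{R} \theta,
```
the familiar linear equation describing small oscillations of a pendulum.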
Taking $x_i = c + i\cdot h$ for a given $n$, as $h > 0$ goes to zero the coefficients of this polynomial converge to the coefficients of the *Taylor polynomial of degree $n$*:
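```math
f(c) + f'(c)\cdot(x-c) + \frac{f''(c)}{2!}\cdot(x-c)^2 + \cdots + \frac{f^{(n)}(c)}{n!}\cdot(x-c)^n.
```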
A Taylor polynomial of degree ``n`` consists of ``n+1`` terms and an error term. The "Taylor series" is an *infinite* collection of terms, the first ``n+1`` matching the Taylor polynomial of degree ``n``. The fact that series are *infinite* means care must be taken when even talking about their existence, unlike a Taylor polynomial, which is just a polynomial and exists as long as a sufficient number of derivatives are available.
We define a function to compute Taylor polynomials from a function. The following returns a function, not a symbolic object, using `D`, from `CalculusWithJulia`, which is based on `ForwardDiff.derivative`, to find higher-order derivatives:
```julia;
function taylor_poly(f, c=0, n=2)
x -> f(c) + sum(D(f, i)(c) * (x-c)^i / factorial(i) for i in 1:n)
end
```
With a function, we can compare values.
For example, here we see the difference between the Taylor polynomial and the function's value for a small value of $x$:
```julia; hold=true
a = .1
f(x) = log(1+x)
Tn = taylor_poly(f, 0, 5)
Tn(a) - f(a)
```
### Plotting
Let's now visualize a function and the two approximations - the Taylor
polynomial and the interpolating polynomial. We use this function to
generate the interpolating polynomial as a function:
```julia;
function newton_form(f, xs)
  x -> begin
    tot = divided_differences(f, xs[1])
    for i in 2:length(xs)
      tot += divided_differences(f, xs[1:i]...) * prod([x-xs[j] for j in 1:(i-1)])
    end
    tot
  end
end
```
The graph should be $0$ at each of the points in `xs`, which we
can verify in the graph above. Plotting over a wider region shows a
common phenomenon that these polynomials approximate the function near
the values, but quickly deviate away:
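For instance, a minimal sketch, choosing $f(x) = \cos(x)$, $c=0$, $h=1/4$, and four interpolation points purely for illustration:

```julia; hold=true
f(x) = cos(x)
c, h = 0, 1/4
xs = [c + i*h for i in 0:3]          # four interpolation points
a, b = -pi, pi
plot(f, a, b, linewidth=5, label="f")
plot!(newton_form(f, xs), a, b, label="interpolating")
plot!(taylor_poly(f, c, 3), a, b, label="Taylor")
```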
In the next graph we plot Taylor polynomials of different degrees ``n`` for the function ``f(x) = 1 - \cos(x)``:
```julia; hold=true
f(x) = 1 - cos(x)
a, b = -pi, pi
plot(f, a, b, linewidth=5, label="f")
plot!(taylor_poly(f, 0, 2), label="T₂")
plot!(taylor_poly(f, 0, 4), label="T₄")
plot!(taylor_poly(f, 0, 6), label="T₆")
```
Though all are good approximations near $c=0$, as more terms are
included, the Taylor polynomial becomes a better approximation over a wider
range of values.
##### Example: period of an orbiting satellite
Kepler's third [law](http://tinyurl.com/y7oa4x2g) of planetary motion states:
> The square of the orbital period of a planet is directly proportional to the cube of the semi-major axis of its orbit.
In formulas, $P^2 = a^3 \cdot (4\pi^2) / (G\cdot(M + m))$, where $M$ and $m$ are the respective masses. Suppose a satellite is in a circular low earth orbit at a constant height above the earth. Use a Taylor polynomial to approximate the period using Kepler's third law to relate the quantities.
Suppose $R$ is the radius of the earth and $h$ the height above the earth, assuming $h$ is much smaller than $R$. The mass $m$ of a satellite is negligible compared to that of the earth, so $M+m \approx M$ for this purpose. We have:
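```math
P = \frac{2\pi}{\sqrt{GM}} \cdot a^{3/2} = \frac{2\pi}{\sqrt{GM}} \cdot (R+h)^{3/2} = P_0 \cdot \left(1 + \frac{h}{R}\right)^{3/2},
\quad P_0 = \frac{2\pi}{\sqrt{GM}} \cdot R^{3/2}.
```

The Taylor polynomial of $(1+x)^{3/2}$ about $0$ begins $1 + \frac{3}{2}x + \frac{3}{8}x^2 - \frac{1}{16}x^3 + \cdots$, so truncating it gives polynomial approximations of the period in terms of $h/R$.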
Typically, if $h$ is much smaller than $R$, the first term is enough, giving a formula like $P \approx P_0 \cdot(1 + \frac{3h}{2R})$.
A satellite phone utilizes low-orbit satellites to relay phone communications. The [Iridium](http://www.kddi.com/english/business/cloud-network-voice/satellite/iridium/mobile/) system uses satellites with an elevation ``h=780`` km. The radius of the earth is $3{,}959$ miles, the mass of the earth is $5.972 \times 10^{24}$ kg, and the gravitational [constant](https://en.wikipedia.org/wiki/Gravitational_constant), $G$, is $6.67408 \cdot 10^{-11}$ $m^3/(kg \cdot s^2)$.
Compare the approximate value with ``1`` term to the exact value.
```julia;
G = 6.67408e-11
H = 780 * 1000
R = 3959 * 1609.34 # 1609.34 meters per mile
M = 5.972e24
P0, HR = (2pi)/sqrt(G*M) * R^(3/2), H/R
Preal = P0 * (1 + HR)^(3/2)
P1 = P0 * (1 + 3*HR/2)
Preal, P1
```
With terms out to the fifth power, we get a better approximation:
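For instance, a quick sketch reusing `taylor_poly` and the values from the cell above (the helper `q` is just the function $(1+x)^{3/2}$ being expanded and is not part of the original example):

```julia;
q(x) = (1 + x)^(3/2)
P5 = P0 * taylor_poly(q, 0, 5)(HR)   # degree-5 Taylor approximation of (1 + H/R)^(3/2)
Preal - P1, Preal - P5
```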
The period above is in seconds. The answer here is about ``100`` minutes:
```julia;
Preal/60
```
When $H$ is much smaller than $R$, the ``5``th-order approximation is very good, and even the ``1``-term approximation is serviceable. Next we check whether this remains the case when $H$ is larger than $R$.
----
The height of a [GPS satellite](http://www.gps.gov/systems/gps/space/) is about $12,550$ miles. Compute the period of a circular orbit and compare with the estimates.
```julia;
using Unitful
G = 6.67408e-11 * u"m^3 * kg^-1 * s^-2"  # gravitational constant, now with units
H = uconvert(u"m", 12550 * u"mi")        # unit-convert miles to meters
R = uconvert(u"m", 3959 * u"mi")
M = 5.972e24 * u"kg"
P0, HR = (2pi)/sqrt(G*M) * R^(3//2), H/R # rational exponent keeps the units exact
Preal = P0 * (1 + HR)^(3/2)              # in seconds
Preal, uconvert(u"hr", Preal)            # ≈ 11.97 hours
```
We see `Preal` has the right units - the units of mass and distance cancel, leaving a measure of time - but it is hard to sense how long this is. Converting to hours helps us see that the satellite orbits about twice per day.
##### Example: computing $\log(x)$
Where exactly does the value assigned to $\log(5)$ come from? The
value needs to be computed. At some level, many questions resolve down
to the basic operations of addition, subtraction, multiplication, and
division. Preferably not the latter, as division is slow. Polynomials, then, should be fast to compute, and so computing logarithms with a polynomial approximation can be efficient.
First, there is usually a reduction stage. In this phase, the problem is transformed into one involving only a fixed interval of values. For this, values $k$ and $m$ are found so that $x = 2^k \cdot (1+m)$ *and* $\sqrt{2}/2 < 1+m < \sqrt{2}$. With these, $\log(x)$ can be computed as $k \cdot \log(2) + \log(1+m)$. The first term - a multiplication - is easily computed using a pre-computed value of $\log(2)$; the second *reduces* the problem to a fixed interval.
Now, for this problem a further trick is utilized: writing $s = m/(2+m)$, we have $\log(1+m) = \log(1+s) - \log(1-s)$, with $s$ ranging over only a small interval of values. Combined, these steps make it possible to compute $\log(x)$ for any positive real $x$.
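As a concrete sketch of the reduction step - the helper name `reduce_log` and the use of `frexp` are illustrative assumptions here, not how the library routine is actually written:

```julia;
# illustrative only: find k and m with x = 2^k * (1 + m) and √2/2 < 1 + m < √2
function reduce_log(x)
    fr, k = frexp(x)          # x == fr * 2^k with 1/2 ≤ fr < 1
    if fr < sqrt(2)/2
        fr *= 2               # keep 1 + m inside (√2/2, √2)
        k -= 1
    end
    (k, fr - 1)               # m = fr - 1
end
reduce_log(5.0)               # (2, 0.25), as used in the example below
```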
To compute $\log(1\pm s)$, we can find a Taylor polynomial. Let's go out to degree $19$ and use `SymPy` to do the work:
```julia;
@syms s
aₗ = series(log(1 + s), s, 0, 19)
bₗ = series(log(1 - s), s, 0, 19)
a_b = (aₗ - bₗ).removeO() # remove"Oh" not remove"zero"
```
This is re-expressed as $2s + s \cdot p$ with $p$ given by:
```julia;
cancel((a_b - 2s)/s)
```
Now, $2s = m - s\cdot m$ (as $m - s\cdot m = m(1 - \frac{m}{2+m}) = \frac{2m}{2+m} = 2s$), so the above can be reworked to give $\log(1+m) = 2s + s\cdot p = m - s\cdot(m-p)$.
(For larger values of $m$, a similar, but different approximation, can be used to minimize floating point errors.)
How big can the error be between this *approximation* and $\log(1+m)$? We plot to see how big $s$ can be:
```julia;
@syms v
plot(v/(2+v), sqrt(2)/2 - 1, sqrt(2)-1)
```
This shows $s$ can be as large as
```julia;
Max = (v/(2+v))(v => sqrt(2) - 1)
```
The error term is like $2/19 \cdot \xi^{19}$, which is largest at this value, `Max`. "Large" is relative - it is really quite small:
```julia;
(2/19)*Max^19
```
Basically, that is machine precision, which means that, as far as the computer can tell, the value produced by $2s + s \cdot p$ is about as accurate as possible.
Let's try this out by computing $\log(5)$. We have $5 = 2^2(1+0.25)$, so $k=2$ and $m=0.25$.
```julia
k, m = 2, 0.25
𝒔 = m / (2+m)
pₗ = 2 * sum(𝒔^(2i)/(2i+1) for i in 1:8) # the polynomial p from above, evaluated at 𝒔
log(1 + m), m - 𝒔*(m-pₗ), log(1 + m) - ( m - 𝒔*(m-pₗ))
```
The two values differ by less than $10^{-16}$, as advertised. Re-assembling then, we compare the computed values:
```julia;
Δ = k * log(2) + (m - 𝒔*(m-pₗ)) - log(5)
```
The actual code is different, as the Taylor polynomial isn't
used. The Taylor polynomial is a great approximation near a point, but
there might be better polynomial approximations for all values in an interval.
In this case there is, and that polynomial is used in the production
setting. This makes things a bit more efficient, but the basic idea
remains - for a prescribed accuracy, a polynomial approximation can
be found over a given interval, which can be cleverly utilized to
solve for all applicable values.
##### Example: higher order derivatives of the inverse function
For notational purposes, let ``g(x)`` be the inverse function for ``f(x)``. Assume *both* functions have a Taylor polynomial expansion:
(This is following [Liptaj](https://vixra.org/pdf/1703.0295v1.pdf)).
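In outline: write $\Delta_y = f(x_0+\Delta_x) - f(x_0) \approx \sum_{j=1}^n \frac{f^{(j)}(x_0)}{j!}\Delta_x^j$ and use that $g(f(x_0 + \Delta_x)) = x_0 + \Delta_x$. Expanding $g$ about $f(x_0)$ gives

```math
x_0 + \Delta_x \approx x_0 + \sum_{i=1}^n \frac{g^{(i)}(f(x_0))}{i!} \Delta_y^i.
```

Solving this equation for $g^{(n)}$ and letting $\Delta_x \to 0$ isolates the $n$th derivative of the inverse function.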
We will use `SymPy` to take this limit for the first `4` derivatives. Here is some code that expands ``x_0 + \Delta_x = g(f(x_0 + \Delta_x))`` and then uses `SymPy` to solve:
```julia;
@syms x₀ Δₓ f′[1:4] g′[1:4]
as(i) = f′[i]/factorial(i)
bs(i) = g′[i]/factorial(i)

gᵏs = Any[]
eqns = Any[]
for n ∈ 1:4
    Δy = sum(as(j) * Δₓ^j for j ∈ 1:n)
    left = x₀ + Δₓ
    right = x₀ + sum(bs(i)*Δy^i for i ∈ 1:n)
    eqn = left ~ right
    push!(eqns, eqn)

    gⁿ = g′[n]
    ϕ = solve(eqn, gⁿ)[1]
    # replace g′ᵢs in terms of computed f′ᵢs
    for j ∈ 1:n-1
        ϕ = subs(ϕ, g′[j] => gᵏs[j])
    end
    L = limit(ϕ, Δₓ => 0)
    push!(gᵏs, L)
end
gᵏs
```
We can see the expected $g' = 1/f'$ (more precisely, $g'(y) = 1/f'(f^{-1}(y))$, with the point of evaluation suppressed). In addition, we get ``3`` more formulas, hinting that the answers grow rapidly in complexity.
For each `n`, the code above sets up the two sides, `left` and `right`, of an equation involving the higher-order derivatives of ``g``. For example, when `n=2` we have:
```julia;
eqns[2]
```
The `solve` function is used to identify ``g^{(n)}`` represented in terms of lower-order derivatives of ``g``. These values have been computed and stored and are then substituted into `ϕ`. Afterwards a limit is taken and the answer recorded.
## Questions
###### Question
Compute the Taylor polynomial of degree ``10`` for $\sin(x)$ about $c=0$ using `SymPy`. Based on the form, which formula seems appropriate:
The ``5``th order Taylor polynomial for $\sin(x)$ about $c=0$ is: $x - x^3/3! + x^5/5!$. Use this to find the first ``3`` terms of the Taylor polynomial of $\sin(x^2)$ about $c=0$.
A more direct derivation of the form of the Taylor polynomial (here taken about $c=0$) is to *assume* a polynomial form that matches $f$:
```math
f(x) = a + bx + cx^2 + dx^3 + ex^4 + \cdots
```
If this is true, then formally evaluating at $x=0$ gives $f(0) = a$, so $a$ is determined. Similarly, formally differentiating and evaluating at $0$ gives $f'(0) = b$. What is the result of formally differentiating $4$ times and evaluating at $0$:
How big an error is there in approximating $e^x$ by its ``5``th degree Taylor polynomial about $c=0$, $1 + x + x^2/2! + x^3/3! + x^4/4! + x^5/5!$, over $[-1,1]$?
The error is known to be $( f^{(6)}(\xi)/6!) \cdot x^6$ for some $\xi$ in $[-1,1]$.
* The ``6``th derivative of $e^x$ is still $e^x$:
```julia; hold=true; echo=false
yesnoq(true)
```
* Which is true about the function $e^x$:
```julia; hold=true; echo=false
choices = ["It is increasing", "It is decreasing", "It both increases and decreases"]
radioq(choices, 1)
```
The error in using $T_k(x)$ to approximate $e^x$ over the interval $[-1/2, 1/2]$ is $(1/(k+1)!) e^\xi x^{k+1}$, for some $\xi$ in the interval. This is *less* than $1/((k+1)!) e^{1/2} (1/2)^{k+1}$.
* Why?
```julia; hold=true; echo=false
choices = [
L"The function $e^x$ is increasing, so takes on its largest value at the endpoint and the function $|x^n| \leq |x|^n \leq (1/2)^n$",
L"The function has a critical point at $x=1/2$",
L"The function is monotonic in $k$, so achieves its maximum at $k+1$"
]
radioq(choices, 1)
```
Assuming the above is right, find the smallest value $k$ guaranteeing an error no more than $10^{-16}$.
```julia; hold=true; echo=false
f(k) = 1/factorial(k+1) * exp(1/2) * (1/2)^(k+1)
(f(13) > 1e-16 && f(14) < 1e-16) && numericq(14)
```
* The function $f(x) = (1 - x + x^2) \cdot e^x$ has a Taylor polynomial about ``0`` such that all coefficients are rational numbers. Is it true that the numerators are all either ``1`` or prime? (From the 2014 [Putnam](http://kskedlaya.org/putnam-archive/2014.pdf) exam.)
Here is one way to get all the values bigger than 1:
```julia; hold=true;
ex = (1 - x + x^2)*exp(x)
Tn = series(ex, x, 0, 100).removeO()
ps = sympy.Poly(Tn, x).coeffs()
qs = numer.(ps)
qs[qs .> 1] |> Tuple # format better for output
```
Verify by hand that each of the remaining values is a prime number to answer the question (Or you can use `sympy.isprime.(qs)`).
Are they all prime or $1$?
```julia; hold=true; echo=false
yesnoq(true)
```
## Appendix
We mentioned two facts that could use a proof: the Newton form of the interpolating polynomial and the mean value theorem for divided differences. Our explanation tries to emphasize a parallel with the secant line's relationship with the tangent line. The standard way to discuss Taylor polynomials is different (also more direct) and so these two proofs are not in most calculus texts.
A [proof](https://www.math.uh.edu/~jingqiu/math4364/interpolation.pdf) of the Newton form can be done knowing that the interpolating polynomial is unique and can be expressed either as
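```math
g(x) = a_0 + a_1(x-x_0) + a_2(x-x_0)(x-x_1) + \cdots + a_n(x-x_0)\cdot\cdots\cdot(x-x_{n-1}),
```

using the nodes in the order $x_0, x_1, \dots, x_n$, or as

```math
h(x) = b_0 + b_1(x-x_n) + b_2(x-x_n)(x-x_{n-1}) + \cdots + b_n(x-x_n)\cdot\cdots\cdot(x-x_1),
```

using the nodes in the reverse order.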
These two polynomials are of degree $n$ or less and, by uniqueness, $u(x) = h(x)-g(x)=0$. So all the coefficients of $u(x)$ are $0$. The coefficient of $x^n$ is $b_n - a_n$, so $a_n=b_n$. Our goal is to express $a_n$ in terms of $a_{n-1}$ and $b_{n-1}$. Focusing on the $x^{n-1}$ term, we have:
where $p_{n-2}$ is a polynomial of degree at most $n-2$. (The expansion of $(x-x_1)\cdot\cdots\cdot(x-x_{n-1})$ leaves $x^{n-1}$ plus some lower-degree polynomial.) Similarly, we have
$a_{n-1}(x-x_0)\cdot\cdots\cdot(x-x_{n-2}) = a_{n-1}x^{n-1} + q_{n-2}$ and
$b_{n-1}(x-x_n)\cdot\cdots\cdot(x-x_2) = b_{n-1}x^{n-1}+r_{n-2}$. Combining, we get that the $x^{n-1}$ term of $u(x)$ is
```math
(b_{n-1}-a_{n-1}) - a_n(x_n-x_0) = 0.
```
On rearranging, this yields $a_n = (b_{n-1}-a_{n-1}) / (x_n - x_0)$. By *induction* - that $a_i=f[x_0, x_1, \dots, x_i]$ and $b_i = f[x_n, x_{n-1}, \dots, x_{n-i}]$ (which has trivial base case) - this is $(f[x_1, \dots, x_n] - f[x_0,\dots x_{n-1}])/(x_n-x_0)$.
Now, assuming the Newton form is correct, a
[proof](http://tinyurl.com/zjogv83) of the mean value theorem for
divided differences comes down to Rolle's theorem. Starting from the
Newton form of the interpolating polynomial, $g(x)$, and expanding in powers of $x$, the leading term is $f[x_0, x_1, \dots, x_n]\cdot x^n$; the remaining terms form a polynomial of degree at most $n-1$. That is, the coefficient of
$x^n$ is $f[x_0, x_1, \dots, x_n]$. Consider the function $h(x)=f(x) - g(x)$.
It has zeros $x_0, x_1, \dots, x_n$.
By Rolle's theorem, between any two such zeros $x_i, x_{i+1}$, $0 \leq i < n$ there must be a zero of the derivative of $h(x)$, say $\xi^1_i$. So $h'(x)$ has zeros $\xi^1_0 < \xi^1_1 < \dots < \xi^1_{n-1}$.
We visualize this with $f(x) = \sin(x)$ and $x_i = i$ for $i=0, 1, 2, 3$. The $x_i$ values are indicated with circles, the $\xi^1_i$ values with squares:
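A minimal sketch of such a figure, reusing `newton_form` from above and locating the $\xi^1_i$ values numerically as the extrema of $h$ between consecutive zeros:

```julia; hold=true
f(x) = sin(x)
xs = [0, 1, 2, 3]
g = newton_form(f, xs)               # interpolating polynomial
h = x -> f(x) - g(x)                 # has zeros at each xᵢ
plot(h, -0.25, 3.25, legend=false)
scatter!(xs, h.(xs), markershape=:circle)
ξs = map(1:3) do i                   # an extremum of h between two zeros is a zero of h′
    ts = range(xs[i], xs[i+1], length=1001)
    ts[argmax(abs.(h.(ts)))]
end
scatter!(ξs, h.(ξs), markershape=:square)
```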
Again by Rolle's theorem, between any pair of adjacent zeros $\xi^1_i, \xi^1_{i+1}$ there must be a zero $\xi^2_i$ of $h''(x)$. So there are $n-1$ zeros of $h''(x)$. Continuing, we see that there will be
$n+1-3$ zeros of $h^{(3)}(x)$,
$n+1-4$ zeros of $h^{(4)}(x)$, $\dots$,
$n+1-(n-1)$ zeros of $h^{(n-1)}(x)$, and finally
$n+1-n$ (that is, $1$) zero of $h^{(n)}(x)$. Call this last zero $\xi$. It satisfies $x_0 \leq \xi \leq x_n$. Further,
$0 = h^{(n)}(\xi) = f^{(n)}(\xi) - g^{(n)}(\xi)$. But $g$ is a degree $n$ polynomial, so the $n$th derivative is the coefficient of $x^n$ times $n!$. In this case we have $0 = f^{(n)}(\xi) - f[x_0, \dots, x_n] n!$. Rearranging yields the result.