Compare commits

...

83 Commits

Author SHA1 Message Date
john verzani
f988ecd5e5 Merge pull request #156 from u3ks/patch-1
Corrected Latex function in optimization example
2026-02-27 13:24:13 -05:00
Krasen Samardzhiev
1093dee791 Corrected Latex function in optimization example 2026-02-27 17:06:57 +00:00
john verzani
2e5c11a25f Merge pull request #155 from aligurbu/patch-2
Include Julia plot for limit function
2026-02-02 16:54:34 -05:00
john verzani
ed2e014f65 Merge pull request #154 from aligurbu/patch-1
Correct 'An historic' to 'A historic' in limits.qmd
2026-02-02 16:53:34 -05:00
Ali Gürbüz
c268744501 Include Julia plot for limit function
Add Julia code for plotting the limit function.
2026-01-31 16:40:11 -05:00
Ali Gürbüz
e47da767c8 Correct 'An historic' to 'A historic' in limits.qmd 2026-01-31 16:23:02 -05:00
john verzani
83bddc19e3 Merge pull request #153 from fangliu-tju/main
some typos
2025-08-29 15:32:49 -04:00
Fang Liu
bf2d5f6c76 some typos 2025-08-29 13:52:29 +08:00
john verzani
7c869a83ce Merge pull request #152 from jverzani/v0.25
sequences and series
2025-08-14 19:17:52 -04:00
jverzani
2725c80902 typo 2025-08-14 19:10:46 -04:00
jverzani
042a332f71 sequences and series 2025-08-14 19:08:41 -04:00
john verzani
a192358e96 Merge pull request #151 from jverzani/v0.24
V0.24
2025-07-30 07:26:51 -04:00
jverzani
d7c0d08dde typos 2025-07-29 17:12:23 -04:00
jverzani
8398f21b87 typos 2025-07-29 17:11:34 -04:00
jverzani
50cb645452 add some examples, figures 2025-07-29 17:05:59 -04:00
jverzani
beb166e117 merge main 2025-07-27 15:28:39 -04:00
jverzani
e9efe6985b ignore more 2025-07-27 15:27:22 -04:00
jverzani
05bf5afa79 Merge branch 'main' of https://github.com/jverzani/CalculusWithJuliaNotes.jl 2025-07-27 15:27:08 -04:00
jverzani
33c6e62d68 em dash; sentence case 2025-07-27 15:26:00 -04:00
john verzani
b02cadab07 Merge pull request #150 from aligurbu/patch-1
Update quick_notes.qmd
2025-07-27 15:22:44 -04:00
Ali Gürbüz
22896a36ea Update quick_notes.qmd
Typo fixing
2025-07-27 11:07:38 -07:00
jverzani
c3b221cd29 Merge branch 'main' into v0.24 2025-07-23 08:06:04 -04:00
jverzani
e0b14c233e Merge branch 'main' of https://github.com/jverzani/CalculusWithJuliaNotes.jl 2025-07-23 08:05:55 -04:00
jverzani
c3a94878f3 edits 2025-07-23 08:05:43 -04:00
john verzani
9dd9fa76b9 Merge pull request #149 from aligurbu/patch-1
Update README.md
2025-07-21 16:51:40 -04:00
Ali Gürbüz
661267778f Update README.md
The hyperlink has an extra set of parentheses, which is causing the markdown link syntax to be malformed.

The issue was an extra set of parentheses in the markdown link syntax. The link should now work properly and will direct users to the Calculus with Julia notes website.
2025-07-21 12:49:32 -07:00
jverzani
31ce21c8ad merge in main 2025-07-02 14:06:31 -04:00
jverzani
c38a7c9f1d WIP; better figures 2025-07-02 14:05:09 -04:00
jverzani
6c10803712 modify adjust_plotly 2025-07-02 11:09:37 -04:00
jverzani
2a7e0cb660 Merge branch 'main' of https://github.com/jverzani/CalculusWithJuliaNotes.jl 2025-07-02 06:25:18 -04:00
jverzani
5013211954 work on better figures 2025-07-02 06:25:10 -04:00
john verzani
c5a9ece77c Merge pull request #148 from fangliu-tju/main
some typos.
2025-07-02 06:24:48 -04:00
Fang Liu
bc71a0b9d7 some typos. 2025-07-02 09:26:04 +08:00
jverzani
aa8e9e04ca script to adjust plotly position in loading of javascript 2025-06-27 21:02:20 -04:00
jverzani
30300f295b adjust plots 2025-06-27 20:55:42 -04:00
jverzani
145bc6043f WIP 2025-06-27 20:53:48 -04:00
jverzani
2af5a6a213 WIP 2025-06-27 19:23:48 -04:00
jverzani
50cc2b2193 xxx 2025-06-27 19:22:38 -04:00
jverzani
23e00863a5 Plots 2025-06-27 19:21:39 -04:00
jverzani
c4bd214ecd WIP: Plots 2025-06-27 19:21:20 -04:00
jverzani
580e87ccb2 orthogonal; work around plotly() 2025-06-27 14:29:32 -04:00
jverzani
c8f9fd4995 merge main 2025-06-14 07:24:47 -04:00
jverzani
4f60e9a414 typos 2025-06-14 07:23:19 -04:00
john verzani
071a495010 Merge pull request #147 from fangliu-tju/main
some typos.
2025-06-14 07:07:51 -04:00
Fang Liu
95429d26f9 some typos. 2025-06-14 09:27:53 +08:00
jverzani
e7684ac2f7 incorporate PR 2025-06-06 12:06:10 -04:00
jverzani
878c27c5a5 Merge branch 'ptoml' 2025-06-06 06:50:09 -04:00
jverzani
c08eb90a60 Merge branch 'main' of https://github.com/jverzani/CalculusWithJuliaNotes.jl 2025-06-06 06:49:59 -04:00
jverzani
748797fee5 WIP 2025-06-06 06:49:50 -04:00
john verzani
5093863199 Merge pull request #146 from fangliu-tju/main
some typos
2025-06-06 06:46:26 -04:00
Fang Liu
8ede2b33fb some typos 2025-06-06 16:56:56 +08:00
jverzani
6065782e07 typos 2025-05-29 07:49:08 -04:00
jverzani
4d4e2342d6 Merge branch 'main' of https://github.com/jverzani/CalculusWithJuliaNotes.jl 2025-05-29 07:47:47 -04:00
john verzani
9dcafd7d7d Merge pull request #145 from fangliu-tju/main
some typos.
2025-05-29 07:47:35 -04:00
Fang Liu
4d0a9e9a72 some typos. 2025-05-23 16:20:13 +08:00
jverzani
efd69d2fa1 updates 2025-05-10 15:44:25 -04:00
jverzani
38785d432a Merge branch 'main' of https://github.com/jverzani/CalculusWithJuliaNotes.jl 2025-05-09 07:33:31 -04:00
jverzani
ad52202c92 edits 2025-05-09 07:33:21 -04:00
john verzani
837a8eb42d Merge pull request #144 from fangliu-tju/main
some typos
2025-05-09 07:33:03 -04:00
Fang Liu
f7b7df3586 some typos 2025-05-09 16:54:52 +08:00
jverzani
1518e3b9be add back file 2025-05-07 12:05:34 -04:00
john verzani
d2e00d7bd9 Merge pull request #143 from fangliu-tju/main
some typos
2025-05-04 07:07:44 -04:00
Fang Liu
503de9c85e Merge branch 'main' of github.com:fangliu-tju/CalculusWithJuliaNotes.jl into main 2025-05-04 15:38:39 +08:00
Fang Liu
d55e4802fb some typos 2025-05-04 15:08:47 +08:00
jverzani
fa5f9f449d add matrix calculus notes 2025-04-30 17:57:44 -04:00
jverzani
a650cf8fa0 Merge branch 'main' into v0.24 2025-04-23 13:39:27 -04:00
john verzani
c5b59ebecb Merge pull request #141 from jverzani/typos
typos
2025-04-23 11:50:36 -04:00
jverzani
2dafd02065 typos 2025-04-23 11:49:39 -04:00
jverzani
cead7f651a typos 2025-04-23 11:40:01 -04:00
john verzani
8df336b595 Merge pull request #140 from fangliu-tju/main
some typos
2025-04-23 11:36:48 -04:00
jverzani
b5f0300921 WIP 2025-04-23 11:35:45 -04:00
Fang Liu
ed1d92197a some typos 2025-04-22 15:19:44 +08:00
jverzani
30be930f0f WIP 2025-04-16 14:31:16 -04:00
john verzani
36895faafe Merge pull request #139 from fangliu-tju/main
some typos and a modification for association of + and * in calculator.qmd
2025-04-16 10:52:59 -04:00
Fang Liu
b24f19002d some typos and a modification for association of + and * 2025-04-16 15:18:55 +08:00
Fang Liu
cbc6e5375a some typos and a modification for association of + and * 2025-04-16 15:11:16 +08:00
john verzani
d56705e09b Merge pull request #138 from jverzani/v0.23
updates
2025-01-24 11:10:34 -05:00
jverzani
33c02f08ce typos 2025-01-24 11:10:06 -05:00
jverzani
92f4cba496 updates 2025-01-24 11:04:54 -05:00
john verzani
ff0f8a060d Merge pull request #137 from jverzani/v0.22
align fix; theorem style; condition number
2024-10-31 14:27:07 -04:00
jverzani
b1bafe190c fix typo 2024-10-31 14:26:45 -04:00
jverzani
490f3813b1 fix typo 2024-10-31 14:25:49 -04:00
jverzani
18aae2aa93 align fix; theorem style; condition number 2024-10-31 14:22:21 -04:00
116 changed files with 13335 additions and 3533 deletions

6
.gitignore vendored
View File

@@ -5,6 +5,7 @@ docs/site
test/benchmarks.json
Manifest.toml
TODO.md
Changelog.md
/*/_pdf_index.pdf
/*/*/_pdf_index.pdf
/*/_pdf_index.typ
@@ -12,4 +13,7 @@ TODO.md
/*/CalculusWithJulia.pdf
default.profraw
/quarto/default.profraw
/*/*/default.profraw
/*/*/default.profraw
/*/bonepile.qmd
/*/*/bonepile.qmd
/*/*_files

View File

@@ -3,4 +3,4 @@
[![Read Notes](https://img.shields.io/badge/docs-latest-blue.svg)](https://jverzani.github.io/CalculusWithJuliaNotes.jl/)
A collection of [notes]((https://jverzani.github.io/CalculusWithJuliaNotes.jl/)) related to using `Julia` in the study of Calculus.
A collection of [notes](https://jverzani.github.io/CalculusWithJuliaNotes.jl/) related to using `Julia` in the study of Calculus.

View File

@@ -13,4 +13,7 @@ Strang = "Strang"
multline = "multline"
infiniment = "infiniment"
infiniment = "infiniment"
typ = "typ"
Comput = "Comput"

43
adjust_plotly.jl Normal file
View File

@@ -0,0 +1,43 @@
# The issue with `PlotlyLight` appears to be that
# the `str` below is called *after* the inclusion of `require.min.js`
# (That str is included in the `.qmd` file to be included in the header
# but the order of inclusion appears not to be adjustable)
# This little script just adds a line *before* the require call
# which seems to make it all work. The line number 83 might change.
#alternatives/plotly_plotting.html
function _add_plotly(f)
lineno = 117
str = """
<script src="https://cdn.plot.ly/plotly-2.11.0.min.js"></script>
"""
r = readlines(f)
open(f, "w") do io
for (i,l) ∈ enumerate(r)
i == lineno && println(io, str)
println(io, l)
end
end
end
function (@main)(args...)
for (root, dirs, files) in walkdir("_book")
for fᵢ ∈ files
f = joinpath(root, fᵢ)
if endswith(f, ".html")
_add_plotly(f)
end
end
end
#f = "_book/integrals/center_of_mass.html"
#_add_plotly(f)
return 1
end
["ODEs", "alternatives", "derivatives", "differentiable_vector_calculus", "integral_vector_calculus", "integrals", "limits", "misc", "precalc", "site_libs"]

2
quarto/.gitignore vendored
View File

@@ -3,4 +3,6 @@
/_freeze/
/*/*_files/
/*/*.ipynb/
/*/bonepile.qmd
/*/references.bib
weave_support.jl

View File

@@ -1,11 +1,16 @@
[deps]
CalculusWithJulia = "a2e0e22d-7d4c-5312-9169-8b992201a882"
IJulia = "7073ff75-c697-5162-941a-fcdaad2a7d2a"
LaTeXStrings = "b964fa9f-0449-5b57-a5c2-d3ea65f4040f"
ModelingToolkit = "961ee093-0014-501f-94e3-6117800e7a78"
MonteCarloMeasurements = "0987c9cc-fe09-11e8-30f0-b96dd679fdca"
Mustache = "ffc61752-8dc7-55ee-8c37-f3e9cdd09e70"
OrdinaryDiffEq = "1dea7af3-3e70-54e6-95c3-0bf5283fa5ed"
PlotlyBase = "a03496cd-edff-5a9b-9e67-9cda94a718b5"
PlotlyKaleido = "f2990250-8cf9-495f-b13a-cce12b45703c"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
QuizQuestions = "612c44de-1021-4a21-84fb-7261cf5eb2d4"
Roots = "f2b01f46-fcfa-551c-844a-d8ac1e96c665"
SymPy = "24249f21-da20-56a4-8eb1-6a02cf4ae2e6"
Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
TextWrap = "b718987f-49a8-5099-9789-dcd902bef87d"

View File

@@ -71,14 +71,15 @@ $$
The authors apply this model to flu statistics from Hong Kong where:
$$
\begin{align*}
S(0) &= 7,900,000\\
I(0) &= 10\\
R(0) &= 0\\
\end{align*}
$$
In `Julia` we define these, `N` to model the total population, and `u0` to be the proportions.
In `Julia` we define these parameter values, with `N` to model the total population and `u0` to represent the proportions.
```{julia}
@@ -93,7 +94,7 @@ An *estimated* set of values for $k$ and $b$ are $k=1/3$, coming from the averag
Okay, the mathematical modeling is done; now we try to solve for the unknown functions using `DifferentialEquations`.
To warm up, if $b=0$ then $i'(t) = -k \cdot i(t)$ describes the infected. (There is no circulation of people in this case.) The solution would be achieved through:
To warm up, if $b=0$ then $i'(t) = -k \cdot i(t)$ describes the infected. (There is no circulation of people in this case.) This is a single ODE. The solution would be achieved through:
```{julia}
@@ -101,10 +102,12 @@ To warm up, if $b=0$ then $i'(t) = -k \cdot i(t)$ describes the infected. (The
k = 1/3
f(u,p,t) = -k * u # solving u(t) = - k u(t)
uᵢ0= I0/N
time_span = (0.0, 20.0)
prob = ODEProblem(f, I0/N, time_span)
sol = solve(prob, Tsit5(), reltol=1e-8, abstol=1e-8)
prob = ODEProblem(f, uᵢ0, time_span)
sol = solve(prob, Tsit5(); reltol=1e-8, abstol=1e-8)
plot(sol)
```
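As a quick check (a sketch, assuming the cell above has run so that `sol`, `k`, and `uᵢ0` are in scope): the warm-up equation has the closed form $i(t) = i(0)e^{-kt}$, and the solution object is callable, so the two can be compared directly.

```julia
# Sketch: compare the numeric solution with i(t) = i(0) * exp(-k*t)
maximum(abs(sol(t) - uᵢ0 * exp(-k * t)) for t in range(0.0, 20.0, length=101))
```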
@@ -119,7 +122,7 @@ $$
\frac{di}{dt} = -k \cdot i(t) = F(i(t), k, t)
$$
where $F$ depends on the current value ($i$), a parameter ($k$), and the time ($t$). We did not utilize $p$ above for the parameter, as it was easy not to, but could have, and will in the following. The time variable $t$ does not appear by itself in our equation, so only `f(u, p, t) = -k * u` was used, `u` the generic name for a solution which in this case is $i$.
where $F$ depends on the current value ($i$), a parameter ($k$), and the time ($t$). We did not utilize $p$ above for the parameter, as it was easy not to, but could have, and will in the following. The time variable $t$ does not appear by itself in our equation, so only `f(u, p, t) = -k * u` was used, `u` the generic name for a solution which in this case was labeled with an $i$.
The problem we set up needs an initial value (the `u0`) and a time span to solve over. Here we want time to model real time, so use floating point values.
@@ -130,12 +133,13 @@ The plot shows steady decay, as there is no mixing of infected with others.
Adding in the interaction requires a bit more work. We now have what is known as a *system* of equations:
$$
\begin{align*}
\frac{ds}{dt} &= -b \cdot s(t) \cdot i(t)\\
\frac{di}{dt} &= b \cdot s(t) \cdot i(t) - k \cdot i(t)\\
\frac{dr}{dt} &= k \cdot i(t)\\
\end{align*}
$$
Systems of equations can be solved in a similar manner as a single ordinary differential equation, though adjustments are made to accommodate the multiple functions.
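A minimal sketch of the in-place update function such a system calls for follows; the exact body and names here are assumptions, with `u = (s, i, r)` and the parameters passed through `p` as a named tuple, matching the later call `ODEProblem(sir!, u0, time_span, p)`:

```julia
# Sketch of an in-place update for the SIR system; du holds the derivatives
function sir!(du, u, p, t)
    s, i, r = u
    (; k, b) = p                  # p = (k = ..., b = ...)
    du[1] = -b * s * i            # ds/dt
    du[2] =  b * s * i - k * i    # di/dt
    du[3] =  k * i                # dr/dt
    nothing
end
```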
@@ -165,7 +169,7 @@ The `sir!` function has the trailing `!` indicating by convention it *mu
:::
With the update function defined, the problem is setup and a solution found with in the same manner:
With the update function defined, the problem is set up and a solution is found in the same manner as before:
```{julia}
@@ -191,7 +195,7 @@ p = (k=1/2, b=2) # change b from 1/2 to 2 -- more daily contact
prob = ODEProblem(sir!, u0, time_span, p)
sol = solve(prob, Tsit5())
plot(sol)
plot(sol; legend=:right)
```
The graphs are somewhat similar, but the steady state is reached much more quickly and nearly everyone became infected.
@@ -250,7 +254,7 @@ end
p
```
The 3-dimensional graph with `plotly` can have its viewing angle adjusted with the mouse. When looking down on the $x-y$ plane, which code `b` and `k`, we can see the rapid growth along a line related to $b/k$.
(A 3-dimensional graph with `plotly` or `Makie` can have its viewing angle adjusted with the mouse. When looking down on the $x$-$y$ plane, whose axes encode `b` and `k`, we can see the rapid growth along a line related to $b/k$.)
Smith and Moore point out that $k$ is roughly the reciprocal of the number of days an individual is sick enough to infect others. This can be estimated during a breakout. However, they go on to note that there is no direct way to observe $b$, but there is an indirect way.
@@ -277,11 +281,12 @@ We now solve numerically the problem of a trajectory with a drag force from air
The general model is:
$$
\begin{align*}
x''(t) &= - W(t,x(t), x'(t), y(t), y'(t)) \cdot x'(t)\\
y''(t) &= -g - W(t,x(t), x'(t), y(t), y'(t)) \cdot y'(t)\\
\end{align*}
$$
with initial conditions: $x(0) = y(0) = 0$ and $x'(0) = v_0 \cos(\theta), y'(0) = v_0 \sin(\theta)$.
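For a numeric solver this second-order system is rewritten as a first-order system in the state $(x, x', y, y')$. A hedged sketch, taking $W \equiv \gamma$ (linear drag) and names of our own choosing:

```julia
# Sketch: first-order form of the drag model with state u = (x, x', y, y')
function projectile!(du, u, p, t)
    x, vx, y, vy = u
    (; γ, g) = p
    du[1] = vx
    du[2] = -γ * vx          # x'' = -W ⋅ x'
    du[3] = vy
    du[4] = -g - γ * vy      # y'' = -g - W ⋅ y'
    nothing
end
```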
@@ -379,7 +384,7 @@ SOL = solve(trajectory_problem, Tsit5(); p = ps, callback=cb)
plot(t -> SOL(t)[1], t -> SOL(t)[2], TSPAN...; legend=false)
```
Finally, we note that the `ModelingToolkit` package provides symbolic-numeric computing. This allows the equations to be set up symbolically, as in `SymPy` before being passed off to `DifferentialEquations` to solve numerically. The above example with no wind resistance could be translated into the following:
Finally, we note that the `ModelingToolkit` package provides symbolic-numeric computing. This allows the equations to be set up symbolically, as has been illustrated with `SymPy`, before being passed off to `DifferentialEquations` to solve numerically. The above example with no wind resistance could be translated into the following:
```{julia}
@@ -406,7 +411,7 @@ p = [γ => 0.0,
prob = ODEProblem(sys, u0, TSPAN, p, jac=true)
sol = solve(prob,Tsit5())
plot(t -> sol(t)[3], t -> sol(t)[4], TSPAN..., legend=false)
plot(t -> sol(t)[1], t -> sol(t)[3], TSPAN..., legend=false)
```
The toolkit will automatically generate fast functions and can perform transformations (such as is done by `ode_order_lowering`) before passing along to the numeric solves.

View File

@@ -184,7 +184,7 @@ plot(exp(-1/2)*exp(x^2/2), x0, 2)
plot!(xs, ys)
```
Not bad. We wouldn't expect this to be exact - due to the concavity of the solution, each step is an underestimate. However, we see it is an okay approximation and would likely be better with a smaller $h$. A topic we pursue in just a bit.
Not bad. We wouldn't expect this to be exact---due to the concavity of the solution, each step is an underestimate. However, we see it is an okay approximation and would likely be better with a smaller $h$. A topic we pursue in just a bit.
Rather than type in the above command each time, we wrap it all up in a function. The inputs are $n$, $a=x_0$, $b=x_n$, $y_0$, and, most importantly, $F$. The output is massaged into a function through a call to `linterp`, rather than two vectors. The `linterp` function[^Interpolations] we define below just finds a function that linearly interpolates between the points and is `NaN` outside of the range of the $x$ values:
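The definition itself is elided by this diff; a hedged sketch of such a helper (the notes' actual `linterp` may differ in detail) is:

```julia
# Sketch: return a function linearly interpolating (xs, ys); NaN outside the xs range
function linterp(xs, ys)
    x -> begin
        (x < first(xs) || x > last(xs)) && return NaN
        j = searchsortedlast(xs, x)              # largest j with xs[j] ≤ x
        j == lastindex(xs) && return float(ys[end])
        λ = (x - xs[j]) / (xs[j+1] - xs[j])
        (1 - λ) * ys[j] + λ * ys[j+1]
    end
end
```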
@@ -263,7 +263,7 @@ Each step introduces an error. The error in one step is known as the *local trun
The total error, or more commonly, *global truncation error*, is the error between the actual answer and the approximate answer at the end of the process. It reflects an accumulation of these local errors. This error is *bounded* by a constant times $h$. Since it gets smaller as $h$ gets smaller in direct proportion, the Euler method is called *first order*.
Other, somewhat more complicated, methods have global truncation errors that involve higher powers of $h$ - that is for the same size $h$, the error is smaller. In analogy is the fact that Riemann sums have error that depends on $h$, whereas other methods of approximating the integral have smaller errors. For example, Simpson's rule had error related to $h^4$. So, the Euler method may not be employed if there is concern about total resources (time, computer, ...), it is important for theoretical purposes in a manner similar to the role of the Riemann integral.
Other, somewhat more complicated, methods have global truncation errors that involve higher powers of $h$---that is, for the same size $h$, the error is smaller. In analogy is the fact that Riemann sums have error that depends on $h$, whereas other methods of approximating the integral have smaller errors. For example, Simpson's rule had error related to $h^4$. So, while the Euler method may not be employed if there is concern about total resources (time, computer, ...), it is important for theoretical purposes in a manner similar to the role of the Riemann integral.
In the examples, we will see that for many problems the simple Euler method is satisfactory, but not always so. The task of numerically solving differential equations is not a one-size-fits-all one. In the following, a few different modifications are presented to the basic Euler method, but this just scratches the surface of the topic.
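As a quick numeric illustration of first-order convergence (a sketch, not from the notes): for $y'=y$, $y(0)=1$ over $[0,1]$, the global error at $x=1$ should roughly halve when $h$ does.

```julia
# Sketch: global error of Euler's method on y' = y, y(0) = 1 over [0, 1]
function euler_value(F, x0, xn, y0, n)
    h = (xn - x0) / n
    x, y = x0, y0
    for _ in 1:n
        y += h * F(x, y)     # one Euler step
        x += h
    end
    y
end
errs = [abs(euler_value((x, y) -> y, 0.0, 1.0, 1.0, n) - exp(1)) for n in (10, 20, 40)]
errs[1] / errs[2], errs[2] / errs[3]   # both ratios ≈ 2, consistent with O(h)
```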
@@ -297,10 +297,10 @@ plot!(f, x0, xn)
From the graph it appears our value for `f(xn)` will underestimate the actual value of the solution slightly.
##### Example
##### Example: the power series method and Euler
The equation $y'(x) = \sin(x \cdot y)$ is not separable, so need not have an easy solution. The default method will fail. Looking at the available methods with `sympy.classify_ode(𝐞qn, u(x))` shows a power series method which can return a power series *approximation* (a Taylor polynomial). Let's look at comparing an approximate answer given by the Euler method to that one returned by `SymPy`.
The equation $y'(x) = \sin(x \cdot y)$ is not separable, so need not have an easy solution. The default method will fail. Looking at the available methods with `sympy.classify_ode(𝐞qn, u(x))` shows a power series method which can return a power series *approximation* (a polynomial). Let's look at comparing an approximate answer given by the Euler method to that one returned by `SymPy`.
First, the `SymPy` solution:
@@ -335,6 +335,71 @@ plot!(u, linewidth=5)
We see that the answer found from using a polynomial series matches that of Euler's method for a bit, but as time evolves, the approximate solution given by Euler's method more closely tracks the slope field.
----
The [power series method](https://en.wikipedia.org/wiki/Power_series_solution_of_differential_equations) to solve a differential equation starts with an assumption that the solution can be represented as a power series with some positive radius of convergence. This is formally substituted into the differential equation and derivatives may be taken term by term. The resulting coefficients are equated for like powers giving a system of equations to be solved.
An example of the method applied to the ODE $f''(x) - 2xf'(x) + \lambda f(x)=0$ is given in the reference above and follows below. Assume $f(x) = \sum_{n=0}^\infty a_n x^n$. Then
$$
\begin{align*}
f(x) &= \sum_{n=0} a_n x^n\\
f'(x) &= \sum_{n=1} a_n nx^{n-1}\\
f''(x) &= \sum_{n=2} a_n n (n-1) x^{n-2}\\
\end{align*}
$$
Putting these into the differential equation gives
$$
\begin{align*}
0 &=\sum_{n=2} a_n n (n-1) x^{n-2}
- 2x \cdot \sum_{n=1} a_n n x^{n-1}
+ \lambda \sum_{n=0} a_n x^n\\
&=\sum_{n=0} a_{n+2} (n+2) (n+1) x^{n}
- \sum_{n=1} 2 n a_n x^{n}
+ \sum_{n=0} \lambda a_n x^n\\
&= a_2 (1)(2) + \sum_{n=1} a_{n+2} (n+2) (n+1) x^{n}
- \sum_{n=1} 2 a_n n x^n
+ \lambda a_0 + \sum_{n=1} \lambda a_n x^n\\
&= (\lambda a_0 + 2a_2)\cdot x^0
+ \sum_{n=1} \left((n+2)(n+1)a_{n+2} + (-2n+\lambda)a_n\right) x^n
\end{align*}
$$
For a power series to be $0$ all its coefficients must be zero. This mandates:
$$
\begin{align*}
0 &= \lambda a_0 + 2a_2,\quad \text{and} \\
0 &= ((n+2)(n+1)a_{n+2} + (-2n+\lambda)a_n), \quad \text{for } n \geq 1
\end{align*}
$$
The last equation allows one to compute $a_{n+2}$ based on $a_n$. With $a_0$ and $a_1$ parameters we can create the first few values for the terms in the series for the solution:
```{julia}
@syms a_0::real a_1::real λ::real
a_2 = -λ/2 * a_0
recurse(n) = (2n-λ)/((n+2)*(n+1))
a_3 = expand(recurse(1)*a_1)
a_4 = expand(recurse(2)*a_2)
[a_2, a_3, a_4]
```
We can see these terms in the `SymPy` solution which uses the power series method for this differential equation:
```{julia}
@syms x::real u()
∂ = Differential(x)
eqn = (∂∘∂)(u(x)) - 2*x*∂(u(x)) + λ*u(x) ~ 0
inits = Dict(u(0) => a_0, ∂(u(x))(0) => a_1)
dsolve(eqn, u(x); ics=inits)
```
##### Example
@@ -602,12 +667,13 @@ $$
We can try the Euler method here. A simple approach might be this iteration scheme:
$$
\begin{align*}
x_{n+1} &= x_n + h,\\
u_{n+1} &= u_n + h v_n,\\
v_{n+1} &= v_n - h \cdot g/l \cdot \sin(u_n).
\end{align*}
$$
Here we need *two* initial conditions: one for the initial value $u(t_0)$ and the initial value of $u'(t_0)$. We have seen if we start at an angle $a$ and release the bob from rest, so $u'(0)=0$ we get a sinusoidal answer to the linearized model. What happens here? We let $a=1$, $l=5$ and $g=9.8$:
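A sketch of this scheme in code (the notes' `euler2` returns an interpolating function; this simplified version, with assumed argument names, just collects the angles):

```julia
# Sketch: Euler applied to u'' = -(g/l)⋅sin(u) via the iteration above
function euler2_angles(x0, xn, u0, v0, n; g = 9.8, l = 5)
    h = (xn - x0) / n
    u, v = float(u0), float(v0)
    us = [u]
    for _ in 1:n
        u, v = u + h * v, v - h * (g / l) * sin(u)   # right side uses the old (u, v)
        push!(us, u)
    end
    us
end
```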
@@ -647,7 +713,7 @@ plot(euler2(x0, xn, y0, yp0, 360), 0, 4T)
plot!(x -> pi/4*cos(sqrt(g/l)*x), 0, 4T)
```
Even now, we still see that something seems amiss, though the issue is not as dramatic as before. The oscillatory nature of the pendulum is seen, but in the Euler solution, the amplitude grows, which would necessarily mean energy is being put into the system. A familiar instance of a pendulum would be a child on a swing. Without pumping the legs - putting energy in the system - the height of the swing's arc will not grow. Though we now have oscillatory motion, this growth indicates the solution is still not quite right. The issue is likely due to each step mildly overcorrecting and resulting in an overall growth. One of the questions pursues this a bit further.
Even now, we still see that something seems amiss, though the issue is not as dramatic as before. The oscillatory nature of the pendulum is seen, but in the Euler solution, the amplitude grows, which would necessarily mean energy is being put into the system. A familiar instance of a pendulum would be a child on a swing. Without pumping the legs---putting energy in the system---the height of the swing's arc will not grow. Though we now have oscillatory motion, this growth indicates the solution is still not quite right. The issue is likely due to each step mildly overcorrecting and resulting in an overall growth. One of the questions pursues this a bit further.
## Questions
@@ -793,10 +859,64 @@ Modify the `euler2` function to implement the Euler-Cromer method. What do you s
#| hold: true
#| echo: false
choices = [
"The same as before - the amplitude grows",
"The same as before---the amplitude grows",
"The solution is identical to that of the approximation found by linearization of the sine term",
"The solution has a constant amplitude, but its period is slightly *shorter* than that of the approximate solution found by linearization",
"The solution has a constant amplitude, but its period is slightly *longer* than that of the approximate solution found by linearization"]
answ = 4
radioq(choices, answ, keep_order=true)
```
###### Question
In the above, we noted that a power series that is always zero must have zero coefficients. Why?
Suppose we have a series $u(x) = \sum_{n=0} a_n x^n$ with a radius of convergence $r > 0$ such that $u(x) = 0$ for $|x| < r$.
Why is $u^{(n)}(x) = 0$ for any $n \geq 0$ and $|x| < r$?
```{julia}
#| echo: false
choices = ["A constant function has derivatives which are constantly zero.",
"A power series is just a number, hence has derivatives which are always zero."]
radioq(choices, 1)
```
Answer the following as specifically as possible.
What is the value of $u(0)$?
```{julia}
#| echo: false
choices = [L"0", L"a_0", L"both $0$ and $a_0$"]
radioq(choices, 3; keep_order=true)
```
What is the value of $u'(0)$?
```{julia}
#| echo: false
choices = [L"0", L"a_1", L"both $0$ and $a_1$"]
radioq(choices, 3; keep_order=true)
```
What is the value of $u''(0)$?
```{julia}
#| echo: false
choices = [L"0", L"2a_2", L"both $0$ and $2a_2$"]
radioq(choices, 3; keep_order=true)
```
What is the value of $u^{(n)}(0)$?
```{julia}
#| echo: false
choices = [L"0", L"n!\cdot a_n", L"both $0$ and $n!\cdot a_n$"]
radioq(choices, 3; keep_order=true)
```

View File

@@ -86,12 +86,13 @@ $$
Again, we can integrate to get an answer for any value $t$:
$$
\begin{align*}
x(t) - x(t_0) &= \int_{t_0}^t \frac{dx}{dt} dt \\
&= (v_0t + \frac{1}{2}a t^2 - at_0 t) |_{t_0}^t \\
&= (v_0 - at_0)(t - t_0) + \frac{1}{2} a (t^2 - t_0^2).
\end{align*}
$$
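This integration can be checked symbolically (a sketch, assuming the `SymPy` conventions used elsewhere in these notes); the integrand is $v(s) = v_0 + a(s - t_0)$:

```julia
# Sketch: verify the integral equals v0*(t - t0) + a*(t - t0)^2/2,
# which is algebraically the same as the form derived above
using SymPy
@syms s t t0 v0 a
simplify(integrate(v0 + a*(s - t0), (s, t0, t)))
```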
There are three constants: the initial value for the independent variable, $t_0$, and the two initial values for the velocity and position, $v_0, x_0$. Assuming $t_0 = 0$, we can simplify the above to get a formula familiar from introductory physics:
@@ -148,7 +149,7 @@ $$
U'(t) = -r U(t), \quad U(0) = U_0.
$$
This shows that the rate of change of $U$ depends on $U$. Large positive values indicate a negative rate of change - a push back towards the origin, and large negative values of $U$ indicate a positive rate of change - again, a push back towards the origin. We shouldn't be surprised to either see a steady decay towards the origin, or oscillations about the origin.
This shows that the rate of change of $U$ depends on $U$. Large positive values indicate a negative rate of change---a push back towards the origin, and large negative values of $U$ indicate a positive rate of change---again, a push back towards the origin. We shouldn't be surprised to either see a steady decay towards the origin, or oscillations about the origin.
What will we find? This equation is different from the previous two equations, as the function $U$ appears on both sides. However, we can rearrange to get:
@@ -176,7 +177,7 @@ $$
In words, the initial difference in temperature of the object and the environment exponentially decays to $0$.
That is, as $t > 0$ goes to $\infty$, the right hand will go to $0$ for $r > 0$, so $T(t) \rightarrow T_a$ - the temperature of the object will reach the ambient temperature. The rate of this is largest when the difference between $T(t)$ and $T_a$ is largest, so when objects are cooling the statement "hotter things cool faster" is appropriate.
That is, as $t > 0$ goes to $\infty$, the right-hand side will go to $0$ for $r > 0$, so $T(t) \rightarrow T_a$---the temperature of the object will reach the ambient temperature. The rate of this is largest when the difference between $T(t)$ and $T_a$ is largest, so when objects are cooling the statement "hotter things cool faster" is appropriate.
A graph of the solution for $T_0=200$ and $T_a=72$ and $r=1/2$ is made as follows. We've added a few line segments from the defining formula, and see that they are indeed tangent to the solution found for the differential equation.
@@ -336,11 +337,12 @@ Differential equations are classified according to their type. Different types h
The first-order initial value equations we have seen can be described generally by
$$
\begin{align*}
y'(x) &= F(y,x),\\
y(x_0) &= x_0.
y(x_0) &= y_0.
\end{align*}
$$
Special cases include:
@@ -373,7 +375,7 @@ $$
u'(x) = a u(1-u), \quad a > 0
$$
Before beginning, we look at the form of the equation. When $u=0$ or $u=1$ the rate of change is $0$, so we expect the function might be bounded within that range. If not, when $u$ gets bigger than $1$, then the slope is negative and when $u$ gets less than $0$, the slope is positive, so there will at least be a drift back to the range $[0,1]$. Let's see exactly what happens. We define a parameter, restricting `a` to be positive:
Before beginning, we look at the form of the equation. When $u=0$ or $u=1$ the rate of change is $0$, so we expect the function might be bounded within that range. When $u$ gets bigger than $1$ the slope is negative; the slope is also negative when $u < 0$, but for a realistic problem we always have $u \ge 0$, so we focus on $u$ in the range $[0,1]$. Let's see exactly what happens. We define a parameter, restricting `a` to be positive:
```{julia}
@@ -401,7 +403,7 @@ To finish, we call `dsolve` to find a solution (if possible):
out = dsolve(eqn)
```
This answer - to a first-order equation - has one free constant, `C_1`, which can be solved for from an initial condition. We can see that when $a > 0$, as $x$ goes to positive infinity the solution goes to $1$, and when $x$ goes to negative infinity, the solution goes to $0$ and otherwise is trapped in between, as expected.
This answer---to a first-order equation---has one free constant, `C`, which can be solved for from an initial condition. We can see that when $a > 0$, as $x$ goes to positive infinity the solution goes to $1$, and when $x$ goes to negative infinity, the solution goes to $0$ and otherwise is trapped in between, as expected.
The limits are confirmed by investigating the limits of the right-hand side:
@@ -418,7 +420,7 @@ We can confirm that the solution is always increasing, hence trapped within $[0,
diff(rhs(out),x)
```
Suppose that $u(0) = 1/2$. Can we solve for $C_1$ symbolically? We can use `solve`, but first we will need to get the symbol for `C_1`:
Suppose that $u(0) = 1/2$. Can we solve for $C$ symbolically? We can use `solve`, but first we will need to get the symbol for `C`:
```{julia}
@@ -616,6 +618,7 @@ nothing
```
![The cables of an unloaded suspension bridge have a different shape than a loaded suspension bridge. As seen, the cables in this [figure](https://www.brownstoner.com/brooklyn-life/verrazano-narrows-bridge-anniversary-historic-photos/) would be modeled by a catenary.](./figures/verrazano-narrows-bridge-anniversary-historic-photos-2.jpeg)
---
@@ -639,7 +642,7 @@ $$
x''(t) = 0, \quad y''(t) = -g.
$$
That is, the $x$ position - where no forces act - has $0$ acceleration, and the $y$ position - where the force of gravity acts - has constant acceleration, $-g$, where $g=9.8m/s^2$ is the gravitational constant. These equations can be solved to give:
That is, the $x$ position---where no forces act---has $0$ acceleration, and the $y$ position---where the force of gravity acts---has constant acceleration, $-g$, where $g=9.8m/s^2$ is the gravitational constant. These equations can be solved to give:
$$
@@ -667,12 +670,13 @@ Though `y` is messy, it can be seen that the answer is a quadratic polynomial in
In a resistive medium, there are drag forces at play. If this force is proportional to the velocity, say, with proportion $\gamma$, then the equations become:
$$
\begin{align*}
x''(t) &= -\gamma x'(t), & \quad y''(t) &= -\gamma y'(t) -g, \\
x(0) &= x_0, &\quad y(0) &= y_0,\\
x'(0) &= v_0\cos(\alpha),&\quad y'(0) &= v_0 \sin(\alpha).
\end{align*}
$$
We now attempt to solve these.
@@ -954,7 +958,7 @@ radioq(choices, answ)
##### Question
The example with projectile motion in a medium has a parameter $\gamma$ modeling the effect of air resistance. If `y` is the answer - as would be the case if the example were copy-and-pasted in - what can be said about `limit(y, gamma=>0)`?
The example with projectile motion in a medium has a parameter $\gamma$ modeling the effect of air resistance. If `y` is the answer---as would be the case if the example were copy-and-pasted in---what can be said about `limit(y, gamma=>0)`?
```{julia}
@@ -963,7 +967,7 @@ The example with projectile motion in a medium has a parameter $\gamma$ modeling
choices = [
"The limit is a quadratic polynomial in `x`, mirroring the first part of that example.",
"The limit does not exist, but the limit to `oo` gives a quadratic polynomial in `x`, mirroring the first part of that example.",
"The limit does not exist -- there is a singularity -- as seen by setting `gamma=0`."
"The limit does not exist---there is a singularity---as seen by setting `gamma=0`."
]
answ = 1
radioq(choices, answ)

View File

@@ -118,10 +118,10 @@ function solve(prob::Problem, alg::EulerMethod)
end
```
The post has a more elegant means to unpack the parameters from the structures, but for each of the above, the parameters are unpacked, and then the corresponding algorithm employed. As of version `v1.7` of `Julia`, the syntax `(;g,y0,v0,tspan) = prob` could also be employed.
The post has a more elegant means to unpack the parameters from the structures, but for each of the above, the parameters are unpacked using the dot notation for `getproperty`, and then the corresponding algorithm is employed. As of version `v1.7` of `Julia`, the syntax `(;g,y0,v0,tspan) = prob` could also have been employed.
The exact formulas, `y(t) = y0 + v0*(t - t0) - g*(t - t0)^2/2` and `v(t) = v0 - g*(t - t0)`, follow from well-known physics formulas. Each answer is wrapped in a `Solution` type so that the answers found can be easily extracted in a uniform manner.
The exact answers, `y(t) = y0 + v0*(t - t0) - g*(t - t0)^2/2` and `v(t) = v0 - g*(t - t0)`, follow from well-known physics formulas for constant-acceleration motion. Each answer is wrapped in a `Solution` type so that the answers found can be easily extracted in a uniform manner.
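A hedged sketch of the type layout the post describes (field names and constructor details here are assumptions):

```julia
# Sketch: problem, algorithm, and solution types; solve dispatches on the algorithm
struct Problem; g; y0; v0; tspan end
struct ExactFormula; dt end
struct Solution; t; y; v end
function solve(prob::Problem, alg::ExactFormula)
    (; g, y0, v0, tspan) = prob       # property destructuring (v1.7+)
    t0, t1 = tspan
    ts = range(t0, t1; step = alg.dt)
    Solution(ts,
             [y0 + v0*(t - t0) - g*(t - t0)^2/2 for t in ts],
             [v0 - g*(t - t0) for t in ts])
end
```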
For example, plots of each can be obtained through:
@@ -138,7 +138,9 @@ plot!(sol_exact.t, sol_exact.y; label="exact solution", ls=:auto)
title!("On the Earth"; xlabel="t", legend=:bottomleft)
```
Following the post, since the time step `dt = 0.1` is not small enough, the error of the Euler method is rather large. Next we change the algorithm parameter, `dt`, to be smaller:
Following the post, since the time step `dt = 0.1` is not small enough, the error of the Euler method is readily identified.
Next we change the algorithm parameter, `dt`, to be smaller:
```{julia}
@@ -155,7 +157,7 @@ title!("On the Earth"; xlabel="t", legend=:bottomleft)
It is worth noting that only the first line is modified, and only the method requires modification.
Were the moon to be considered, the gravitational constant would need adjustment. This parameter is part of the problem, not the solution algorithm.
Were the moon to be considered, the gravitational constant would need adjustment. This parameter is a property of the problem, not the solution algorithm, as `dt` is.
Such adjustments are made by passing different values to the `Problem` constructor:
@@ -175,7 +177,9 @@ title!("On the Moon"; xlabel="t", legend=:bottomleft)
The code above also adjusts the time span in addition to the gravitational constant. The algorithm for the exact formula is set to use the `dt` value used in the `euler` method, for easier comparison. Otherwise, outside of the labels, the patterns are the same. Only those things that need changing are changed; the rest comes from defaults.
The above shows the benefits of using a common interface. Next, the post illustrates how *other* authors could extend this code, simply by adding a *new* `solve` method. For example,
The above shows the benefits of using a common interface.
Next, the post illustrates how *other* authors could extend this code, simply by adding a *new* `solve` method. For example, a symplectic method conserves a quantity, so it can track long-term evolution without drift.
```{julia}

View File

@@ -4,12 +4,22 @@ Short cut. Run first command until happy, then run second to publish
```
quarto render
#julia adjust_plotly.jl # <-- no longer needed
# maybe git config --global http.postBuffer 157286400
quarto publish gh-pages --no-render
```
But better to
```
quarto render
# commit changes and push
# fix typos
quarto render
quarto publish gh-pages --no-render
```
To compile the pages through quarto

View File

@@ -2,6 +2,28 @@
#| output: false
#| echo: false
# Some style choices for `Plots.jl`
empty_style = (xaxis=([], false),
yaxis=([], false),
framestyle=:origin,
legend=false)
axis_style = (arrow=true, side=:head, line=(:gray, 1))
text_style = (10,)
fn_style = (;line=(:black, 3))
fn2_style = (;line=(:red, 4))
mark_style = (;line=(:gray, 1, :dot))
domain_style = (;fill=(:orange, 0.35))
range_style = (; fill=(:blue, 0.35))
nothing
```
```{julia}
#| output: false
#| echo: false
## Formatting options are included here; not in CalculusWithJulia.WeaveSupport
using QuizQuestions
nothing

View File

@@ -11,18 +11,25 @@ typst_tpl = mt"""
---
title: {{:title}}
date: today
jupyter: julia-1.11
engine: julia
execute:
daemon: false
format:
typst:
toc: false
section-numbering: "1."
section-numbering: "1.1.1"
number-depth: 3
keep-typ: false
include-before-body:
- text: |
#set figure(placement: auto)
bibliography: references.bib
---
```{julia}
#| echo: false
import Plots; Plots.plotly() = Plots.gr();
nothing
```
"""
index = "_pdf_index"

View File

@@ -1,5 +1,5 @@
version: "0.21"
engine: julia
version: "0.25"
engines: ['julia']
project:
type: book
@@ -14,7 +14,6 @@ book:
search: true
repo-url: https://github.com/jverzani/CalculusWithJuliaNotes.jl
repo-subdir: quarto/
downloads: [pdf]
repo-actions: [edit, issue]
navbar:
background: light
@@ -23,17 +22,19 @@ book:
pinned: false
sidebar:
collapse-level: 1
page-footer: "Copyright 2022-24, John Verzani"
page-footer: "Copyright 2022-25, John Verzani"
chapters:
- index.qmd
- part: basics.qmd
chapters:
- basics/calculator.qmd
- basics/variables.qmd
- basics/numbers_types.qmd
- basics/logical_expressions.qmd
- basics/vectors.qmd
- basics/ranges.qmd
- part: precalc.qmd
chapters:
- precalc/calculator.qmd
- precalc/variables.qmd
- precalc/numbers_types.qmd
- precalc/logical_expressions.qmd
- precalc/vectors.qmd
- precalc/ranges.qmd
- precalc/functions.qmd
- precalc/plotting.qmd
- precalc/transformations.qmd
@@ -50,6 +51,7 @@ book:
chapters:
- limits/limits.qmd
- limits/limits_extensions.qmd
- limits/sequences_series.qmd
- limits/continuity.qmd
- limits/intermediate_value_theorem.qmd
@@ -84,6 +86,7 @@ book:
- integrals/volumes_slice.qmd
- integrals/arc_length.qmd
- integrals/surface_area.qmd
- integrals/orthogonal_polynomials.qmd
- integrals/twelve-qs.qmd
- part: ODEs.qmd
@@ -101,6 +104,7 @@ book:
- differentiable_vector_calculus/scalar_functions.qmd
- differentiable_vector_calculus/scalar_functions_applications.qmd
- differentiable_vector_calculus/vector_fields.qmd
- differentiable_vector_calculus/matrix_calculus_notes.qmd
- differentiable_vector_calculus/plots_plotting.qmd
- part: integral_vector_calculus.qmd
@@ -115,7 +119,7 @@ book:
chapters:
- alternatives/symbolics.qmd
- alternatives/SciML.qmd
# - alternatives/interval_arithmetic.qmd
#- alternatives/interval_arithmetic.qmd
- alternatives/plotly_plotting.qmd
- alternatives/makie_plotting.qmd
@@ -159,3 +163,5 @@ execute:
error: false
# freeze: false
freeze: auto
# cache: false
# enabled: true

View File

@@ -5,19 +5,42 @@
# This little script just adds a line *before* the require call
# which seems to make it all work. The line number 83 might change.
f = "_book/alternatives/plotly_plotting.html"
lineno = 88
#alternatives/plotly_plotting.html
function _add_plotly(f)
#lineno = 117
str = """
<script src="https://cdn.plot.ly/plotly-2.11.0.min.js"></script>
"""
r = readlines(f)
open(f, "w") do io
for (i,l) ∈ enumerate(r)
i == lineno && println(io, str)
println(io, l)
r = readlines(f)
inserted = false
open(f, "w") do io
for (i,l) ∈ enumerate(r)
if contains(l, "require.min.js")
!inserted && println(io, """
<script src="https://cdn.plot.ly/plotly-2.6.3.min.js"></script>
""")
inserted = true
end
println(io, l)
end
end
end
function (@main)(args...)
for (root, dirs, files) in walkdir("_book")
for fᵢ ∈ files
f = joinpath(root, fᵢ)
if endswith(f, ".html")
dirname(f) == "_book" && continue
_add_plotly(f)
end
end
end
#f = "_book/integrals/center_of_mass.html"
#_add_plotly(f)
return 1
end
["ODEs", "alternatives", "derivatives", "differentiable_vector_calculus", "integral_vector_calculus", "integrals", "limits", "misc", "precalc", "site_libs"]

View File

@@ -6,8 +6,8 @@ These notes use a particular selection of packages. This selection could have be
* The finding of zeros of scalar-valued, univariate functions is done with `Roots`. The [NonlinearSolve](./alternatives/SciML.html#nonlinearsolve) package provides an alternative for univariate and multi-variate functions.
* The finding of minima and maxima was done mirroring the framework of a typical calculus class; the [Optimization](./alternatives/SciML.html#optimization-optimization.jl) provides an alternative.
* The finding of minima and maxima was done mirroring the framework of a typical calculus class; the [Optimization](./alternatives/SciML.html#optimization-optimization.jl) package provides an alternative.
* The computation of numeric approximations for definite integrals is computed with the `QuadGK` and `HCubature` packages. The [Integrals](./alternatives/SciML.html#integration-integrals.jl) package provides a unified interface for numeric to these two packages, among others.
* Numeric approximations for definite integrals are computed with the `QuadGK` and `HCubature` packages. The [Integrals](./alternatives/SciML.html#integration-integrals.jl) package provides a unified interface for numeric integration, including these two packages, among others (a short sketch follows this list).
* Plotting was done using the popular `Plots` package. The [Makie](./alternatives/makie_plotting.html) package provides a very powerful alternative, whereas the [PlotlyLight](./alternatives/plotly_plotting.html) package provides a lightweight alternative using an open-source JavaScript library.
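A hedged sketch of the `Integrals` interface mentioned above (the problem-constructor signature is an assumption about the current API):

```julia
# Sketch: integrate sin(p*x) over [0, pi] with parameter p = 2 via the unified interface
using Integrals
prob = IntegralProblem((x, p) -> sin(p * x), (0.0, pi), 2.0)
solve(prob, QuadGKJL())
```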

View File

@@ -5,11 +5,13 @@ ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
GLMakie = "e9467ef8-e4e7-5192-8a1a-b1aee30e663a"
GeometryBasics = "5c1252a2-5f33-56bf-86c9-59e7332b4326"
IJulia = "7073ff75-c697-5162-941a-fcdaad2a7d2a"
Implicit3DPlotting = "d997a800-832a-4a4c-b340-7dddf3c1ad50"
Integrals = "de52edbc-65ea-441a-8357-d3a637375a31"
LaTeXStrings = "b964fa9f-0449-5b57-a5c2-d3ea65f4040f"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
Makie = "ee78f7c6-11fb-53f2-987a-cfe4a2b5a57a"
Meshing = "e6723b4c-ebff-59f1-b4b7-d97aa5274f73"
ModelingToolkit = "961ee093-0014-501f-94e3-6117800e7a78"
Mustache = "ffc61752-8dc7-55ee-8c37-f3e9cdd09e70"
NonlinearSolve = "8913a72c-1f9b-4ce2-8d82-65094dcecaec"
Optimization = "7f7a1694-90dd-40f0-9382-eb1efda571ba"
OptimizationOptimJL = "36348300-93cb-4f02-beb5-3c3902f8871e"
@@ -19,6 +21,7 @@ PlotlyKaleido = "f2990250-8cf9-495f-b13a-cce12b45703c"
PlotlyLight = "ca7969ec-10b3-423e-8d99-40f33abb42bf"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
QuadGK = "1fd47b50-473d-5c70-9696-f719f8f3bcdc"
QuizQuestions = "612c44de-1021-4a21-84fb-7261cf5eb2d4"
Roots = "f2b01f46-fcfa-551c-844a-d8ac1e96c665"
SplitApplyCombine = "03a91e81-4c3e-53e1-a0a4-9c0c8f19dd66"
StaticArrays = "90137ffa-7385-5640-81b9-e52037218182"
@@ -26,3 +29,5 @@ SymPy = "24249f21-da20-56a4-8eb1-6a02cf4ae2e6"
SymbolicLimits = "19f23fe9-fdab-4a78-91af-e7b7767979c3"
SymbolicNumericIntegration = "78aadeae-fbc0-11eb-17b6-c7ec0477ba9e"
Symbolics = "0c5d862f-8b57-4792-8d23-62f2024744c7"
Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
TextWrap = "b718987f-49a8-5099-9789-dcd902bef87d"

View File

@@ -250,7 +250,7 @@ With the system defined, we can pass this to `NonlinearProblem`, as was done wit
```{julia}
prob = NonlinearProblem(ns, [1.0], [α => 1.0])
prob = NonlinearProblem(mtkcompile(ns), [1.0], Dict(α => 1.0))
```
The problem is solved as before:
@@ -573,13 +573,14 @@ As well, suppose we wanted to parameterize our function and then differentiate.
Consider $d/dp \int_0^\pi \sin(px) dx$. We can do this integral directly to get
$$
\begin{align*}
\frac{d}{dp} \int_0^\pi \sin(px)dx
&= \frac{d}{dp}\left( \frac{-1}{p} \cos(px)\Big\rvert_0^\pi\right)\\
&= \frac{d}{dp}\left( -\frac{\cos(p\cdot\pi)-1}{p}\right)\\
&= \frac{\cos(p\cdot \pi) - 1}{p^2} + \frac{\pi\cdot\sin(p\cdot\pi)}{p}
\end{align*}
$$
Using `Integrals` with `QuadGK` we have:

View File

@@ -1,6 +1,5 @@
# Calculus plots with Makie
{{< include ../_common_code.qmd >}}
The [Makie.jl webpage](https://github.com/JuliaPlots/Makie.jl) says
@@ -36,8 +35,7 @@ using GLMakie
import LinearAlgebra: norm
```
The `Makie` developers have workarounds for the delayed time to first plot, but without utilizing these the time to load the package is lengthy.
The package load time as of recent versions of `Makie` is quite reasonable for such a complicated project. (The time to first plot is under 3 seconds on a typical machine.)
## Points (`scatter`)
@@ -158,7 +156,8 @@ A point is drawn with a "marker" with a certain size and color. These attributes
```{julia}
scatter(xs, ys;
marker=[:x,:cross, :circle], markersize=25,
marker=[:x,:cross, :circle],
markersize=25,
color=:blue)
```
@@ -176,7 +175,7 @@ A single value will be repeated. A vector of values of a matching size will spec
## Curves
The curves of calculus are lines. The `lines` command of `Makie` will render a curve by connecting a series of points with straight-line segments. By taking a sufficient number of points the connect-the-dot figure can appear curved.
A visualization of a curve in calculus is composed of line segments. The `lines` command of `Makie` will render a curve by connecting a series of points with straight-line segments. By taking a sufficient number of points the connect-the-dot figure can appear curved.
### Plots of univariate functions
@@ -304,7 +303,6 @@ current_figure()
### Text (`annotations`)
Text can be placed at a point, as a marker is. To place text, the desired text and a position need to be specified along with any adjustments to the default attributes.
@@ -315,25 +313,43 @@ For example:
xs = 1:5
pts = Point2.(xs, xs)
scatter(pts)
annotations!("Point " .* string.(xs), pts;
fontsize = 50 .- 2*xs,
rotation = 2pi ./ xs)
annotation!(pts;
text = "Point " .* string.(xs),
fontsize = 30 .- 5*xs)
current_figure()
```
The graphic shows that `fontsize` adjusts the displayed size and `rotation` adjusts the orientation. (The graphic also shows a need to manually override the limits of the `y` axis, as the `Point 5` is chopped off; the `ylims!` function to do so will be shown later.)
The graphic shows that `fontsize` adjusts the displayed size.
Attributes for `text`, among many others, include:
* `align` to specify the text alignment through `(:pos, :pos)`, where `:pos` can be `:left`, `:center`, or `:right`
* `rotation` to indicate how the text is to be rotated
* `fontsize` the font point size for the text
* `font` to indicate the desired font
Annotations with an arrow can be useful to highlight a feature of a graph. This example is modified from the documentation and utilizes some interval functions to draw an arrow with an arc:
```{julia}
g(x) = cos(6x) * exp(x)
xs = 0:0.01:4
_, ax, _ = lines(xs, g.(xs); axis = (; xgridvisible = false, ygridvisible = false))
annotation!(ax, 1, 20, 2.1, g(2.1),
text = "A relative maximum",
path = Ann.Paths.Arc(0.3),
style = Ann.Styles.LineArrow(),
labelspace = :data
)
current_figure()
```
#### Line attributes
@@ -666,6 +682,7 @@ A surface of revolution for $g(u)$ revolved about the $z$ axis can be visualized
```{julia}
g(u) = u^2 * exp(-u)
r(u,v) = (g(u)*sin(v), g(u)*cos(v), u)
us = range(0, 3, length=10)
vs = range(0, 2pi, length=10)
xs, ys, zs = parametric_grid(us, vs, r)
@@ -681,6 +698,7 @@ A torus with big radius $2$ and inner radius $1/2$ can be visualized as follows
```{julia}
r1, r2 = 2, 1/2
r(u,v) = ((r1 + r2*cos(v))*cos(u), (r1 + r2*cos(v))*sin(u), r2*sin(v))
us = vs = range(0, 2pi, length=25)
xs, ys, zs = parametric_grid(us, vs, r)
@@ -696,6 +714,7 @@ A Möbius strip can be produced with:
ws = range(-1/4, 1/4, length=8)
thetas = range(0, 2pi, length=30)
r(w, θ) = ((1+w*cos(θ/2))*cos(θ), (1+w*cos(θ/2))*sin(θ), w*sin(θ/2))
xs, ys, zs = parametric_grid(ws, thetas, r)
surface(xs, ys, zs)
@@ -865,20 +884,19 @@ end
#### Implicitly defined surfaces, $F(x,y,z)=0$
To plot the equation $F(x,y,z)=0$, for $F$ a scalar-valued function, again the implicit function theorem says that, under conditions, near any solution $(x,y,z)$, $z$ can be represented as a function of $x$ and $y$, so the graph will look likes surfaces stitched together. The `Implicit3DPlotting` package takes an approach like `ImplicitPlots` to represent these surfaces. It replaces the `Contour` package computation with a $3$-dimensional alternative provided through the `Meshing` and `GeometryBasics` packages.
To plot the equation $F(x,y,z)=0$, for $F$ a scalar-valued function, again the implicit function theorem says that, under conditions, near any solution $(x,y,z)$, $z$ can be represented as a function of $x$ and $y$, so the graph will look like surfaces stitched together.
```{julia}
using Implicit3DPlotting
```
With `Makie`, many implicitly defined surfaces can be adequately represented using `contour` with the attribute `levels=[0]`. We will illustrate this technique.
The `Implicit3DPlotting` package takes an approach like `ImplicitPlots` to represent these surfaces. It replaces the `Contour` package computation with a $3$-dimensional alternative provided through the `Meshing` and `GeometryBasics` packages. This package has a `plot_implicit_surface` function that does something similar to what is shown below. We don't illustrate it, as it *currently* doesn't work with the latest version of `Makie`.
This example, plotting an implicitly defined sphere, comes from the documentation of `Implicit3DPlotting`. The `f` to be plotted is a scalar-valued function of a vector:
To begin, we plot a sphere implicitly as a solution to $F(x,y,z) = x^2 + y^2 + z^2 - 1 = 0$:
```{julia}
f(x) = sum(x.^2) - 1
xlims = ylims = zlims = (-5, 5)
plot_implicit_surface(f; xlims, ylims, zlims)
f(x,y,z) = x^2 + y^2 + z^2 - 1
xs = ys = zs = range(-3/2, 3/2, 100)
contour(xs, ys, zs, f; levels=[0])
```
@@ -887,11 +905,13 @@ Here we visualize an intersection of a sphere with another figure:
```{julia}
r₂(x) = sum(x.^2) - 5/4 # a sphere
r₂(x) = sum(x.^2) - 2 # a sphere
r₄(x) = sum(x.^4) - 1
xlims = ylims = zlims = (-2, 2)
p = plot_implicit_surface(r₂; xlims, ylims, zlims, color=:yellow)
plot_implicit_surface!(p, r₄; xlims, ylims, zlims, color=:red)
ϕ(x,y,z) = (x,y,z)
xs = ys = zs = range(-2, 2, 100)
contour(xs, ys, zs, r₂∘ϕ; levels = [0], colormap=:RdBu)
contour!(xs, ys, zs, r₄∘ϕ; levels = [0], colormap=:viridis)
current_figure()
```
@@ -900,11 +920,12 @@ This example comes from [Wikipedia](https://en.wikipedia.org/wiki/Implicit_surfa
```{julia}
f(x,y,z) = 2y*(y^2 -3x^2)*(1-z^2) + (x^2 +y^2)^2 - (9z^2-1)*(1-z^2)
xlims = ylims = zlims = (-5/2, 5/2)
plot_implicit_surface(x -> f(x...); xlims, ylims, zlims)
xs = ys = zs = range(-5/2, 5/2, 100)
contour(xs, ys, zs, f; levels=[0], colormap=:RdBu)
```
(This figure does not render well through `contour(xs, ys, zs, f, levels=[0])`, as the hole is not shown.)
(This figure does not render well though, as the hole is not shown.)
For one last example from Wikipedia, we have the Cassini oval which "can be defined as the point set for which the *product* of the distances to $n$ given points is constant." That is:
@@ -915,8 +936,8 @@ function cassini(λ, ps = ((1,0,0), (-1, 0, 0)))
n = length(ps)
x -> prod(norm(x .- p) for p ∈ ps) - λ^n
end
xlims = ylims = zlims = (-2, 2)
plot_implicit_surface(cassini(1.05); xlims, ylims, zlims)
xs = ys = zs = range(-2, 2, 100)
contour(xs, ys, zs, cassini(0.80) ∘ ϕ; levels=[0], colormap=:RdBu)
```
## Vector fields. Visualizations of $f:R^2 \rightarrow R^2$
@@ -1064,7 +1085,7 @@ F
### Observables
The basic components of a plot in `Makie` can be updated [interactively](https://makie.juliaplots.org/stable/documentation/nodes/index.html#observables_interaction). `Makie` uses the `Observables` package which allows complicated interactions to be modeled quite naturally. In the following we give a simple example.
The basic components of a plot in `Makie` can be updated [interactively](https://makie.juliaplots.org/stable/documentation/nodes/index.html#observables_interaction). Historically `Makie` used the `Observables` package which allows complicated interactions to be modeled quite naturally. In the following we give a simple example, though newer versions of `Makie` rely on a different mechanism.
In Makie, an `Observable` is a structure that allows its value to be updated, similar to an array. When changed, observables can trigger an event. Observables can rely on other observables, so events can be cascaded.
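A minimal `Observables` sketch, independent of any plotting (`Observable`, `map`, `on`, and `obs[]` are the package's basic interface):

```julia
# Sketch: a derived observable recomputes, and a handler fires, when the source changes
using Observables
x = Observable(1.0)
y = map(sqrt, x)          # y updates whenever x does
on(y) do val
    println("y is now ", val)
end
x[] = 4.0                 # sets x, recomputes y, and fires the handler
```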
@@ -1123,6 +1144,7 @@ end
lines!(ax, xs, f)
lines!(ax, points)
scatter!(ax, points; markersize=10)
current_figure()
```

View File

@@ -73,7 +73,6 @@ The `Config` constructor (from the `EasyConfig` package loaded with `PlotlyLight
```{julia}
#| hold: true
cfg = Config()
cfg.key1.key2.key3 = "value"
cfg
@@ -89,7 +88,6 @@ A basic scatter plot of points $(x,y)$ is created as follows:
```{julia}
#| hold: true
xs = 1:5
ys = rand(5)
data = Config(x = xs,
@@ -113,7 +111,6 @@ A line plot is very similar, save for a different `mode` specification:
```{julia}
#| hold: true
xs = 1:5
ys = rand(5)
data = Config(x = xs,
@@ -134,7 +131,6 @@ The line graph plays connect-the-dots with the points specified by paired `x` an
```{julia}
#| hold: true
data = Config(
x=[0,1,nothing,3,4,5],
y = [0,1,2,3,4,5],
@@ -149,7 +145,6 @@ More than one graph or layer can appear on a plot. The `data` argument can be a
```{julia}
#| hold: true
data = [Config(x = 1:5,
y = rand(5),
type = "scatter",
@@ -177,7 +172,6 @@ For example, here we plot the graphs of both the $\sin(x)$ and $\cos(x)$ over $[
```{julia}
#| hold: true
a, b = 0, 2pi
xs, ys = PlotUtils.adapted_grid(sin, (a,b))
@@ -193,7 +187,6 @@ The values for `a` and `b` are used to generate the $x$- and $y$-values. These c
```{julia}
#| hold: true
xs, ys = PlotUtils.adapted_grid(x -> x^5 - x - 1, (0, 2)) # answer is (0,2)
p = Plot([Config(x=xs, y=ys, name="Polynomial"),
Config(x=xs, y=0 .* ys, name="x-axis", mode="lines", line=Config(width=5))]
@@ -232,7 +225,6 @@ A marker's attributes can be adjusted by values passed to the `marker` key. Labe
```{julia}
#| hold: true
data = Config(x = 1:5,
y = rand(5),
mode="markers+text",
@@ -251,40 +243,7 @@ The `text` mode specification is necessary to have text be displayed on the char
#### RGB Colors
The `ColorTypes` package is the standard `Julia` package providing an `RGB` type (among others) for specifying red-green-blue colors. To make this work with `Config` and `JSON3` requires some type-piracy (modifying `Base.string` for the `RGB` type) to get, say, `RGB(0.5, 0.5, 0.5)` to output as `"rgb(0.5, 0.5, 0.5)"`. (RGB values in JavaScript are integers between $0$ and $255$ or floating point values between $0$ and $1$.) A string with this content can be specified. Otherwise, something like the following can be used to avoid the type piracy:
```{julia}
struct rgb
r
g
b
end
PlotlyLight.JSON3.StructTypes.StructType(::Type{rgb}) = PlotlyLight.JSON3.StructTypes.StringType()
Base.string(x::rgb) = "rgb($(x.r), $(x.g), $(x.b))"
```
With these defined, red-green-blue values can be used for colors. For example to give a range of colors, we might have:
```{julia}
#| hold: true
cols = [rgb(i,i,i) for i in range(10, 245, length=5)]
sizes = [12, 16, 20, 24, 28]
data = Config(x = 1:5,
y = rand(5),
mode="markers+text",
type="scatter",
name="scatter plot",
text = ["marker $i" for i in 1:5],
textposition = "top center",
marker = Config(size=sizes, color=cols)
)
Plot(data)
```
The `opacity` key can be used to control the transparency, with a value between $0$ and $1$.
#### Marker symbols
@@ -293,7 +252,6 @@ The `marker_symbol` key can be used to set a marker shape, with the basic values
```{julia}
#| hold: true
markers = ["circle", "square", "diamond", "cross", "x", "triangle", "pentagon",
"hexagram", "star", "diamond", "hourglass", "bowtie", "asterisk",
"hash", "y", "line"]
@@ -327,7 +285,6 @@ The `shape` attribute determine how the points are connected. The default is `li
```{julia}
#| hold: true
shapes = ["linear", "hv", "vh", "hvh", "vhv", "spline"]
data = [Config(x = 1:5, y = 5*(i-1) .+ [1,3,2,3,1], mode="lines+markers", type="scatter",
name=shape,
@@ -358,7 +315,6 @@ In the following, to highlight the difference between $f(x) = \cos(x)$ and $p(x)
```{julia}
#| hold: true
xs = range(-1, 1, 100)
data = [
Config(
@@ -381,7 +337,6 @@ The `toself` declaration is used below to fill in a polygon:
```{julia}
#| hold: true
data = Config(
x=[-1,1,1,-1,-1], y = [-1,1,-1,1,-1],
fill="toself",
@@ -399,7 +354,6 @@ The legend is shown when $2$ or more charts are specified, by default. This can b
```{julia}
#| hold: true
data = Config(x=1:5, y=rand(5), type="scatter", mode="markers", name="legend label")
lyt = Config(title = "Main chart title",
xaxis = Config(title="x-axis label"),
@@ -416,7 +370,6 @@ The aspect ratio of the chart can be set to be equal through the `scaleanchor` k
```{julia}
#| hold: true
ts = range(0, 2pi, length=100)
data = Config(x = sin.(ts), y = cos.(ts), mode="lines", type="scatter")
lyt = Config(title = "A circle",
@@ -434,7 +387,6 @@ Text annotations may be specified as part of the layout object. Annotations may
```{julia}
#| hold: true
data = Config(x = [0, 1], y = [0, 1], mode="markers", type="scatter")
layout = Config(title = "Annotations",
xaxis = Config(title="x",
@@ -452,7 +404,7 @@ Plot(data, layout)
The following example is a more complicated use of the elements previously described. It mimics an image from [Wikipedia](https://en.wikipedia.org/wiki/List_of_trigonometric_identities) for trigonometric identities. The use of `LaTeX` does not seem to be supported through the `JavaScript` interface; unicode symbols are used instead. The `xanchor` and `yanchor` keys are used to position annotations away from the default. The `textangle` key is used to rotate text, as desired.
```{julia}
alpha = pi/6
beta = pi/5
xₘ = cos(alpha)*cos(beta)
@@ -569,7 +521,6 @@ Earlier, we plotted a two dimensional circle, here we plot the related helix.
```{julia}
#| hold: true
helix(t) = [cos(t), sin(t), t]
ts = range(0, 4pi, length=200)
@@ -596,7 +547,6 @@ There is no `quiver` plot for `plotly` using JavaScript. In $2$-dimensions a tex
```{julia}
#| hold: true
helix(t) = [cos(t), sin(t), t]
helix′(t) = [-sin(t), cos(t), 1]  # the tangent (derivative) of helix
ts = range(0, 4pi, length=200)
@@ -642,7 +592,6 @@ A contour plot is created by the "contour" trace type. The data is prepared as a
```{julia}
#| hold: true
f(x,y) = x^2 - 2y^2
xs = range(0,2,length=25)
@@ -661,7 +610,6 @@ The same `zs` data can be achieved by broadcasting and then collecting as follow
```{julia}
#| hold: true
f(x,y) = x^2 - 2y^2
xs = range(0,2,length=25)
@@ -692,7 +640,6 @@ Surfaces defined through a scalar-valued function are drawn quite naturally, sav
```{julia}
#| hold: true
peaks(x,y) = 3 * (1-x)^2 * exp(-(x^2) - (y+1)^2) -
10*(x/5 - x^3 - y^5) * exp(-x^2-y^2) - 1/3 * exp(-(x+1)^2 - y^2)
@@ -713,7 +660,6 @@ For parametrically defined surfaces, the $x$ and $y$ values also correspond to m
```{julia}
#| hold: true
r, R = 1, 5
X(theta,phi) = [(r*cos(theta)+R)*cos(phi),
(r*cos(theta)+R)*sin(phi),


@@ -601,14 +601,7 @@ det(N)
det(collect(N))
```
Similarly, with `norm`:
```{julia}
norm(v)
```
quarto/basics.qmd Normal file

@@ -0,0 +1,3 @@
# Mathematical basics
This chapter introduces some mathematical basics and their counterparts within the `Julia` programming language.


@@ -0,0 +1,15 @@
[deps]
CalculusWithJulia = "a2e0e22d-7d4c-5312-9169-8b992201a882"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
LaTeXStrings = "b964fa9f-0449-5b57-a5c2-d3ea65f4040f"
Logging = "56ddb016-857b-54e1-b83d-db4d58db5568"
Measures = "442fdcdd-2543-5da2-b0f3-8c86c306513e"
Mustache = "ffc61752-8dc7-55ee-8c37-f3e9cdd09e70"
PlotlyBase = "a03496cd-edff-5a9b-9e67-9cda94a718b5"
PlotlyKaleido = "f2990250-8cf9-495f-b13a-cce12b45703c"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
Primes = "27ebfcd6-29c5-5fa9-bf4b-fb8fc14df3ae"
QuizQuestions = "612c44de-1021-4a21-84fb-7261cf5eb2d4"
SymPy = "24249f21-da20-56a4-8eb1-6a02cf4ae2e6"
Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
TextWrap = "b718987f-49a8-5099-9789-dcd902bef87d"


@@ -0,0 +1 @@
../basics.qmd


@@ -207,6 +207,7 @@ The authors note in a supplement to their paper that over 15,700 species and sub
(3.02 * 10^15 + 1.34*10^15) * 100 / 22
```
The answer is in *scientific notation* and reads as $1.98\ldots \cdot 10^{16}$.
Shifting the decimal point, this gives a value rounded to $20\cdot 10^{15}$ ants.
The authors used a value for the *dry weight* of an average (and representative) single ant. What was that value? (Which they indicate is perhaps unreliable,
@@ -216,7 +217,7 @@ as, for example, small-bodied ants may be much more abundant than large-bodied a
(12 * 1_000_000 * 1_000 * 1_000) / 20_000_000_000_000_000
```
Which translates to an *average* dry *carbon* weight of $0.6/1000$ grams, that is $0.6$ milligrams ($0.62$ mg C was actually used). The underscores in `20_000_000_000_000_000` are *ignored* when parsed and are only for readability. This is a readable alternate to scientific notation for large numbers.
The authors write that insects are generally considered to have a dry weight of 30% wet weight, and a carbon weight of 50% dry weight, so the weight in grams of an *average* living ant would be multiplied by $2$ and then $10/3$:
@@ -228,8 +229,6 @@ That is 4 milligrams, or 250 ants per gram on average.
Numeric combinations, as above, will be easier to check for correctness when variable names are assigned to the respective values.
@@ -245,7 +244,7 @@ With the Google Calculator, typing `1 + 2 x 3 =` will give the value $7$, but *i
In `Julia`, the entire expression is typed in before being evaluated, so the usual conventions of mathematics related to the order of operations may be used. These are colloquially summarized by the acronym [PEMDAS](http://en.wikipedia.org/wiki/Order_of_operations).
> **PEMDAS**. This acronym stands for Parentheses, Exponents, Multiplication, Division, Addition, Subtraction. The order indicates which operation has higher precedence, or should happen first. This isn't exactly the case, as "M" and "D" have the same precedence, as do "A" and "S". In the case of two operations with equal precedence, *associativity* is used to decide which to do. For the operations `-` and `/` the associativity is left to right, as in the left one is done first, then the right. However, `^` has right associativity, so `4^3^2` is `4^(3^2)` and not `(4^3)^2`. (Be warned that some calculators - and spreadsheets, such as Excel - will treat this expression with left associativity.) But `+` and `*` don't have a fixed associativity, so `1+2+3` may be evaluated as `(1+2)+3` or `1+(2+3)`.
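A quick check of the associativity claims (a minimal illustration):
```{julia}
4^3^2 == 4^(3^2), (4^3)^2   # `^` groups right-to-left; the left-grouped value differs
```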
@@ -326,7 +325,7 @@ $$
\frac{1 + 2}{3 + 4}?
$$
It would have to be computed through $(1 + 2) / (3 + 4)$. This is because unlike `/`, the implied order of operation in the mathematical notation with the *horizontal division symbol* (the [vinculum](http://tinyurl.com/y9tj6udl)) is to compute the top and the bottom and then divide. That is, the vinculum is a grouping notation like parentheses, only implicitly so. Thus the above expression really represents the more verbose:
$$
@@ -368,7 +367,7 @@ In `Julia` many infix operations can be done using a prefix manner. For example
## Constants
The Google calculator has two built-in constants, `e` and `π`. `Julia` provides these as well, though not quite as easily, as they have names and not dedicated buttons. First, `π` is just `pi`:
```{julia}
@@ -407,7 +406,7 @@ In most cases. There are occasional (basically rare) spots where using `pi` by i
### Numeric literals
For some special cases, Julia parses *multiplication* without a multiplication symbol. One case is when the value on the left is a number, as in `2pi`, which has an equivalent value to `2*pi`. *However* the two are not equivalent, in that multiplication with *numeric literals* does not have the same precedence as regular multiplication - it is higher. This has practical importance when used in division or powers. For instance, these two expressions are **not** the same:
```{julia}
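# (a reconstructed illustration - the original lines are elided in this diff)
1/2pi, 1/2*pi   # 1/(2pi) ≈ 0.159 versus (1/2)*pi ≈ 1.571
```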
@@ -485,7 +484,9 @@ Using a function is very straightforward. A function is called using parentheses
sqrt(4), sqrt(5)
```
The function is referred to by name (`sqrt`) and called with parentheses.
Any arguments are passed into the function using commas to separate values, should there be more than one. When there are numerous values for a function, the arguments may need to be given in a specific order or may possibly be specified with *keywords*. (A semicolon can be used instead of a comma to separate keyword arguments from positional arguments.)
Some more examples:
@@ -533,7 +534,7 @@ sqrt(11^2 + 12^2)
##### Example
A formula from statistics to compute the standard deviation of a binomial random variable with parameters $p$ and $n$ is $\sqrt{n p (1-p)}$. Compute this value for $p=1/4$ and $n=10$.
```{julia}
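# (reconstructed values from the problem statement)
n, p = 10, 1/4
sqrt(n * p * (1 - p))
```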
@@ -568,20 +569,54 @@ Not all computations on a calculator are valid. For example, the Google calculat
In `Julia`, there is a richer set of error types. The value `0/0` will in fact not be an error, but rather a value `NaN`. This is a special floating point value indicating "not a number" and is the result for various operations. The output of $\sqrt{-1}$ (computed via `sqrt(-1)`) will indicate a domain error:
```{julia}
#| error: true
sqrt(-1)
```
Other calls may result in an overflow error:
```{julia}
#| error: true
factorial(1000)
```
How `Julia` handles overflow is a study in tradeoffs. For integer operations that demand high performance, `Julia` does not check for overflow. So, for example, if we are not careful strange answers can be had. Consider the difference here between these powers of 2:
```{julia}
2^62, 2^63, 2^64
```
On a machine with $64$-bit integers, the first of these three values is correct; the second is clearly "wrong," as the answer given is negative; and the third, $0$, is also clearly "wrong."
Wrong is in quotes, as though the values are mathematically incorrect, computationally they are correct. The last two are due to overflow. The cost of checking is considered too high, so no error is thrown and the values represent what happens at the machine level.
The user is expected to have a sense that they need to be careful when their values are quite large. But the better recommendation is that the user use floating point numbers, which is as easy as typing `2.0^63`. Though not always exact, floating point values can represent a much bigger range of values and are exact for a reasonably wide range of integer values.
::: {.callout-note}
## Bit-level details
We can see in the following, using the smaller 8-bit type, what goes on internally with successive powers of `2`: the bit pattern is found by shifting the previous one over to the left, consistent with what happens at the bit level when multiplying by `2`:
```{julia}
[bitstring(Int8(2)^i) for i in 1:8]
```
The last line is similar to what happens with `2^64`, which is also `0`, as seen. The second to last line requires some understanding of how integers are represented internally. Of the 8 bits, the last 7 represent which powers of `2` (`2^6`, `2^5`, ..., `2^1`, `2^0`) are included. The leading bit represents `-2^7`. So all zeros - which is what `Int8(2)^8` produces - means `0`, but `"10000000"` means `-2^7`. The largest *positive* number would be represented as `"01111111"` or $2^6 + 2^5 + \cdots + 2^1 + 2^0 = 2^7 - 1$. These values can be seen using `typemin` and `typemax`:
```{julia}
typemin(Int8), typemax(Int8)
```
For 64-bit, these are:
```{julia}
typemin(Int), typemax(Int)
```
And we see that `2^63` is just `typemin(Int)`.
:::
:::{.callout-warning}
@@ -1135,7 +1170,7 @@ choices = [
"`10e21`",
"`1e22`",
"`10_000_000_000_000_000_000_000`"]
explanation = "With an *integer* base, `10^21` overflows. For typical integers, only `10^18` is defined as expected. Once `10^19` is entered the mathematical value is larger the the `typemax` for `Int64` and so the value *wraps* around. The number written out with underscores to separate groups of 0s is parsed as an integer with 128 bits, not 64."
explanation = "With an *integer* base, `10^21` overflows. For typical integers, only `10^18` is defined as expected. Once `10^19` is entered the mathematical value is larger than the `typemax` for `Int64` and so the value *wraps* around. The number written out with underscores to separate groups of 0s is parsed as an integer with 128 bits, not 64."
buttonq(choices, 1; explanation=explanation)
```


@@ -94,7 +94,7 @@ Well, almost... When `Inf` or `NaN` are involved, this may not hold, for example
So adding or subtracting most any finite value from an inequality will preserve the inequality, just as it does for equations.
What about multiplication?
Consider the case $a < b$ and $c > 0$. Then $ca < cb$. Here we investigate using $3$ random values (which will be positive):
@@ -231,7 +231,7 @@ Read aloud this would be "minus $7$ is less than $x$ minus $5$ **and** $x$ minus
The "and" equations can be combined as above with a natural notation. However, an equation like $\lvert x - 5\rvert > 7$ would emphasize an **or** and be "$x$ minus $5$ less than minus $7$ **or** $x$ minus $5$ greater than $7$". Expressing this requires some new notation.
The *boolean shortcut operators* `&&` and `||` implement "and" and "or". (There are also *bitwise* boolean operators `&` and `|`, but we only describe the former.)
Thus we could write $-7 < x-5 < 7$ as

quarto/basics/make_pdf.jl Normal file

@@ -0,0 +1,16 @@
module Make
# makefile for generating typst pdfs
# per directory usage
dir = "basics"
files = ("calculator",
"variables",
"numbers_types",
"logical_expressions",
"vectors",
"ranges",
)
include("../_make_pdf.jl")
main()
end


@@ -26,7 +26,7 @@ On top of these, we have special subsets, such as the natural numbers $\{1, 2, \
Mathematically, these number systems are naturally nested within each other as integers are rational numbers which are real numbers, which can be viewed as part of the complex numbers.
Calculators typically have just one type of number - floating point values. These model the real numbers. `Julia`, on the other hand, has a rich type system, and within that has several different number types. There are types that model each of the four main systems above, and within each type, specializations for how these values are stored.
Most of the details will not be of interest to all, and will be described later.
@@ -62,8 +62,7 @@ Similarly, each type is printed slightly differently.
The key distinction is between integers and floating points. While floating point values include integers, and so can be used exclusively on the calculator, the difference is that an integer is guaranteed to be an exact value, whereas a floating point value, while often an exact representation of a number, is also often just an *approximate* value. This can be an advantage, as floating point values can model a much wider range of numbers.
In nearly all cases the differences are not noticeable. Take for instance this simple calculation involving mixed types.
```{julia}
@@ -89,10 +88,10 @@ These values are *very* small numbers, but not exactly $0$, as they are mathemat
---
The only common issue is with powers. We saw this previously when discussing a distinction between `2^64` and `2.0^64`. `Julia` tries to keep a predictable output from the input types (not their values). Here are the two main cases that arise where this can cause unexpected results:
* integer bases and integer exponents can *easily* overflow. Not only is `m^n` always an integer, it is always an integer with a fixed storage size computed from the sizes of `m` and `n`. So the powers can quickly get too big. This can be especially noticeable on older $32$-bit machines, where too big is $2^{32} = 4,294,967,296$. On $64$-bit machines, this limit is present but much bigger.
Rather than give an error though, `Julia` gives seemingly arbitrary answers, as can be seen in this example on a $64$-bit machine:
@@ -102,13 +101,13 @@ Rather than give an error though, `Julia` gives seemingly arbitrary answers, as
2^62, 2^63
```
(They aren't arbitrary, as explained previously.)
This could be worked around, as it is with some programming languages, but it isn't, as it would slow down this basic computation. So, it is up to the user to be aware of cases where their integer values can grow too big. The suggestion is to use floating point numbers in this domain, as they have more room, at the cost of sometimes being approximate for fairly large values.
* the `sqrt` function will give a domain error for negative values:
```{julia}
@@ -161,11 +160,12 @@ Integers are often used casually, as they come about from parsing. As with a cal
### Floating point numbers
[Floating point](http://en.wikipedia.org/wiki/Floating_point) numbers are a computational model for the real numbers. For floating point numbers, $64$ bits are used by default for both $32$- and $64$-bit systems, though other storage sizes can be requested. This gives a large - but still finite - range of real numbers that can be represented. However, there are infinitely many real numbers just between $0$ and $1$, so there is no chance that all can be represented exactly on the computer with a floating point value. Floating point then is *necessarily* an approximation for all but a subset of the real numbers. Floating point values can be viewed in normalized [scientific notation](http://en.wikipedia.org/wiki/Scientific_notation) as $a\cdot 2^b$ where $a$ is the *significand* and $b$ is the *exponent*. Save for special values, the significand $a$ is normalized to satisfy $1 \leq \lvert a\rvert < 2$; the exponent can be taken to be an integer, possibly negative.
As per IEEE Standard 754, the `Float64` type gives 52 bits to the precision (with an additional implied one), 11 bits to the exponent and the other bit is used to represent the sign. Positive, finite, floating point numbers have a range approximately between $10^{-308}$ and $10^{308}$, as 308 is about $\log_{10} 2^{1023}$. The numbers are not evenly spread out over this range, but, rather, are much more concentrated closer to $0$.
The use of 32-bit floating point values is common, as some widely used computer chips expect this. These values have a narrower range of possible values.
:::{.callout-warning}
## More on floating point numbers
@@ -174,12 +174,12 @@ You can discover more about the range of floating point values provided by calli
* `typemax(0.0)` gives the largest value for the type (`Inf` in this case).
* `prevfloat(Inf)` gives the largest finite one, in general `prevfloat` is the next smallest floating point value.
* `nextfloat(-Inf)`, similarly, gives the smallest finite floating point value, and in general returns the next largest floating point value.
* `nextfloat(0.0)` gives the closest positive value to 0.
* `eps()` gives the distance to the next floating point number bigger than `1.0`. This is sometimes referred to as machine precision.
:::
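For instance, these can be inspected directly (a small illustration; the exact output depends on the machine's `Float64` defaults):
```{julia}
typemax(0.0), prevfloat(Inf), nextfloat(-Inf), nextfloat(0.0), eps()
```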
#### Scientific notation
@@ -205,8 +205,10 @@ The special coding `aeb` (or if the exponent is negative `ae-b`) is used to repr
avogadro = 6.022e23
```
::: {.callout-note}
## Not `e`
Here `e` is decidedly *not* the Euler number, rather **syntax** to separate the exponent from the mantissa.
:::
The first way of representing this number required using `10.0` and not `10` as the integer power will return an integer and even for 64-bit systems is only valid up to `10^18`. Using scientific notation avoids having to concentrate on such limitations.
@@ -296,7 +298,7 @@ That is adding `1/10` and `2/10` is not exactly `3/10`, as expected mathematical
1/10 + (2/10 + 3/10) == (1/10 + 2/10) + 3/10
```
* Mathematically, for real numbers, subtraction of similar-sized numbers is not exceptional, for example $1 - \cos(x)$ is positive if $0 < x < \pi/2$, say. This will not be the case for floating point values. If $x$ is close enough to $0$, then $\cos(x)$ and $1$ will be so close, that they will be represented by the same floating point value, `1.0`, so the difference will be zero:
```{julia}
@@ -306,7 +308,7 @@ That is adding `1/10` and `2/10` is not exactly `3/10`, as expected mathematical
### Rational numbers
Rational numbers can be used when the exactness of the number is more important than the speed or wider range of values offered by floating point numbers. In `Julia` a rational number is comprised of a numerator and a denominator, each an integer of the same type, and reduced to lowest terms. The operations of addition, subtraction, multiplication, and division will keep their answers as rational numbers. As well, raising a rational number to an integer value will produce a rational number.
As mentioned, these are constructed using double slashes:
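A reconstructed illustration (the original block is elided in this diff): arithmetic with rationals stays exact, and values print in lowest terms:
```{julia}
1//2 + 1//3, (2//3)^2, 2//4   # 5//6, 4//9, and 2//4 reduces to 1//2
```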
@@ -402,6 +404,42 @@ Though complex numbers are stored as pairs of numbers, the imaginary unit, `im`,
:::
### Strings and symbols
For text, `Julia` has a `String` type. When double quotes are used to specify a string, the parser creates this type:
```{julia}
x = "The quick brown fox jumped over the lazy dog"
typeof(x)
```
Values can be inserted into a string through *interpolation* using a dollar sign.
```{julia}
animal = "lion"
x = "The quick brown $(animal) jumped over the lazy dog"
```
The use of parentheses allows more complicated expressions; it isn't always necessary.
Longer strings can be produced using *triple* quotes:
```{julia}
lincoln = """
Four score and seven years ago our fathers brought forth, upon this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.
"""
```
Strings are comprised of *characters* which can be produced directly using *single* quotes:
```{julia}
'c'
```
We won't use these.
Finally, `Julia` has *symbols* which are *interned* strings which are used as identifiers. Symbols are used for advanced programming techniques; we will only see them as shortcuts to specify plotting arguments.
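A small sketch (the particular name is illustrative): symbols are written with a leading colon and have their own type:
```{julia}
s = :dashed             # e.g., as in a plotting argument like linestyle=:dashed
typeof(s), String(s)
```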
## Type stability
@@ -641,7 +679,7 @@ Finding the value through division introduces a floating point deviation. Which
```{julia}
#| echo: false
as = ["1/10^21", "1e-21"]
explanation = "The scientific notation is correct. Due to integer overflow `10^21` is not the same number as `10.0^21`."
buttonq(as, 2; explanation)
```


@@ -1,4 +1,4 @@
# Ranges and sets
{{< include ../_common_code.qmd >}}
@@ -66,7 +66,7 @@ Rather than express sequences by the $a_0$, $h$, and $n$, `Julia` uses the start
1:10
```
But wait, nothing different printed? This is because `1:10` is efficiently stored. Basically, a recipe to generate the next number from the previous number is created and `1:10` just stores the start and end points (the step size is implicit in how this is stored) and that recipe is used to generate the set of all values. To expand the values, you have to ask for them to be `collect`ed (though this typically isn't needed in practice, as values are usually *iterated* over):
```{julia}
@@ -103,13 +103,13 @@ h = (b-a)/(n-1)
collect(a:h:b)
```
Pretty neat. If we were doing this many times - such as once per plot - we'd want to encapsulate this into a function, for example using a comprehension:
```{julia}
function evenly_spaced(a, b, n)
h = (b-a)/(n-1)
[a + i*h for i in 0:n-1]
end
```
@@ -131,27 +131,20 @@ It seems to work as expected. But looking just at the algorithm it isn't quite s
```{julia}
1/5 + 2*1/5 # last value if h is exactly 1/5 or 0.2
```
Floating point roundoff leads to the last value *exceeding* `0.6`, so should it be included? Well, here it is pretty clear it *should* be, but better to have something programmed that hits both `a` and `b` and adjusts `h` accordingly. Something which isn't subject to the vagaries of `(3/5 - 1/5)/2` not being `0.2`.
Enter the base function `range` which solves this seemingly simple - but not really - task. It can use `a`, `b`, and `n`. Like the range operation, this function returns a generator which can be collected to realize the values.
The number of points is specified as a third argument (though keyword arguments can be given):
```{julia}
xs = range(-1, 1, length=9) # or simply range(-1, 1, 9) as of v"1.7"
```
and
```{julia}
xs = range(1/5, 3/5, 3) |> collect
```
:::{.callout-note}
@@ -168,11 +161,13 @@ Now we concentrate on some more general styles to modify a sequence to produce a
### Filtering
The act of throwing out elements of a collection based on some condition is called *filtering*. The `filter` function does this in `Julia`; the basic syntax being `filter(predicate_function, collection)`. The "`predicate_function`" is one that returns either `true` or `false`, such as `iseven`. The output of `filter` consists of the new collection of values - those where the predicate returns `true`.
For example, another way to get the values between $0$ and $100$ that are multiples of $7$ is to start with all $101$ values and throw out those that don't match. To check if a number is divisible by $7$, we could use the `rem` function. It gives the remainder upon division. Multiples of `7` match `rem(m, 7) == 0`. Checking for divisibility by seven is unusual enough there is nothing built in for that, but checking for division by $2$ is common, and for that, there is a built-in function `iseven`.
To see it used, let's start with the numbers between `0` and `25` (inclusive) and filter out those that are even:
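A minimal reconstruction of the elided code, keeping the even values:
```{julia}
filter(iseven, 0:25)
```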
@@ -210,7 +205,7 @@ Let's return to the case of the set of even numbers between $0$ and $100$. We ha
* The set of numbers $\{2k: k=0, \dots, 50\}$.
While `Julia` has a special type for dealing with sets, we will use a vector for such a set. (Unlike a set, vectors can have repeated values, but, as vectors are more widely used, we demonstrate them.) Vectors are described more fully in a previous section, but as a reminder, vectors are constructed using square brackets: `[]` (a special syntax for [concatenation](http://docs.julialang.org/en/latest/manual/arrays/#concatenation)). Square brackets are used in different contexts within `Julia`, in this case we use them to create a *collection*. If we separate single values in our collection by commas (or semicolons), we will create a vector:
```{julia}
@@ -264,7 +259,7 @@ Here are decreasing powers of $2$:
[1/2^i for i in 1:10]
```
Sometimes, the comprehension does not produce the type of output that may be expected. This is related to `Julia`'s more limited abilities to infer types at the command line. If the output type is important, the extra prefix of `T[]` can be used, where `T` is the desired type.
### Generators
@@ -277,20 +272,24 @@ A typical pattern would be to generate a collection of numbers and then apply a
sum([2^i for i in 1:10])
```
Conceptually this is easy to understand: one step generates the numbers, the other adds them up. Computationally it is a bit inefficient. The generator syntax allows this type of task to be done more efficiently. To use this syntax, we just need to drop the `[]`:
```{julia}
sum(2^i for i in 1:10)
```
The difference being no intermediate object is created to store the collection of all values specified by the generator. Not all functions allow generators as arguments, but most common reductions do.
### Filtering generated expressions
Both comprehensions and generators allow for filtering through the keyword `if`. The basic pattern is
`[expr for variable in collection if expr]`
The following shows *one* way to add the prime numbers in $[1,100]$:
```{julia}
@@ -299,6 +298,10 @@ sum(p for p in 1:100 if isprime(p))
The value on the other side of `if` should be an expression that evaluates to either `true` or `false` for a given `p` (like a predicate function, but here specified as an expression). The value returned by `isprime(p)` is such.
::: {.callout-note}
In these notes we primarily use functions rather than expressions for various actions. We will see creating a function is not much more difficult than specifying an expression, though there is additional notation necessary. Generators are one very useful means to use expressions, symbolic math will be seen as another.
:::
In this example, we use the fact that `rem(k, 7)` returns the remainder found from dividing `k` by `7`, and so is `0` when `k` is a multiple of `7`:
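A reconstructed version of the elided computation, summing the multiples of $7$ in $[0,100]$ (the values are assumed from the surrounding text):
```{julia}
sum(k for k in 0:100 if rem(k, 7) == 0)   # 7 + 14 + ... + 98 = 735
```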
@@ -323,7 +326,7 @@ This example of Stefan Karpinski's comes from a [blog](http://julialang.org/blog
First, a simple question: using pennies, nickels, dimes, and quarters how many different ways can we generate one dollar? Clearly $100$ pennies, or $20$ nickels, or $10$ dimes, or $4$ quarters will do this, so the answer is at least four, but how much more than four?
Well, we can use a comprehension to enumerate the possibilities. This example illustrates how comprehensions and generators can involve one or more variable for the iteration. By judiciously choosing what is iterated over, the entire set can be described.
First, we either have $0,1,2,3$, or $4$ quarters, or $0$, $25$ cents, $50$ cents, $75$ cents, or a dollar's worth. If we have, say, $1$ quarter, then we need to make up $75$ cents with the rest. If we had $3$ dimes, then we need to make up $45$ cents out of nickels and pennies, if we then had $6$ nickels, we know we must need $15$ pennies.
@@ -333,22 +336,42 @@ The following expression shows how counting this can be done through enumeration
```{julia}
ways = [(q, d, n, p)
for q = 0:25:100
for d = 0:10:(100 - q)
for n = 0:5:(100 - q - d)
for p = (100 - q - d - n)]
length(ways)
```
There are $242$ distinct cases. The first three are:
```{julia}
ways[1:3]
```
The generating expression reads naturally. It introduces the use of multiple `for` statements, each subsequent one depending on the value of the previous (working left to right).
The cashier might like to know the number of coins, not the dollar amount:
```{julia}
[amt ./ [25, 10, 5, 1] for amt in ways[1:3]]
```
There are various ways to get integer values, and not floating point values. One way is to call `round`. Here though, we use the integer division operator, `div`, through its infix operator `÷`:
```{julia}
[amt .÷ [25, 10, 5, 1] for amt in ways[1:3]]
```
Now suppose we want to ensure that the amount in pennies is less than the amount in nickels, etc. We could use `filter` somehow to do this for our last answer, but using `if` allows for filtering while the events are generated. Here our condition is simply expressed: `q > d > n > p`:
```{julia}
[(q, d, n, p)
for q = 0:25:100
for d = 0:10:(100 - q)
for n = 0:5:(100 - q - d)
for p = (100 - q - d - n)
@@ -523,6 +546,24 @@ radioq(choices, answ)
###### Question
An arithmetic sequence ($a_0$, $a_1 = a_0 + h$, $a_2=a_0 + 2h, \dots,$ $a_n=a_0 + n\cdot h$) is specified with a starting point ($a_0$), a step size ($h$), and a number of points $(n+1)$. This is not the case with the colon constructor, which takes a starting point, a step size, and a suggested last value. Nor is it the case with the default for the `range` function, with signature `range(start, stop, length)`. However, the documentation for `range` shows that indeed the three values ($a_0$, $h$, and $n$) can be passed in. Which signature (from the docs) would allow this:
```{julia}
#| echo: false
choices = [
"`range(start, stop, length)`",
"`range(start, stop; length, step)`",
"`range(start; length, stop, step)`",
"`range(;start, length, stop, step)`"]
answer = 3
explanation = """
This is a somewhat vague question, but the use of `range(a0; length=n+1, step=h)` will produce the arithmetic sequence with this parameterization.
"""
buttonq(choices, answer; explanation)
```
###### Question
Create the sequence $10, 100, 1000, \dots, 1,000,000$ using a list comprehension. Which of these works?
@@ -670,7 +711,7 @@ Let's see if `4137 8947 1175 5804` is a valid credit card number?
First, we enter it as a value and immediately break the number into its digits:
```{julia}
x = 4137_8947_1175_5804 # _ in a number is ignored by parser
xs = digits(x)
```
@@ -688,7 +729,7 @@ for i in 1:2:length(xs)
end
```
Numbers greater than 9 have their digits added, then all the resulting numbers are added. This can be done with a generator:
```{julia}
@@ -701,7 +742,7 @@ If this sum has a remainder of 0 when dividing by 10, the credit card number is
iszero(rem(z,10))
```
Darn. A typo. Is `4137 8047 1175 5804` a possible credit card number?
```{julia}
#| hold: true


@@ -28,7 +28,7 @@ nothing
The Google calculator has a button `Ans` to refer to the answer to the previous evaluation. This is a form of memory. The last answer is stored in a specific place in memory for retrieval when `Ans` is used. In some calculators, more advanced memory features are possible. For some, it is possible to push values onto a stack of values for them to be referred to at a later time. This proves useful for complicated expressions, say, as the expression can be broken into smaller intermediate steps to be computed. These values can then be appropriately combined. This strategy is a good one, though the memory buttons can make its implementation a bit cumbersome.
With `Julia`, as with other programming languages, it is very easy to refer to past evaluations. This is done by *assignment* whereby a computed value stored in memory is associated with a name (sometimes thought of as symbol or label). The name can be used to look up the value later. Assignment does not change the value of the object being assigned, it only introduces a reference to it.
Assignment in `Julia` is handled by the equals sign and takes the general form `variable_name = value`. For example, here we assign values to the variables `x` and `y`
@@ -49,7 +49,7 @@ x
Just typing a variable name (without a trailing semicolon) causes the assigned value to be displayed.
Variable names can be reused (or reassigned), as here, where we redefine `x`:
```{julia}
@@ -111,11 +111,31 @@ a = v0 * cosd(theta)
By defining a new variable `a` to represent a value that is repeated a few times in the expression, the last command is greatly simplified. Doing so makes it much easier to check for accuracy against the expression to compute.
##### Example
A common expression in mathematics is a polynomial expression, for example $-16s^2 + 32s - 12$. Translating this to `Julia` at $s =3$ we might have:
```{julia}
s = 3
-16*s^2 + 32*s - 12
```
This looks nearly identical to the mathematical expression, but we inserted `*` to indicate multiplication between the constant and the variable. In fact, this step is not needed as Julia allows numeric literals to have an implied multiplication:
```{julia}
-16s^2 + 32s - 12
```
##### Example
A [grass swale](https://stormwater.pca.state.mn.us/index.php?title=Design_criteria_for_dry_swale_(grass_swale)) is a design to manage surface water flow resulting from a storm. Swales detain, filter, and infiltrate runoff limiting erosion in the process.
![Swale cross section](figures/swale.png)
There are a few mathematical formulas that describe the characteristics of a swale:
@@ -155,26 +175,7 @@ n, S = 0.025, 2/90
A = (b + d/tan(theta)) * d
P = b + 2d/sin(theta)
R = A / P
Q = R^(2/3) * S^(1/2) / n * A
```
## Where math and computer notations diverge
@@ -198,7 +199,7 @@ This is completely unlike the mathematical equation $x = x^2$ which is typically
##### Example
Having `=` as assignment is usefully exploited when modeling sequences. For example, an application of Newton's method might end up with this mathematical expression:
$$
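x_{i+1} = x_i - \frac{x_i^2 - 2}{2x_i}
$$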
@@ -208,7 +209,7 @@ $$
As a mathematical expression, for each $i$ this defines a new value for $x_{i+1}$ in terms of a known value $x_i$. This can be used to recursively generate a sequence, provided some starting point is known, such as $x_0 = 2$.
The above might be written instead using assignment with:
```{julia}
@@ -220,11 +221,19 @@ x = x - (x^2 - 2) / (2x)
Repeating this last line will generate new values of `x` based on the previous one - no need for subscripts. This is exactly what the mathematical notation indicates is to be done.
::: {.callout-note}
## Use of =
The distinction between ``=`` versus `=` is important and one area where common math notation and common computer notation diverge. The mathematical ``=`` indicates *equality*, and is often used with equations and also for assignment. Later, when symbolic math is introduced, the `~` symbol will be used to indicate an equation, though this is by convention and not part of base `Julia`. The computer syntax use of `=` is for *assignment* and *re-assignment*. Equality is tested with `==` and `===`.
:::
## Context
The binding of a value to a variable name happens within some context. When a variable is assigned or referenced, the scope of the variable---the region of code where it is accessible---is taken into consideration.
For our simple illustrations, we are assigning values, as though they were typed at the command line. This stores the binding in the `Main` module. `Julia` looks for variables in this module when it encounters an expression and the value is substituted. Other uses, such as when variables are defined within a function, involve different contexts which may not be visible within the `Main` module.
:::{.callout-note}
@@ -235,14 +244,16 @@ The `varinfo` function will list the variables currently defined in the main wor
:::{.callout-warning}
## Warning
**Shooting oneself in the foot.** `Julia` allows us to locally redefine variables that are built in, such as the value for `pi` or the function object assigned to `sin`. This is called shadowing. For example, this is a perfectly valid command: `x + y = 3`. However, it doesn't specify an equation; rather it *redefines* addition. At the terminal, this binding to `+` occurs in the `Main` module. This shadows the value of `+` bound in the `Base` module. Even if redefined in `Main`, the value in `Base` can be used by fully qualifying the name, as in `Base.:+(2, 3)`. This uses the notation `module_name.variable_name` to look up a binding in a module.
:::
## Variable names
`Julia` has a very wide set of possible [names](https://docs.julialang.org/en/stable/manual/variables/#Allowed-Variable-Names-1) for variables. Variables are case sensitive and their names can include many [Unicode](http://en.wikipedia.org/wiki/List_of_Unicode_characters) characters. Names must begin with a letter or an appropriate Unicode value (but not a number). There are some reserved words, such as `try` or `else` which can not be assigned to. However, many built-in names can be locally overwritten (shadowed).
Conventionally, variable names are lower case. For compound names, it is not unusual to see them squished together, joined with underscores, or written in camelCase.
```{julia}
@@ -255,7 +266,7 @@ __private = 2 # a convention
### Unicode names
`Julia` allows variable names to use Unicode identifiers. Such names allow `julia` notation to mirror that of many mathematical texts. For example, in calculus the variable $\epsilon$ is often used to represent some small number. We can assign to a symbol that looks like $\epsilon$ using `Julia`'s LaTeX input mode. Typing `\epsilon[tab]` will replace the text with the symbol within `IJulia` or the command line.
```{julia}
@@ -272,18 +283,20 @@ For example, we could have defined `theta` (`\theta[tab]`) and `v0` (`v\_0[tab]`
θ = 45; v₀ = 200
```
:::{.callout-note}
## Emojis
There is even support for tab-completion of [emojis](https://github.com/JuliaLang/julia/blob/master/stdlib/REPL/src/emoji_symbols.jl) such as `\:snowman:[tab]` or `\:koala:[tab]`
:::
:::{.callout-note}
## Unicode
These notes often use Unicode alternatives for some variable. Originally this was to avoid a requirement of `Pluto` of a single use of assigning to a variable name in a notebook without placing the assignment in a `let` block or a function body. Now, they are just for clarity through distinction.
:::
##### Example
@@ -322,7 +335,7 @@ a, b = 1, 2
a, b = b, a
```
### Example, finding the slope
Find the slope of the line connecting the points $(1,2)$ and $(4,6)$. We begin by defining the values and then applying the slope formula:
@@ -337,6 +350,7 @@ m = (y1 - y0) / (x1 - x0)
Of course, this could be computed directly with `(6-2) / (4-1)`, but by using familiar names for the values we can be certain we apply the formula properly.
## Questions

quarto/basics/vectors.qmd Normal file

File diff suppressed because it is too large.


@@ -5,12 +5,19 @@ DualNumbers = "fa6b7ba4-c1ee-5f82-b5fc-ecf0adba8f74"
ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
IJulia = "7073ff75-c697-5162-941a-fcdaad2a7d2a"
ImplicitEquations = "95701278-4526-5785-aba3-513cca398f19"
LaTeXStrings = "b964fa9f-0449-5b57-a5c2-d3ea65f4040f"
Mustache = "ffc61752-8dc7-55ee-8c37-f3e9cdd09e70"
PlotlyBase = "a03496cd-edff-5a9b-9e67-9cda94a718b5"
PlotlyKaleido = "f2990250-8cf9-495f-b13a-cce12b45703c"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
Polynomials = "f27b6e38-b328-58d1-80ce-0feddd5e7a45"
Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"
QuizQuestions = "612c44de-1021-4a21-84fb-7261cf5eb2d4"
Roots = "f2b01f46-fcfa-551c-844a-d8ac1e96c665"
SpecialFunctions = "276daf66-3868-5448-9aa4-cd146d93841b"
SymPy = "24249f21-da20-56a4-8eb1-6a02cf4ae2e6"
Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
TaylorSeries = "6aa5eb33-94cf-58f4-a9d0-e4b2c4fc25ea"
TermInterface = "8ea1fca8-c5ef-4a55-8b96-4e9afe9c9a3c"
TextWrap = "b718987f-49a8-5099-9789-dcd902bef87d"
Unitful = "1986cc42-f94f-5a68-af5c-568840ba703d"


@@ -2,73 +2,74 @@
In Part III of @doi:10.1137/1.9781611977165 we find language of numerical analysis useful to formally describe the zero-finding problem. Key concepts are errors, conditioning, and stability.
Abstractly a *problem* is a mapping, or function, from a domain $X$ of data to a range $Y$ of solutions. Both $X$ and $Y$ have a sense of distance given by a *norm*. A norm is a generalization of the absolute value.
Abstractly a *problem* is a mapping, $F$, from a domain $X$ of data to a range $Y$ of solutions. Both $X$ and $Y$ have a sense of distance given by a *norm*. A norm is a generalization of the absolute value and gives quantitative meaning to terms like small and large.
> A *well-conditioned* problem is one with the property that all small perturbations of $x$ lead to only small changes in $f(x)$.
> A *well-conditioned* problem is one with the property that all small perturbations of $x$ lead to only small changes in $F(x)$.
This sense of "small" is quantified through a *condition number*.
This sense of "small" is measured through a *condition number*.
If we let $\delta_x$ be a small perturbation of $x$ then $\delta_f = f(x + \delta_x) - f(x)$.
If we let $\delta_x$ be a small perturbation of $x$ then $\delta_F = F(x + \delta_x) - F(x)$.
The *forward error* is $\lVert\delta_f\rVert = \lVert f(x+\delta_x) - f(x)\rVert$, the *relative forward error* is $\lVert\delta_f\rVert/\lVert f\rVert = \lVert f(x+\delta_x) - f(x)\rVert/ \lVert f(x)\rVert$.
The *forward error* is $\lVert\delta_F\rVert = \lVert F(x+\delta_x) - F(x)\rVert$, the *relative forward error* is $\lVert\delta_F\rVert/\lVert F\rVert = \lVert F(x+\delta_x) - F(x)\rVert/ \lVert F(x)\rVert$.
The *backward error* is $\lVert\delta_x\rVert$, the *relative backward error* is $\lVert\delta_x\rVert / \lVert x\rVert$.
The *absolute condition number* $\hat{\kappa}$ is worst case of this ratio $\lVert\delta_f\rVert/ \lVert\delta_x\rVert$ as the perturbation size shrinks to $0$.
The relative condition number $\kappa$ divides $\lVert\delta_f\rVert$ by $\lVert f(x)\rVert$ and $\lVert\delta_x\rVert$ by $\lVert x\rVert$ before taking the ratio.
The *absolute condition number* $\hat{\kappa}$ is the worst case of this ratio $\lVert\delta_F\rVert/ \lVert\delta_x\rVert$ as the perturbation size shrinks to $0$.
The relative condition number $\kappa$ divides $\lVert\delta_F\rVert$ by $\lVert F(x)\rVert$ and $\lVert\delta_x\rVert$ by $\lVert x\rVert$ before taking the ratio.
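To make these definitions concrete, here is a small numeric sketch (the problem $F(x)=\sqrt{x}$ and the point $x=2$ are chosen for illustration only):

```{julia}
# Sketch: forward error and absolute condition number for F(x) = √x at x = 2
F(x) = sqrt(x)
x = 2.0
δx = 1e-8                 # a small perturbation of the data
δF = F(x + δx) - F(x)     # the forward error
abs(δF) / abs(δx)         # ≈ |F'(x)| = 1/(2√2), the absolute condition number
```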
A *problem* is a mathematical concept, an *algorithm* the computational version. Algorithms may differ for many reasons, such as floating point errors, tolerances, etc. We use notation $\tilde{f}$ to indicate the algorithm.
A *problem* is a mathematical concept, an *algorithm* the computational version. Algorithms may differ for many reasons, such as floating point errors, tolerances, etc. We use notation $\tilde{F}$ to indicate the algorithm.
The absolute error in the algorithm is $\lVert\tilde{f}(x) - f(x)\rVert$, the relative error divides by $\lVert f(x)\rVert$. A good algorithm would have smaller relative errors.
The absolute error in the algorithm is $\lVert\tilde{F}(x) - F(x)\rVert$, the relative error divides by $\lVert F(x)\rVert$. A good algorithm would have small relative errors.
An algorithm is called *stable* if
$$
\frac{\lVert\tilde{f}(x) - f(\tilde{x})\rVert}{\lVert f(\tilde{x})\rVert}
\frac{\lVert\tilde{F}(x) - F(\tilde{x})\rVert}{\lVert F(\tilde{x})\rVert}
$$
is *small* for *some* $\tilde{x}$ relatively near $x$, meaning $\lVert\tilde{x}-x\rVert/\lVert x\rVert$ is small.
> "A *stable* algorithm gives nearly the right answer to nearly the right question."
> A *stable* algorithm gives nearly the right answer to nearly the right question.
(The answer it gives is $\tilde{f}(x)$, the nearly right question is $f(\tilde{x})$.)
(The answer it gives is $\tilde{F}(x)$, the nearly right question: what is $F(\tilde{x})$?)
A related concept is an algorithm $\tilde{f}$ for a problem $f$ is *backward stable* if for each $x \in X$,
A related concept: an algorithm $\tilde{F}$ for a problem $F$ is *backward stable* if for each $x \in X$,
$$
\tilde{f}(x) = f(\tilde{x})
\tilde{F}(x) = F(\tilde{x})
$$
for some $\tilde{x}$ with $\lVert\tilde{x} - x\rVert/\lVert x\rVert$ small.
> "A backward stable algorithm gives exactly the right answer to nearly the right question."
The concepts are related by Trefethen and Bao's Theorem 15.1 which says for a backward stable algorithm the relative error $\lVert\tilde{f}(x) - f(x)\rVert/\lVert f(x)\rVert$ is small in a manner proportional to the relative condition number.
The concepts are related by Trefethen and Bau's Theorem 15.1 which says for a backward stable algorithm the relative error $\lVert\tilde{F}(x) - F(x)\rVert/\lVert F(x)\rVert$ is small in a manner proportional to the relative condition number.
Applying this to the zero-finding problem, we follow @doi:10.1137/1.9781611975086.
To be specific, the problem is finding a zero of $f$ starting at an initial point $x_0$. The data is $(f, x_0)$, the solution is $r$ a zero of $f$.
To be specific, the problem, $F$, is finding a zero of a function $f$ starting at an initial point $x_0$. The data is $(f, x_0)$, the solution is $r$ a zero of $f$.
Take the algorithm as Newton's method. Any implementation must incorporate tolerances, so this is a computational approximation to the problem. The data is the same, but technically we use $\tilde{f}$ for the function, as any computation is dependent on machine implementations. The output is $\tilde{r}$ an *approximate* zero.
Suppose for sake of argument that $\tilde{f}(x) = f(x) + \epsilon$ and $r$ is a root of $f$ and $\tilde{r}$ is a root of $\tilde{f}$. Then by linearization:
Suppose for sake of argument that $\tilde{f}(x) = f(x) + \epsilon$, $f$ has a continuous derivative, and $r$ is a root of $f$ and $\tilde{r}$ is a root of $\tilde{f}$. Then by linearization:
$$
\begin{align*}
0 &= \tilde{f}(\tilde r) \\
&= f(r + \delta) + \epsilon\\
&\approx f(r) + f'(r)\delta + \epsilon\\
&= 0 + f'(r)\delta + \epsilon
\end{align*}
Rearranging gives $\lVert\delta/\epsilon\rVert \approx 1/\lVert f'(r)\rVert$ leading to:
$$
Rearranging gives $\lVert\delta/\epsilon\rVert \approx 1/\lVert f'(r)\rVert$. But the $|\delta|/|\epsilon|$ ratio is related to the condition number:
> The absolute condition number is $\hat{\kappa}_r = |f'(r)|^{-1}$.
The error formula in Newton's method includes the derivative in the denominator, so we see large condition numbers are tied into larger errors.
The error formula in Newton's method measuring the distance between the actual root and an approximation includes the derivative in the denominator, so we see large condition numbers are tied into possibly larger errors.
Now consider $g(x) = f(x) - f(\tilde{r})$. Call $f(\tilde{r})$ the residual. We have $g$ is near $f$ if the residual is small. The algorithm will solve $(g, x_0)$ with $\tilde{r}$, so with a small residual an exact solution to an approximate question will be found. Driscoll and Braun state
@@ -83,4 +84,5 @@ Practically these two observations lead to
For the first observation, the example of Wilkinson's polynomial is often used where $f(x) = (x-1)\cdot(x-2)\cdot \cdots\cdot(x-20)$. When expanded this function has exactness issues of typical floating point values, the condition number is large and some of the roots found are quite different from the mathematical values.
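A hedged sketch of this, assuming the `Polynomials` package: linearizing as above, a perturbation $\epsilon$ of the $x^{19}$ coefficient moves a root $r$ by roughly $\epsilon \cdot |r^{19}/f'(r)|$, and this multiplier is enormous for the larger roots:

```{julia}
using Polynomials
p = fromroots(1.0:20)        # Wilkinson's polynomial, in expanded form
dp = derivative(p)
# linearized sensitivity of a root r to a unit change in the x^19 coefficient
[(r, abs(r^19 / dp(r))) for r in (1.0, 10.0, 20.0)]
```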
The second observation helps explain why a problem like finding the zero of $f(x) = x \cdot \exp(x)$ using Newton's method starting at $2$ might return a value like $5.89\dots$. The residual is checked to be zero in a *relative* manner which would basically use a tolerance of `atol + abs(xn)*rtol`. Functions with asymptotes of $0$ will eventually be smaller than this value.
The second observation follows from $f(x_n)$ monitoring the backward error and the product of the condition number and the backward error monitoring the forward error. This product is on the order of $|f(x_n)/f'(x_n)|$ or $|x_{n+1} - x_n|$.
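As a sketch of these two monitored quantities (the function and the rough approximate zero below are made up for illustration):

```{julia}
f(x) = sin(x) - x/2           # the positive zero is r ≈ 1.89549...
fp(x) = cos(x) - 1/2
xn = 1.8954                   # a rough approximate zero
(abs(f(xn)),                  # the residual, monitoring the backward error
 abs(f(xn) / fp(xn)))         # ≈ |x_{n+1} - x_n|, monitoring the forward error
```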


@@ -1,4 +1,4 @@
# Curve Sketching
# Curve sketching
{{< include ../_common_code.qmd >}}
@@ -195,7 +195,7 @@ To identify how wide a viewing window should be, for the rational function the a
```{julia}
cps = find_zeros(f', -10, 10)
poss_ips = find_zero(f'', (-10, 10))
poss_ips = find_zeros(f'', (-10, 10))
extrema(union(cps, poss_ips))
```
@@ -340,7 +340,7 @@ radioq(choices, answ)
###### Question
Consider the function $p(x) = x + 2x^3 + 3x^3 + 4x^4 + 5x^5 +6x^6$. Which interval shows more than a $U$-shaped graph that dominates for large $x$ due to the leading term being $6x^6$?
Consider the function $p(x) = x + 2x^2 + 3x^3 + 4x^4 + 5x^5 +6x^6$. Which interval shows more than a $U$-shaped graph that dominates for large $x$ due to the leading term being $6x^6$?
(Find an interval that contains the zeros, critical points, and inflection points.)
@@ -494,7 +494,7 @@ Does a plot over $[0,50]$ show qualitatively similar behaviour?
```{julia}
#| hold: true
#| echo: false
yesnoq(true)
yesnoq("no")
```
Exponential growth has $P''(t) = P_0 a^t \log(a)^2 > 0$, so has no inflection point. By plotting over a sufficiently wide interval, can you answer: does the logistic growth model have an inflection point?
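One way to check, sketched here with made-up parameter values and the `P''` notation from the `CalculusWithJulia` setup used in these notes:

```{julia}
K, r, t₀ = 100, 1/2, 20            # hypothetical logistic parameters
P(t) = K / (1 + exp(-r * (t - t₀)))
find_zeros(P'', 0, 50)             # one inflection point, at t = t₀ where P = K/2
```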


@@ -22,6 +22,8 @@ nothing
---
![Device to measure units of distance by units of time](figures/galileo-ramp.png){width=60%}
Before defining the derivative of a function, let's begin with two motivating examples.
@@ -227,13 +229,35 @@ function secant_line_tangent_line_graph(n)
m = (f(c+h) - f(c))/h
xs = range(0, stop=pi, length=50)
plt = plot(f, 0, pi, legend=false, size=fig_size)
fig_size=(800, 600)
plt = plot(;
xaxis=([], false),
yaxis=([], false),
framestyle=:origin,
legend=false,
ylims=(-.1,1.5)
)
plot!(f, 0, pi/2; line=(:black, 2))
plot!(f, pi/2, pi/2 + pi/5; line=(:black, 2, 1/4))
plot!(f, pi/2 + pi/5, pi; line=(:black, 2))
plot!(0.1 .+ [0,0],[-.1, 1.5]; line=(:gray,1), arrow=true, side=:head)
plot!([-0.2, 3.4], [.1, .1]; line=(:gray, 1), arrow=true, side=:head)
plot!(plt, xs, f(c) .+ cos(c)*(xs .- c), color=:orange)
plot!(plt, xs, f(c) .+ m*(xs .- c), color=:black)
scatter!(plt, [c,c+h], [f(c), f(c+h)], color=:orange, markersize=5)
plot!(plt, [c, c+h, c+h], [f(c), f(c), f(c+h)], color=:gray30)
annotate!(plt, [(c+h/2, f(c), text("h", :top)), (c + h + .05, (f(c) + f(c + h))/2, text("f(c+h) - f(c)", :left))])
annotate!(plt, [(c+h/2, f(c), text(L"h", :top)),
(c + h + .05, (f(c) + f(c + h))/2, text(L"f(c+h) - f(c)", :left)),
])
plt
end
@@ -245,7 +269,7 @@ The slope of each secant line represents the *average* rate of change between $c
n = 5
n = 6
anim = @animate for i=0:n
secant_line_tangent_line_graph(i)
end
@@ -266,11 +290,59 @@ $$
We will define the tangent line at $(c, f(c))$ to be the line through the point with the slope from the limit above - provided that limit exists. Informally, the tangent line is the line through the point that best approximates the function.
::: {#fig-tangent_line_approx_graph}
```{julia}
#| echo: false
gr()
let
function make_plot(Δ)
f(x) = 1 + sin(x-c)
df(x) = cos(x-c)
plt = plot(;
#xaxis=([], false),
yaxis=([], false),
aspect_ratio=:equal,
legend=false,
)
c = 1
xticks!([c-Δ, c, c+Δ], [latexstring("c-$Δ"), L"c", latexstring("c+$Δ")])
y₀ = f(c) - 2/3 * Δ
tl(x) = f(c) + df(c) * (x-c)
plot!(f, c - Δ, c + Δ; line=(:black, 2))
plot!(tl, c - Δ, c + Δ; line=(:red, 2))
plot!([c,c], [tl(c-Δ), f(c)]; line=(:gray, :dash, 1))
#plot!([c-1.1*Δ, c+1.1*Δ], y₀ .+ [0,0]; line=(:gray, 1), arrow=true)
current()
end
ps = make_plot.((1.5, 1.0, 0.5, 0.1))
plot(ps...)
end
```
Illustration that the tangent line is the best linear approximation *near* $c$.
:::
```{julia}
#| echo: false
plotly()
nothing
```
```{julia}
#| hold: true
#| echo: false
#| cache: true
#| eval: false
gr()
function line_approx_fn_graph(n)
f(x) = sin(x)
@@ -419,7 +491,7 @@ $$
\frac{\log(x+h) - \log(x)}{h} = \frac{1}{h}\log(\frac{x+h}{x}) = \log((1+h/x)^{1/h}).
$$
As noted earlier, Cauchy saw the limit as $u$ goes to $0$ of $f(u) = (1 + u)^{1/u}$ is $e$. Re-expressing the above we can get $1/h \cdot \log(f(h/x))$. The limit as $h$ goes to $0$ of this is found from the composition rules for limits: as $\lim_{h \rightarrow 0} f(h/x) = e^{1/x}$, and since $\log(x)$ is continuous at $e^{1/x}$ we get this expression has a limit of $1/x$.
As noted earlier, Cauchy saw the limit as $u$ goes to $0$ of $f(u) = (1 + u)^{1/u}$ is $e$. Re-expressing the above we can get $1/x \cdot \log(f(h/x))$. The limit as $h$ goes to $0$ of this is found from the composition rules for limits: as $\lim_{h \rightarrow 0} f(h/x) = e$, and since $\log(x)$ is continuous at $e$ we get this expression has a limit of $1/x$.
We verify through:
@@ -520,13 +592,14 @@ This holds two rules: the derivative of a constant times a function is the const
This example shows a useful template:
$$
\begin{align*}
[2x^2 - \frac{x}{3} + 3e^x]' & = 2[\square]' - \frac{[\square]'}{3} + 3[\square]'\\
&= 2[x^2]' - \frac{[x]'}{3} + 3[e^x]'\\
&= 2(2x) - \frac{1}{3} + 3e^x\\
&= 4x - \frac{1}{3} + 3e^x
\end{align*}
$$
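Assuming `SymPy` is loaded, as elsewhere in these notes, the template can be checked symbolically:

```{julia}
@syms x
diff(2x^2 - x/3 + 3exp(x), x)   # 4x - 1/3 + 3e^x
```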
### Product rule
@@ -548,12 +621,13 @@ The output uses the Leibniz notation to represent that the derivative of $u(x) \
This example shows a useful template for the product rule:
$$
\begin{align*}
[(x^2+1)\cdot e^x]' &= [\square]' \cdot (\square) + (\square) \cdot [\square]'\\
&= [x^2 + 1]' \cdot (e^x) + (x^2+1) \cdot [e^x]'\\
&= (2x)\cdot e^x + (x^2+1)\cdot e^x
\end{align*}
$$
### Quotient rule
@@ -572,12 +646,13 @@ limit((f(x+h) - f(x))/h, h => 0)
This example shows a useful template for the quotient rule:
$$
\begin{align*}
[\frac{x^2+1}{e^x}]' &= \frac{[\square]' \cdot (\square) - (\square) \cdot [\square]'}{(\square)^2}\\
&= \frac{[x^2 + 1]' \cdot (e^x) - (x^2+1) \cdot [e^x]'}{(e^x)^2}\\
&= \frac{(2x)\cdot e^x - (x^2+1)\cdot e^x}{e^{2x}}
\end{align*}
$$
##### Examples
@@ -672,19 +747,21 @@ There are $n$ terms, each where one of the $f_i$s have a derivative. Were we to
With this, we can proceed. Each term $x-i$ has derivative $1$, so the answer to $f'(x)$, with $f$ as above, is
$$
\begin{align*}
f'(x) &= f(x)/(x-1) + f(x)/(x-2) + f(x)/(x-3)\\
&+ f(x)/(x-4) + f(x)/(x-5),
\end{align*}
$$
That is
$$
\begin{align*}
f'(x) &= (x-2)(x-3)(x-4)(x-5) + (x-1)(x-3)(x-4)(x-5)\\
&+ (x-1)(x-2)(x-4)(x-5) + (x-1)(x-2)(x-3)(x-5) \\
&+ (x-1)(x-2)(x-3)(x-4).
\end{align*}
$$
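This pattern is easy to verify symbolically; a sketch assuming `SymPy`:

```{julia}
@syms x
p = prod(x - i for i in 1:5)                        # (x-1)(x-2)(x-3)(x-4)(x-5)
simplify(sum(p/(x - i) for i in 1:5) - diff(p, x))  # 0, so the two answers agree
```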
---
@@ -749,17 +826,18 @@ Combined, we would end up with:
To see that this works in our specific case, we assume the general power rule that $[x^n]' = n x^{n-1}$ to get:
$$
\begin{align*}
f(x) &= x^2 & g(x) &= \sqrt{x}\\
f'(\square) &= 2(\square) & g'(x) &= \frac{1}{2}x^{-1/2}
\end{align*}
$$
We use $\square$ for the argument of `f'` to emphasize that $g(x)$ is the needed value, not just $x$:
We use $\square$ for the argument of $f'$ to emphasize that $g(x)$ is the needed value, not just $x$:
$$
\begin{align*}
[(\sqrt{x})^2]' &= [f(g(x)]'\\
&= f'(g(x)) \cdot g'(x) \\
@@ -767,6 +845,7 @@ We use $\square$ for the argument of `f'` to emphasize that $g(x)$ is the needed
&= \frac{2\sqrt{x}}{2\sqrt{x}}\\
&=1
\end{align*}
$$
This is the same as the derivative of $x$ found by first evaluating the composition. For this problem, the chain rule is not necessary, but typically it is a needed rule to fully differentiate a function.
@@ -778,11 +857,12 @@ This is the same as the derivative of $x$ found by first evaluating the composit
Find the derivative of $f(x) = \sqrt{1 - x^2}$. We identify the composition of $\sqrt{x}$ and $(1-x^2)$. We set the functions and their derivatives into a pattern to emphasize the pieces in the chain-rule formula:
$$
\begin{align*}
f(x) &=\sqrt{x} = x^{1/2} & g(x) &= 1 - x^2 \\
f'(\square) &=(1/2)(\square)^{-1/2} & g'(x) &= -2x
\end{align*}
$$
Then:
@@ -823,11 +903,12 @@ This is a useful rule to remember for expressions involving exponentials.
Find the derivative of $\sin(x)\cos(2x)$ at $x=\pi$.
$$
\begin{align*}
[\sin(x)\cos(2x)]'\big|_{x=\pi} &=(\cos(x)\cos(2x) + \sin(x)(-\sin(2x)\cdot 2))\big|_{x=\pi} \\
& =((-1)(1) + (0)(-0)(2)) = -1.
\end{align*}
$$
##### Proof of the Chain Rule
@@ -844,23 +925,25 @@ g(a+h) = g(a) + g'(a)h + \epsilon_g(h) h = g(a) + h',
$$
where $h' = (g'(a) + \epsilon_g(h))h \rightarrow 0$ as $h \rightarrow 0$ will be used to simplify the following:
$$
\begin{align*}
f(g(a+h)) - f(g(a)) &=
f(g(a) + g'(a)h + \epsilon_g(h)h) - f(g(a)) \\
&= f(g(a)) + f'(g(a)) (g'(a)h + \epsilon_g(h)h) + \epsilon_f(h')(h') - f(g(a))\\
&= f'(g(a)) g'(a)h + f'(g(a))(\epsilon_g(h)h) + \epsilon_f(h')(h').
\end{align*}
$$
Rearranging:
$$
\begin{align*}
f(g(a+h)) &- f(g(a)) - f'(g(a)) g'(a) h\\
&= f'(g(a))\epsilon_g(h)h + \epsilon_f(h')(h')\\
&=(f'(g(a)) \epsilon_g(h) + \epsilon_f(h') (g'(a) + \epsilon_g(h)))h \\
&=\epsilon(h)h,
\end{align*}
$$
where $\epsilon(h)$ combines the above terms which go to zero as $h\rightarrow 0$ into one. This is the alternative definition of the derivative, showing $(f\circ g)'(a) = f'(g(a)) g'(a)$ when $g$ is differentiable at $a$ and $f$ is differentiable at $g(a)$.
@@ -871,17 +954,18 @@ where $\epsilon(h)$ combines the above terms which go to zero as $h\rightarrow 0
The chain rule name could also be simply the "composition rule," as that is the operation the rule works for. However, in practice, there are usually *multiple* compositions, and the "chain" rule is used to chain together the different pieces. To get a sense, consider a triple composition $u(v(w(x)))$. This will have derivative:
$$
\begin{align*}
[u(v(w(x)))]' &= u'(v(w(x))) \cdot [v(w(x))]' \\
&= u'(v(w(x))) \cdot v'(w(x)) \cdot w'(x)
\end{align*}
$$
The answer can be viewed as a repeated peeling off of the outer function, a view with immediate application to many compositions. To see that in action with an expression, consider this derivative problem, shown in steps:
$$
\begin{align*}
[\sin(e^{\cos(x^2-x)})]'
&= \cos(e^{\cos(x^2-x)}) \cdot [e^{\cos(x^2-x)}]'\\
@@ -889,6 +973,7 @@ The answer can be viewed as a repeated peeling off of the outer function, a view
&= \cos(e^{\cos(x^2-x)}) \cdot e^{\cos(x^2-x)} \cdot (-\sin(x^2-x)) \cdot [x^2-x]'\\
&= \cos(e^{\cos(x^2-x)}) \cdot e^{\cos(x^2-x)} \cdot (-\sin(x^2-x)) \cdot (2x-1)\\
\end{align*}
$$
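Assuming `SymPy`, differentiating the expression directly gives a check on the peeling-off process:

```{julia}
@syms x
diff(sin(exp(cos(x^2 - x))), x)
```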
##### More examples of differentiation
@@ -1004,7 +1089,7 @@ Find the derivative of $f(x) = x \cdot e^{-x^2}$.
Using the product rule and then the chain rule, we have:
$$
\begin{align*}
f'(x) &= [x \cdot e^{-x^2}]'\\
&= [x]' \cdot e^{-x^2} + x \cdot [e^{-x^2}]'\\
@@ -1012,6 +1097,7 @@ f'(x) &= [x \cdot e^{-x^2}]'\\
&= e^{-x^2} + x \cdot e^{-x^2} \cdot (-2x)\\
&= e^{-x^2} (1 - 2x^2).
\end{align*}
$$
---
@@ -1022,7 +1108,7 @@ Find the derivative of $f(x) = e^{-ax} \cdot \sin(x)$.
Using the product rule and then the chain rule, we have:
$$
\begin{align*}
f'(x) &= [e^{-ax} \cdot \sin(x)]'\\
&= [e^{-ax}]' \cdot \sin(x) + e^{-ax} \cdot [\sin(x)]'\\
@@ -1030,6 +1116,7 @@ f'(x) &= [e^{-ax} \cdot \sin(x)]'\\
&= e^{-ax} \cdot (-a) \cdot \sin(x) + e^{-ax} \cos(x)\\
&= e^{-ax}(\cos(x) - a\sin(x)).
\end{align*}
$$
---
@@ -1164,13 +1251,14 @@ Find the first $3$ derivatives of $f(x) = ax^3 + bx^2 + cx + d$.
Differentiating a polynomial is done with the sum rule, here we repeat three times:
$$
\begin{align*}
f(x) &= ax^3 + bx^2 + cx + d\\
f'(x) &= 3ax^2 + 2bx + c \\
f''(x) &= 3\cdot 2 a x + 2b \\
f'''(x) &= 6a
\end{align*}
$$
We can see, the fourth derivative and all higher order ones would be identically $0$. This is part of a general phenomenon: an $n$th degree polynomial has only $n$ non-zero derivatives.
@@ -1181,7 +1269,7 @@ We can see, the fourth derivative and all higher order ones would be ide
Find the first $5$ derivatives of $\sin(x)$.
$$
\begin{align*}
f(x) &= \sin(x) \\
f'(x) &= \cos(x) \\
@@ -1190,6 +1278,7 @@ f'''(x) &= -\cos(x) \\
f^{(4)} &= \sin(x) \\
f^{(5)} &= \cos(x)
\end{align*}
$$
We see the derivatives repeat themselves. (We also see alternative notation for higher order derivatives.)
@@ -1616,13 +1705,14 @@ The right graph is of $g(x) = \exp(x)$ at $x=1$, the left graph of $f(x) = \sin(
Assuming the approximation gets better for $h$ close to $0$, as it visually does, the derivative at $1$ for $f(g(x))$ should be given by this limit:
$$
\begin{align*}
\frac{d(f\circ g)}{dx}\mid_{x=1}
&= \lim_{h\rightarrow 0} \frac{f(g(1) + g'(1)h)-f(g(1))}{h}\\
&= \lim_{h\rightarrow 0} \frac{f(g(1) + g'(1)h)-f(g(1))}{g'(1)h} \cdot g'(1)\\
&= \lim_{h\rightarrow 0} (f\circ g)'(g(1)) \cdot g'(1).
&= \lim_{h\rightarrow 0} f'(g(1)) \cdot g'(1).
\end{align*}
$$
What limit law, described below assuming all limits exist, allows the last equals sign?

Binary image file added (55 KiB; not shown).

Binary image file added (37 KiB; not shown).

@@ -36,21 +36,21 @@ Of course, we define *negative* in a parallel manner. The intermediate value th
Next,
::: {.callout-note icon=false}
## Strictly increasing
> A function, $f$, is (strictly) **increasing** on an interval $I$ if for any $a < b$ it must be that $f(a) < f(b)$.
A function, $f$, is (strictly) **increasing** on an interval $I$ if for any $a < b$ it must be that $f(a) < f(b)$.
The word strictly refers to the inclusion of the $<$, which precludes the possibility of a function being flat over an interval, something the $\leq$ inequality would allow.
A parallel definition with $a < b$ implying $f(a) > f(b)$ would be used for a *strictly decreasing* function.
:::
We can try to prove these properties for a function algebraically; we'll see both are related to the zeros of some function. However, before proceeding to that it is usually helpful to get an idea of where the answer is using exploratory graphs.
We will use a helper function, `plotif(f, g, a, b)` that plots the function `f` over `[a,b]` highlighting the regions in the domain when `g` is non-negative. Such a function is defined for us in the accompanying `CalculusWithJulia` package, which has been previously been loaded.
We will use a helper function, `plotif(f, g, a, b)` that plots the function `f` over `[a,b]` highlighting the regions in the domain when `g` is non-negative. Such a function is defined for us in the accompanying `CalculusWithJulia` package, which has been previously loaded.
To see where a function is positive, we simply pass the function object in for *both* `f` and `g` above. For example, let's look at where $f(x) = \sin(x)$ is positive:
@@ -160,13 +160,17 @@ This leaves the question:
This question can be answered by considering the first derivative.
> *The first derivative test*: If $c$ is a critical point for $f(x)$ and *if* $f'(x)$ changes sign at $x=c$, then $f(c)$ will be either a relative maximum or a relative minimum.
>
> * $f$ will have a relative maximum at $c$ if the derivative changes sign from $+$ to $-$.
> * $f$ will have a relative minimum at $c$ if the derivative changes sign from $-$ to $+$.
>
> Further, If $f'(x)$ does *not* change sign at $c$, then $f$ will *not* have a relative maximum or minimum at $c$.
::: {.callout-note icon=false}
## The first derivative test
If $c$ is a critical point for $f(x)$ and *if* $f'(x)$ changes sign at $x=c$, then $f(c)$ will be either a relative maximum or a relative minimum.
* $f$ will have a relative maximum at $c$ if the derivative changes sign from $+$ to $-$.
* $f$ will have a relative minimum at $c$ if the derivative changes sign from $-$ to $+$.
Further, If $f'(x)$ does *not* change sign at $c$, then $f$ will *not* have a relative maximum or minimum at $c$.
:::
The classification part should be clear: e.g., if the derivative is positive then negative, the function $f$ will increase to $(c,f(c))$ then decrease from $(c,f(c))$, so $f$ will have a local maximum at $c$.
@@ -239,7 +243,7 @@ g(x) = sqrt(abs(x^2 - 1))
cps = find_zeros(g', -2, 2)
```
We see the three values $-1$, $0$, $1$ that correspond to the two zeros and the relative minimum of $x^2 - 1$. We could graph things, but instead we characterize these values using a sign chart. A piecewise continuous function can only change sign when it crosses $0$ or jumps over $0$. The derivative will be continuous, except possibly at the three values above, so is piecewise continuous.
We see the three values $-1$, $0$, $1$ that correspond to the two zeros and the relative minimum of $x^2 - 1$ (where $g$ itself has a relative maximum). We could graph things, but instead we characterize these values using a sign chart. A piecewise continuous function can only change sign when it crosses $0$ or jumps over $0$. The derivative will be continuous, except possibly at the three values above, so is piecewise continuous.
A sign chart picks convenient values between crossing points to test if the function is positive or negative over those intervals. When computing by hand, these would ideally be values for which the function is easily computed. On the computer, this isn't a concern; below the midpoint is chosen:
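A sketch of that computation, reusing the `g` and `cps` defined above along with the `f'` notation from `CalculusWithJulia` (the endpoints `-2` and `2` come from the search interval):

```{julia}
pts = sort([-2; cps; 2])         # endpoints plus the possible crossing points
mids = [(a + b)/2 for (a, b) in zip(pts[1:end-1], pts[2:end])]
[(m, sign(g'(m))) for m in mids] # the sign pattern of g' between crossings
```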
@@ -324,7 +328,7 @@ At $x=0$ we have to the left and right signs found by
fp(-pi/2), fp(pi/2)
```
Both are negative. The derivative does not change sign at $0$, so the critical point is neither a relative minimum or maximum.
Both are negative. The derivative does not change sign at $0$, so the critical point is neither a relative minimum nor maximum.
What about at $2\pi$? We do something similar:
@@ -334,7 +338,7 @@ What about at $2\pi$? We do something similar:
fp(2pi - pi/2), fp(2pi + pi/2)
```
Again, both negative. The function $f(x)$ is just decreasing near $2\pi$, so again the critical point is neither a relative minimum or maximum.
Again, both negative. The function $f(x)$ is just decreasing near $2\pi$, so again the critical point is neither a relative minimum nor maximum.
A graph verifies this:
@@ -424,12 +428,15 @@ The graph attempts to illustrate that for this function the secant line between
This is a special property not shared by all functions. Let $I$ be an open interval.
::: {.callout-note icon=false}
## Concave up
> **Concave up**: A function $f(x)$ is concave up on $I$ if for any $a < b$ in $I$, the secant line between $a$ and $b$ lies above the graph of $f(x)$ over $[a,b]$.
A function $f(x)$ is concave up on $I$ if for any $a < b$ in $I$, the secant line between $a$ and $b$ lies above the graph of $f(x)$ over $[a,b]$.
A similar definition exists for *concave down* where the secant lines lie below the graph.
:::
A similar definition exists for *concave down* where the secant lines lie below the graph. Notationally, concave up says for any $x$ in $[a,b]$:
Notationally, concave up says for any $x$ in $[a,b]$:
$$
@@ -447,7 +454,7 @@ We won't work with these definitions in this section, rather we will characteriz
A proof of this makes use of the same trick used to establish the mean value theorem from Rolle's theorem. Assume $f'$ is increasing and let $g(x) = f(x) - (f(a) + M \cdot (x-a))$, where $M$ is the slope of the secant line between $a$ and $b$. By construction $g(a) = g(b) = 0$. If $f'(x)$ is increasing, then so is $g'(x) = f'(x) + M$. By its definition above, showing $f$ is concave up is the same as showing $g(x) \leq 0$. Suppose to the contrary that there is a value where $g(x) > 0$ in $[a,b]$. We show this can't be. Assuming $g'(x)$ always exists, after some work, Rolle's theorem will ensure there is a value where $g'(c) = 0$ and $(c,g(c))$ is a relative maximum, and as we know there is at least one positive value, it must be $g(c) > 0$. The first derivative test then ensures that $g'(x)$ will increase to the left of $c$ and decrease to the right of $c$, since $c$ is at a critical point and not an endpoint. But this can't happen as $g'(x)$ is assumed to be increasing on the interval.
A proof of this makes use of the same trick used to establish the mean value theorem from Rolle's theorem. Assume $f'$ is increasing and let $g(x) = f(x) - (f(a) + M \cdot (x-a))$, where $M$ is the slope of the secant line between $a$ and $b$. By construction $g(a) = g(b) = 0$. If $f'(x)$ is increasing, then so is $g'(x) = f'(x) + M$. By its definition above, showing $f$ is concave up is the same as showing $g(x) \leq 0$. Suppose to the contrary that there is a value where $g(x) > 0$ in $[a,b]$. We show this can't be. Assuming $g'(x)$ always exists, after some work, Rolle's theorem will ensure there is a value where $g'(c) = 0$ and $(c,g(c))$ is a relative maximum, and as we know there is at least one positive value, it must be $g(c) > 0$. The first derivative test then ensures that $g'(x)$ will be positive to the left of $c$ and negative to the right of $c$, since $c$ is at a critical point and not an endpoint. But this can't happen as $g'(x)$ is assumed to be increasing on the interval.
The relationship between increasing functions and their derivatives - if $f'(x) > 0$ on $I$, then $f$ is increasing on $I$ - gives this second characterization of concavity when the second derivative exists:
@@ -468,22 +475,22 @@ Let's look at the function $x^2 \cdot e^{-x}$ for positive $x$. A quick graph sh
```{julia}
h(x) = x^2 * exp(-x)
plotif(h, h'', 0, 8)
g(x) = x^2 * exp(-x)
plotif(g, g'', 0, 8)
```
From the graph, we would expect that the second derivative - which is continuous - would have two zeros on $[0,8]$:
```{julia}
ips = find_zeros(h'', 0, 8)
ips = find_zeros(g'', 0, 8)
```
As well, between the zeros we should have the sign pattern `+`, `-`, and `+`, as we verify:
```{julia}
sign_chart(h'', 0, 8)
sign_chart(g'', 0, 8)
```
### Second derivative test
@@ -491,14 +498,16 @@ sign_chart(h'', 0, 8)
Concave up functions are "opening" up, and often clearly $U$-shaped, though that is not necessary. At a relative minimum, where there is a $U$-shape, the graph will be concave up; conversely at a relative maximum, where the graph has a downward $\cap$-shape, the function will be concave down. This observation becomes:
::: {.callout-note icon=false}
## The second derivative test
> The **second derivative test**: If $c$ is a critical point of $f(x)$ with $f''(c)$ existing in a neighborhood of $c$, then
>
> * $f$ will have a relative maximum at the critical point $c$ if $f''(c) > 0$,
> * $f$ will have a relative minimum at the critical point $c$ if $f''(c) < 0$, and
> * *if* $f''(c) = 0$ the test is *inconclusive*.
If $c$ is a critical point of $f(x)$ with $f''(c)$ existing in a neighborhood of $c$, then
* $f$ will have a relative minimum at the critical point $c$ if $f''(c) > 0$,
* $f$ will have a relative maximum at the critical point $c$ if $f''(c) < 0$, and
* *if* $f''(c) = 0$ the test is *inconclusive*.
:::
If $f''(c)$ is positive in an interval about $c$, then $f''(c) > 0$ implies the function is concave up at $x=c$. In turn, concave up implies the derivative is increasing so must go from negative to positive at the critical point.
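A sketch with the made-up example $f(x) = x^3 - x$, following the conventions above:

```{julia}
f(x) = x^3 - x
cps = find_zeros(f', -2, 2)        # critical points ±1/√3
[(c, sign(f''(c))) for c in cps]   # -1 ⇒ relative maximum; +1 ⇒ relative minimum
```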
@@ -735,6 +744,90 @@ choices=[
answ = 3
radioq(choices, answ)
```
###### Question
The function
$$
f(x) =
\begin{cases}
\frac{x}{2} + x^2 \sin(\frac{\pi}{x}) & x \neq 0\\
0 & x = 0
\end{cases}
$$
is graphed below over $[-1/3, 1/3]$.
```{julia}
#| echo: false
plt = let
gr()
empty_style = (xaxis=([], false),
yaxis=([], false),
framestyle=:origin,
legend=false)
axis_style = (arrow=true, side=:head, line=(:gray, 1))
## f'(0) > 0 but not increasing
f(x) = x/2 + x^2 * sinpi(1/x)
g(x) = x/2 - x^2
a, b = -1/3, 1/3
xs = range(a, b, 10_000)
ys = f.(xs)
y0,y1 = extrema(ys)
plot(; empty_style..., aspect_ratio=:equal)
plot!([a,b],[0,0]; axis_style...)
plot!([0,0], [y0,y1]; axis_style...)
plot!(xs, f.(xs); line=(:black, 1))
plot!(xs, x -> x/2 + x^2; line=(:gray, 1, :dot))
plot!(xs, x -> x/2 - x^2; line=(:gray, 1, :dot))
plot!(xs, x -> x/2; line=(:gray, 1))
a1 = (1/4 + 1/5)/2
a2 = -(1*1/3 + 4*1/4)/5
annotate!([
(a1, g(a1), text(L"\frac{x}{2} - x^2", 10, :top)),
(a1, f(a1), text(L"\frac{x}{2} + x^2", 10, :bottom)),
(-1/6, f(1/6), text(L"\frac{x}{2} + x^2\sin(\frac{\pi}{x})", 10, :bottom))
])
plot!([-1/6, -1/13.5], [f(1/6), f(-1/13.5)]; axis_style...)
end
plt
```
```{julia}
#| echo: false
plotly()
nothing
```
This function has a derivative at $0$ that is *positive*
```{julia}
f(x) = x == 0 ? 0 : x/2 + x^2 * sinpi(1/x)
@syms h
limit((f(0+h) - f(0))/h, h=>0; dir="+-")
```
Is the function increasing **around** $0$?
The derivative away from $0$ is given by:
```{julia}
@syms x
diff(f(x), x)
```
```{julia}
#| echo: false
choices = ["Yes", "No"]
answer = 1
buttonq(choices, answer; explanation=raw"""
The slope of the tangent line away from $0$ oscillates from positive to negative at every point of the form $1/n$ due to the $\cos(\pi/x)$ term, so the function is not simply increasing or decreasing around $0$. (This example comes from @Angenent.)
""")
```
###### Question
@@ -764,6 +857,76 @@ answ = 4
radioq(choices, answ)
```
###### Question
Consider the following figure of a graph of $f$:
```{julia}
#| echo: false
let
gr()
ex(x) = x * tanh(exp(x))
a, b = -5, 1
plot(ex, a, b, legend=false,
axis=([], false),
line=(:black, 2)
)
plot!([a-.1, b+.1], [0,0], line=(:gray,1), arrow=true, side=:head)
zs = find_zeros(ex, (a, b))
cps = find_zeros(ex', (a, b))
ips = find_zeros(ex'', (a, b))
scatter!(zs, ex.(zs), fill=(:black,), marker=(8, :circle))
scatter!(cps, ex.(cps), fill=(:green,), marker=(8, :diamond))
scatter!(ips, ex.(ips), fill=(:brown3,), marker=(8,:star5))
end
```
```{julia}
#| echo: false
plotly()
nothing
```
The black circle denotes what?
```{julia}
#| hold: true
#| echo: false
choices = [raw"A zero of $f$",
raw"A critical point of $f$",
raw"An inflection point of $f$"]
answ = 1
radioq(choices, answ)
```
The green diamond denotes what?
```{julia}
#| hold: true
#| echo: false
choices = [raw"A zero of $f$",
raw"A critical point of $f$",
raw"An inflection point of $f$"]
answ = 2
radioq(choices, answ)
```
The red stars denote what?
```{julia}
#| hold: true
#| echo: false
choices = [raw"Zeros of $f$",
raw"Critical points of $f$",
raw"Inflection points of $f$"]
answ = 3
radioq(choices, answ)
```
###### Question
@@ -1031,7 +1194,8 @@ This accurately summarizes how the term is used outside of math books. Does it a
#| echo: false
choices = ["Yes. Same words, same meaning",
"""No, but it is close. An inflection point is when the *acceleration* changes from positive to negative, so if "results" are about how a company's rate of change is changing, then it is in the ballpark."""]
radioq(choices, 2)
answ = 2
radioq(choices, answ)
```
###### Question
@@ -1045,5 +1209,6 @@ The function $f(x) = x^3 + x^4$ has a critical point at $0$ and a second derivat
#| echo: false
choices = ["As ``x^3`` has no extrema at ``x=0``, neither will ``f``",
"As ``x^4`` is of higher degree than ``x^3``, ``f`` will be ``U``-shaped, as ``x^4`` is."]
radioq(choices, 1)
answ = 1
radioq(choices, answ)
```


@@ -1,4 +1,4 @@
# Implicit Differentiation
# Implicit differentiation
{{< include ../_common_code.qmd >}}
@@ -93,6 +93,42 @@ In general though, we may not be able to solve for $y$ in terms of $x$. What the
The idea is to *assume* that $y$ is representable by some function of $x$. This makes sense: moving on the curve from $(x,y)$ to some nearby point means changing $x$ will cause some change in $y$. This assumption is only made *locally* - basically meaning a complicated graph is reduced to just a small, well-behaved, section of its graph.
::: {#fig-well-behaved-section}
```{julia}
#| echo: false
let
gr()
a = 1
k = 2
F(x,y) = a * (x + a)*(x^2 + y^2) - k*x^2
xs = range(-3/2, 3/2, 100)
ys = range(-2, 2, 100)
contour(xs, ys, F; levels=[0],
axis=([], nothing),
line=(:black, 1),
framestyle=:none, legend=false)
x₀, y₀ = 3/4, -0.2834733547569205
m = (-a^2*x₀ - 3*a*x₀^2/2 - a*y₀^2/2 + k*x₀)/(a*(a + x₀)*y₀)
plot!(x -> y₀ + m*(x - x₀), x₀-0.5, x₀ + 0.5; line=(:gray, 2))
plot!(x -> -x*sqrt(-(a^2 + a*x - k)/(a*(a + x))), -1/8,0.99;
line=(:black,4))
scatter!([x₀], [y₀]; marker=(:circle,5,:yellow))
end
```
```{julia}
#| echo: false
plotly()
nothing
```
Graph of an equation with a well-behaved section emphasized. The tangent line can be found by finding a formula for this well-behaved section and differentiating, *or* by implicit differentiation, simply by assuming a form for the implicit function.
:::
With this assumption, asking what $dy/dx$ is has an obvious meaning - what is the slope of the tangent line to the graph at $(x,y)$. (The assumption eliminates the question of what a tangent line would mean when a graph self intersects.)
@@ -120,7 +156,7 @@ This says the slope of the tangent line depends on the point $(x,y)$ through the
As a check, we compare to what we would have found had we solved for $y= \sqrt{1 - x^2}$ (for $(x,y)$ with $y \geq 0$). We would have found: $dy/dx = 1/2 \cdot 1/\sqrt{1 - x^2} \cdot (-2x)$, which can be simplified to $-x/y$. This should show that the method above - assuming $y$ is a function of $x$ and differentiating - is not only more general, but can even be easier.
The name - *implicit differentiation* - comes from the assumption that $y$ is implicitly defined in terms of $x$. According to the [Implicit Function Theorem](http://en.wikipedia.org/wiki/Implicit_function_theorem) the above method will work provided the curve has sufficient smoothness near the point $(x,y)$.
The name - *implicit differentiation* - comes from the assumption that $y$ is implicitly defined in terms of $x$. According to the [Implicit Function Theorem](http://en.wikipedia.org/wiki/Implicit_function_theorem) the above method will work provided the curve has sufficient smoothness near the point $(x,y)$. (The equation should be continuously differentiable with a non-vanishing partial derivative in $y$.)
##### Examples
@@ -140,10 +176,16 @@ For $a = 2, b=1$ we have the graph:
#| hold: true
a, b = 2, 1
f(x,y) = x^2*y + a * b * y - a^2 * x
implicit_plot(f)
implicit_plot(f; legend=false)
x₀, y₀ = 0, 0
m = (a^2 - 2x₀*y₀) / (a*b + x₀^2)
plot!(x -> y₀ + m*(x - x₀), -1, 1)
```
We can see that at each point in the viewing window the tangent line exists due to the smoothness of the curve. Moreover, at a point $(x,y)$ the tangent will have slope $dy/dx$ satisfying:
To the plot we added a tangent line at $(0,0)$.
We can see that at each point in the viewing window the tangent line exists due to the smoothness of the curve. At a point $(x,y)$ on the curve, the tangent line will have slope $dy/dx$ satisfying:
$$
@@ -177,7 +219,7 @@ A graph for $a=3$ shows why it has the name it does:
#| hold: true
a = 3
f(x,y) = x^4 - a^2*(x^2 - y^2)
implicit_plot(f)
implicit_plot(f; xticks=-5:5)
```
The tangent line at $(x,y)$ will have slope, $dy/dx$ satisfying:
@@ -341,7 +383,7 @@ The next step is solve for $dy/dx$ - the lone answer to the linear equation - wh
```{julia}
dydx = diff(u(x), x)
ex3 = solve(ex2, dydx)[1] # pull out lone answer with [1] indexing
ex3 = only(solve(ex2, dydx)) # pull out the only answer
```
As this represents an answer in terms of `u(x)`, we replace that term with the original variable:
@@ -369,9 +411,9 @@ Let $a = b = c = d = 1$, then $(1,4)$ is a point on the curve. We can draw a tan
```{julia}
H = ex(a=>1, b=>1, c=>1, d=>1)
x0, y0 = 1, 4
𝒎 = dydx₁(x=>1, y=>4, a=>1, b=>1, c=>1, d=>1)
m = dydx₁(x=>1, y=>4, a=>1, b=>1, c=>1, d=>1)
implicit_plot(lambdify(H); xlims=(-5,5), ylims=(-5,5), legend=false)
plot!(y0 + 𝒎 * (x-x0))
plot!(y0 + m * (x-x0))
```
Basically this includes all the same steps as if done "by hand." Some effort could have been saved in plotting, had values for the parameters been substituted initially, but not doing so shows their dependence in the derivative.
@@ -379,7 +421,7 @@ Basically this includes all the same steps as if done "by hand." Some effort cou
:::{.callout-warning}
## Warning
The use of `lambdify(H)` is needed to turn the symbolic expression, `H`, into a function.
The use of `lambdify(H)` is needed to turn the symbolic expression, `H`, into a function for plotting purposes.
:::
@@ -416,7 +458,7 @@ $$
6x - (6y \frac{dy}{dx} \cdot \frac{dy}{dx} + 3y^2 \frac{d^2y}{dx^2}) = 0.
$$
Again, if must be that $d^2y/dx^2$ appears as a linear factor, so we can solve for it:
Again, it must be that $d^2y/dx^2$ appears as a linear factor, so we can solve for it:
$$
@@ -456,7 +498,7 @@ eqn = K(x,y)
eqn1 = eqn(y => u(x))
dydx = solve(diff(eqn1,x), diff(u(x), x))[1] # 1 solution
d2ydx2 = solve(diff(eqn1, x, 2), diff(u(x),x, 2))[1] # 1 solution
eqn2 = d2ydx2(diff(u(x), x) => dydx, u(x) => y)
eqn2 = subs(d2ydx2, diff(u(x), x) => dydx, u(x) => y)
simplify(eqn2)
```
@@ -517,15 +559,9 @@ This could have been made easier, had we leveraged the result of the previous ex
#### Example: from physics
Many problems are best done with implicit derivatives. A video showing such a problem along with how to do it analytically is [here](http://ocw.mit.edu/courses/mathematics/18-01sc-single-variable-calculus-fall-2010/unit-2-applications-of-differentiation/part-b-optimization-related-rates-and-newtons-method/session-32-ring-on-a-string/).
This video starts with a simple question:
> If you have a rope and heavy ring, where will the ring position itself due to gravity?
This problem illustrates one best done with implicit derivatives. A video showing this problem along with how to do it analytically is [here](http://ocw.mit.edu/courses/mathematics/18-01sc-single-variable-calculus-fall-2010/unit-2-applications-of-differentiation/part-b-optimization-related-rates-and-newtons-method/session-32-ring-on-a-string/).
Well, suppose you hold the rope in two places, which we can take to be $(0,0)$ and $(a,b)$. Then let $(x,y)$ be all the possible positions of the ring that hold the rope taut. Then we have this picture:
@@ -534,19 +570,41 @@ Well, suppose you hold the rope in two places, which we can take to be $(0,0)$ a
```{julia}
#| hold: true
#| echo: false
let
gr()
P = (4,1)
Q = (1, -3)
scatter([0,4], [0,1], legend=false, xaxis=nothing, yaxis=nothing)
plot!([0,1,4],[0,-3,1])
𝑎, 𝑏= .05, .25
plot(;
axis=([],false),
legend=false)
scatter!([0,4], [0,1])
plot!([0,1,4],[0,-3,1]; line=(:black,2))
a, b = .05, .25
ts = range(0, 2pi, length=100)
plot!(1 .+ 𝑎*sin.(ts), -3 .+ 𝑏*cos.(ts), color=:gold)
annotate!((4-0.3,1,"(a,b)"))
plot!([0,1,1],[0,0,-3], color=:gray, alpha=0.25)
plot!([1,1,4],[0,1,1], color=:gray, alpha=0.25)
Δ = 0.15
annotate!([(1/2, 0-Δ, "x"), (5/2, 1 - Δ, "a-x"), (1-Δ, -1, "|y|"), (1+Δ, -1, "b-y")])
plot!(1 .+ a*sin.(ts), -3 .+ b*cos.(ts), line=(:gold,2))
plot!([0,1,1],[0,0,-3], color=:gray, alpha=0.75)
plot!([1,1,4],[0,1,1], color=:gray, alpha=0.75)
Δ = 0.05
annotate!([
(0,0, text(L"(0,0)",:bottom)),
(4,1, text(L"(a,b)",:bottom)),
(1/2, 0, text(L"x",:top)),
(5/2, 1, text(L"a-x", :top)),
(1, -1, text(L"|y|",:right)),
(1+Δ, -1, text(L"b-y",:left)),
(1+2a, -3, text(L"(x,y)",:left))
])
current()
end
```
```{julia}
#| echo: false
plotly()
nothing
```
Since the length of the rope does not change, we must have for any admissible $(x,y)$ that:
@@ -637,7 +695,7 @@ Okay, now we need to put this value back into our expression for the `x` value a
```{julia}
xstar = N(cps[2](y => ystar, a =>3, b => 3, L => 3))
xstar = N(cps[2](y => ystar, a =>3, b => 3))
```
Our minimum is at `(xstar, ystar)`, as this graphic shows:


@@ -1,4 +1,4 @@
# L'Hospital's Rule
# L'Hospital's rule
{{< include ../_common_code.qmd >}}
@@ -28,17 +28,17 @@ We know this is $1$ using a bound from geometry, but might also guess this is on
$$
\sin(x) = x - \sin(\xi)x^2/2, \quad 0 < \xi < x.
\sin(x) = x - \sin(\xi)\frac{x^2}{2}, \quad 0 < \xi < x.
$$
This would yield:
$$
\lim_{x \rightarrow 0} \frac{\sin(x)}{x} = \lim_{x\rightarrow 0} \frac{x -\sin(\xi) x^2/2}{x} = \lim_{x\rightarrow 0} 1 - \sin(\xi) \cdot x/2 = 1.
\lim_{x \rightarrow 0} \frac{\sin(x)}{x} = \lim_{x\rightarrow 0} \frac{x -\sin(\xi) \frac{x^2}{2}}{x} = \lim_{x\rightarrow 0} 1 - \sin(\xi) \cdot \frac{x}{2} = 1.
$$
This is because we know $\sin(\xi) x/2$ has a limit of $0$, when $|\xi| \leq |x|$.
This is because we know $\sin(\xi) \frac{x}{2}$ has a limit of $0$, when $|\xi| \leq |x|$.
That doesn't look any easier, as we worried about the error term, but if we just mentally replace $\sin(x)$ with $x$ - which it basically is near $0$ - then we can see that the limit should be the same as $x/x$ which we know is $1$ without thinking.
@@ -52,15 +52,18 @@ Wouldn't that be nice? We could find difficult limits just by differentiating th
Well, in fact that is more or less true, a fact that dates back to [L'Hospital](http://en.wikipedia.org/wiki/L%27H%C3%B4pital%27s_rule) - who wrote the first textbook on differential calculus - though this result is likely due to one of the Bernoulli brothers.
::: {.callout-note icon=false}
## L'Hospital's rule
> *L'Hospital's rule*: Suppose:
>
> * that $\lim_{x\rightarrow c+} f(c) =0$ and $\lim_{x\rightarrow c+} g(c) =0$,
> * that $f$ and $g$ are differentiable in $(c,b)$, and
> * that $g(x)$ exists and is non-zero for *all* $x$ in $(c,b)$,
>
> then **if** the following limit exists: $\lim_{x\rightarrow c+}f'(x)/g'(x)=L$ it follows that $\lim_{x \rightarrow c+}f(x)/g(x) = L$.
Suppose:
* that $\lim_{x\rightarrow c+} f(x) =0$ and $\lim_{x\rightarrow c+} g(x) =0$,
* that $f$ and $g$ are differentiable in $(c,b)$, and
* that $g'(x)$ exists and is non-zero for *all* $x$ in $(c,b)$,
then **if** the following limit exists: $\lim_{x\rightarrow c+}f'(x)/g'(x)=L$ it follows that $\lim_{x \rightarrow c+}f(x)/g(x) = L$.
:::
That is *if* the right limit of $f(x)/g(x)$ is indeterminate of the form $0/0$, but the right limit of $f'(x)/g'(x)$ is known, possibly by simple continuity, then the right limit of $f(x)/g(x)$ exists and is equal to that of $f'(x)/g'(x)$.
@@ -112,7 +115,7 @@ $$
\lim_{x \rightarrow 0} \frac{e^x - e^{-x}}{x}.
$$
It too is of the indeterminate form $0/0$. The derivative of the top is $e^x + e^{-x}$, which is $2$ when $x=0$, so the ratio of $f'(0)/g'(0)$ is seen to be $2$ By continuity, the limit of the ratio of the derivatives is $2$. Then by L'Hospital's rule, the limit above is $2$.
It too is of the indeterminate form $0/0$. The derivative of the top is $e^x + e^{-x}$, which is $2$ when $x=0$, so the ratio of $f'(0)/g'(0)$ is seen to be $2$. By continuity, the limit of the ratio of the derivatives is $2$. Then by L'Hospital's rule, the limit above is $2$.
* Sometimes, L'Hospital's rule must be applied twice. Consider this limit:
@@ -308,23 +311,25 @@ L'Hospital's rule generalizes to other indeterminate forms, in particular the in
The value $c$ in the limit can also be infinite. Consider this case with $c=\infty$:
$$
\begin{align*}
\lim_{x \rightarrow \infty} \frac{f(x)}{g(x)} &=
\lim_{x \rightarrow 0} \frac{f(1/x)}{g(1/x)}
\end{align*}
$$
L'Hospital's limit applies as $x \rightarrow 0$, so we differentiate to get:
$$
\begin{align*}
\lim_{x \rightarrow 0} \frac{[f(1/x)]'}{[g(1/x)]'}
&= \lim_{x \rightarrow 0} \frac{f'(1/x)\cdot(-1/x^2)}{g'(1/x)\cdot(-1/x^2)}\\
&= \lim_{x \rightarrow 0} \frac{f'(1/x)}{g'(1/x)}\\
&= \lim_{x \rightarrow \infty} \frac{f'(x)}{g'(x)},
\end{align*}
$$
*assuming* the latter limit exists, L'Hospital's rule assures the equality
@@ -379,10 +384,10 @@ the first equality by L'Hospital's rule, as the second limit exists.
Indeterminate forms of the type $0 \cdot \infty$, $0^0$, $\infty^\infty$, $\infty - \infty$ can be re-expressed to be in the form $0/0$ or $\infty/\infty$ and then L'Hospital's theorem can be applied.
###### Example: rewriting $0 \cdot \infty$
##### Example: rewriting $0 \cdot \infty$
What is the limit $x \log(x)$ as $x \rightarrow 0+$? The form is $0\cdot \infty$, rewriting, we see this is just:
What is the limit of $x \log(x)$ as $x \rightarrow 0+$? The form is $0\cdot \infty$, rewriting, we see this is just:
$$
@@ -396,10 +401,10 @@ $$
\lim_{x \rightarrow 0+}\frac{1/x}{-1/x^2} = \lim_{x \rightarrow 0+} -x = 0.
$$
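Assuming `SymPy`, this can be verified directly:

```{julia}
@syms x
limit(x * log(x), x => 0, dir="+")   # 0
```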
###### Example: rewriting $0^0$
##### Example: rewriting $0^0$
What is the limit $x^x$ as $x \rightarrow 0+$? The expression is of the form $0^0$, which is indeterminate. (Even though floating point math defines the value as $1$.) We can rewrite this by taking a log:
What is the limit of $x^x$ as $x \rightarrow 0+$? The expression is of the form $0^0$, which is indeterminate. (Even though floating point math defines the value as $1$.) We can rewrite this by taking a log:
$$
@@ -415,11 +420,12 @@ We just saw that $\lim_{x \rightarrow 0+}\log(x)/(1/x) = 0$. So by the rules for
A limit $\lim_{x \rightarrow c} f(x) - g(x)$ of indeterminate form $\infty - \infty$ can be re-expressed to be of the form $0/0$ through the transformation:
$$
\begin{align*}
f(x) - g(x) &= f(x)g(x) \cdot (\frac{1}{g(x)} - \frac{1}{f(x)}) \\
&= \frac{\frac{1}{g(x)} - \frac{1}{f(x)}}{\frac{1}{f(x)g(x)}}.
\end{align*}
$$
Applying this to
@@ -438,7 +444,7 @@ $$
\lim_{x \rightarrow 1} \frac{x\log(x)-(x-1)}{(x-1)\log(x)}
$$
In `SymPy` we have (using italic variable names to avoid a problem with the earlier use of `f` as a function):
In `SymPy` we have:
```{julia}
@@ -814,7 +820,7 @@ $$
#| hold: true
#| echo: false
choices = [
"``e^{-2/\\pi}``",
"``e^{2/\\pi}``",
"``{2\\pi}``",
"``1``",
"``0``",


@@ -11,6 +11,7 @@ using CalculusWithJulia
using Plots
plotly()
using SymPy
using Roots
using TaylorSeries
using DualNumbers
```
@@ -183,18 +184,50 @@ In each of these cases, a more complicated non-linear function is well approxim
```{julia}
#| hold: true
#| echo: false
f(x) = sin(x)
a, b = -1/4, pi/2
#| label: fig-tangent-dy-dx
#| fig-cap: "Graph with tangent line layered on"
let
gr()
f(x) = sin(x)
a, b = -1/4, pi/2
p = plot(f, a, b, legend=false);
plot!(p, x->x, a, b);
plot!(p, [0,1,1], [0, 0, 1], color=:brown);
plot!(p, [1,1], [0, sin(1)], color=:green, linewidth=4);
annotate!(p, collect(zip([1/2, 1+.075, 1/2-1/8], [.05, sin(1)/2, .75], ["Δx", "Δy", "m=dy/dx"])));
p = plot(f, a, b, legend=false,
line=(3, :royalblue),
axis=([], false)
);
plot!(p, x->x, a, b);
plot!(p, [0,1,1], [0, 0, 1], color=:brown);
plot!(p, [1,1], [0, sin(1)], color=:green, linewidth=4);
x₀ = 1.15
δ = 0.1
plot!(p, [x₀,x₀,1], [sin(1)/2-δ,0,0], line=(:black, 1, :dash), arrow=true)
plot!(p, [x₀,x₀,1], [sin(1)/2+δ,1, 1], line=(:black, 1, :dash), arrow=true)
plot!(p, [1/2 - 0.8δ, 0], [-δ, -δ]*3/4, line=(:black, 1, :dash), arrow=true)
plot!(p, [1/2 + 0.8δ, 1], [-δ, -δ]*3/4, line=(:black, 1, :dash), arrow=true)
scatter!([0], [0], marker=(5, :mediumorchid3))
annotate!(p, [
(0, f(0), text(L"(c, f(c))", :bottom, :right)),
(1/2, 0, text(L"\Delta x", :bottom)),
(1/2, 0, text(L"dx", :top)),
(1-0.02, sin(1)/2, text(L"Δ y", :right)),
(x₀, sin(1)/2, text(L"dy")),
(2/3, 2/3, text(L"m = \frac{dy}{dx} \approx \frac{\Delta y}{\Delta x}",
:bottom, rotation=33)) # why 33 and not 45?
])
p
end
```
The plot shows the tangent line with slope $dy/dx$ and the actual change in $y$, $\Delta y$, for some specified $\Delta x$. The small gap above the sine curve is the error were the value of the sine approximated using the drawn tangent line. We can see that approximating the value of $\Delta y = \sin(c+\Delta x) - \sin(c)$ with the often easier to compute $(dy/dx) \cdot \Delta x = f'(c)\Delta x$ - for small enough values of $\Delta x$ - is not going to be too far off provided $\Delta x$ is not too large.
```{julia}
#| echo: false
plotly()
nothing
```
The plot in @fig-tangent-dy-dx shows a tangent line with slope $dy/dx$ and the actual change in $y$, $\Delta y$, for some specified $\Delta x$ at a point $(c,f(c))$. The small gap above the sine curve is the error made were the value of the sine approximated using the drawn tangent line. We can see that approximating the value of $\Delta y = \sin(c+\Delta x) - \sin(c)$ with the often easier to compute $(dy/dx) \cdot \Delta x = f'(c)\Delta x$ - for small enough values of $\Delta x$ - is not going to be too far off provided $\Delta x$ is not too large.
This approximation is known as linearization. It can be used both in theoretical computations and in practical applications. To see how effective it is, we look at some examples.
@@ -467,7 +500,8 @@ To see formally why the remainder is as it is, we recall the mean value theorem
$$
\text{error} = h(x) - h(0) = (g(x) - g(0)) \frac{h'(e)}{g'(e)} = x^2 \cdot \frac{1}{2} \cdot \frac{f'(e) - f'(0)}{e} =
\text{error} = h(x) - h(0) = (g(x) - g(0)) \frac{h'(e)}{g'(e)} =
(x^2 - 0) \cdot \frac{f'(e) - f'(0)}{2e} =
x^2 \cdot \frac{1}{2} \cdot f''(\xi).
$$
@@ -510,21 +544,23 @@ $$
Suppose $f(x)$ and $g(x)$ are represented by their tangent lines about $c$, respectively:
$$
\begin{align*}
f(x) &= f(c) + f'(c)(x-c) + \mathcal{O}((x-c)^2), \\
g(x) &= g(c) + g'(c)(x-c) + \mathcal{O}((x-c)^2).
\end{align*}
$$
Consider the sum; after rearranging we have:
$$
\begin{align*}
f(x) + g(x) &= \left(f(c) + f'(c)(x-c) + \mathcal{O}((x-c)^2)\right) + \left(g(c) + g'(c)(x-c) + \mathcal{O}((x-c)^2)\right)\\
&= \left(f(c) + g(c)\right) + \left(f'(c)+g'(c)\right)(x-c) + \mathcal{O}((x-c)^2).
\end{align*}
$$
The two big "Oh" terms become just one as the sum of a constant times $(x-c)^2$ plus a constant time $(x-c)^2$ is just some other constant times $(x-c)^2$. What we can read off from this is the term multiplying $(x-c)$ is just the derivative of $f(x) + g(x)$ (from the sum rule), so this too is a tangent line approximation.
@@ -533,14 +569,16 @@ The two big "Oh" terms become just one as the sum of a constant times $(x-c)^2$
Is it a coincidence that a basic algebraic operation with tangent line approximations produces a tangent line approximation? Let's try multiplication:
$$
\begin{align*}
f(x) \cdot g(x) &= [f(c) + f'(c)(x-c) + \mathcal{O}((x-c)^2)] \cdot [g(c) + g'(c)(x-c) + \mathcal{O}((x-c)^2)]\\
&=[f(c) + f'(c)(x-c)] \cdot [g(c) + g'(c)(x-c)] + (f(c) + f'(c)(x-c)) \cdot \mathcal{O}((x-c)^2) + (g(c) + g'(c)(x-c)) \cdot \mathcal{O}((x-c)^2) + [\mathcal{O}((x-c)^2)]^2\\
&=[f(c) + f'(c)(x-c)] \cdot [g(c) + g'(c)(x-c)] \\
&+ (f(c) + f'(c)(x-c)) \cdot \mathcal{O}((x-c)^2) + (g(c) + g'(c)(x-c)) \cdot \mathcal{O}((x-c)^2) + [\mathcal{O}((x-c)^2)]^2\\
&= [f(c) + f'(c)(x-c)] \cdot [g(c) + g'(c)(x-c)] + \mathcal{O}((x-c)^2)\\
&= f(c) \cdot g(c) + [f'(c)\cdot g(c) + f(c)\cdot g'(c)] \cdot (x-c) + [f'(c)\cdot g'(c) \cdot (x-c)^2 + \mathcal{O}((x-c)^2)] \\
&= f(c) \cdot g(c) + [f'(c)\cdot g(c) + f(c)\cdot g'(c)] \cdot (x-c) + \mathcal{O}((x-c)^2)
\end{align*}
$$
The big "oh" notation just sweeps up many things including any products of it *and* the term $f'(c)\cdot g'(c) \cdot (x-c)^2$. Again, we see from the product rule that this is just a tangent line approximation for $f(x) \cdot g(x)$.
@@ -614,7 +652,7 @@ Automatic differentiation (forward mode) essentially uses this technique. A "dua
```{julia}
Dual(0, 1)
x = Dual(0, 1)
```
Then what is `sin(x)`? It should reflect both $(\sin(0), \cos(0))$, the latter being the derivative of $\sin$ at $0$. We can see this is *almost* what is computed behind the scenes through:
@@ -622,11 +660,13 @@ Then what is $x$? It should reflect both $(\sin(0), \cos(0))$ the latter being t
```{julia}
#| hold: true
x = Dual(0, 1)
@code_lowered sin(x)
```
This output of `@code_lowered` can be confusing, but this simple case needn't be. Working from the end we see an assignment to a variable named `%7` of `Dual(%3, %6)`. The value of `%3` is `sin(x)` where `x` is the value `0` above. The value of `%6` is `cos(x)` *times* the value `1` above (the `xp`), which reflects the *chain* rule being used. (The derivative of `sin(u)` is `cos(u)*du`.) So this dual number encodes both the function value at `0` and the derivative of the function at `0`.)
This output of `@code_lowered` can be confusing, but this simple case needn't be, as we know what to look for: we need to evaluate `sin` at the value stored in `x` and carry along the derivative `cos(x)` **times** the derivative part of `x`.
The `sin` is computed in `%6` and is passed to `Dual` in `%13` as the first argument. The `cos` is computed in `%11` and then *multiplied* by `xp`, which holds the derivative information about `x`. This is passed as the second argument to `Dual` in `%13`.
Similarly, we can see what happens to `log(x)` at `1` (encoded by `Dual(1,1)`):
@@ -638,7 +678,166 @@ x = Dual(1, 1)
@code_lowered log(x)
```
We can see the derivative again reflects the chain rule, it being given by `1/x * xp` where `xp` acts like `dx` (from assignments `%5` and `%4`). Comparing the two outputs, we see only the assignment to `%5` differs, it reflecting the derivative of the function.
We again see `log(x)` being evaluated in line `%6`. The derivative evaluated at `x` is done in line `%11` and this is multiplied by `xp` in line `%12`.
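To demystify the mechanism, here is a minimal sketch of how such a number type *might* be implemented (this is not the notes' actual `Dual` type; the name `DualSketch` is made up to avoid a clash): the value and the derivative travel together, and each overloaded function applies the chain rule to the second component.
```{julia}
struct DualSketch   # hypothetical stand-in for a dual number
    x::Float64      # function value
    xp::Float64     # derivative value
end
Base.sin(d::DualSketch) = DualSketch(sin(d.x), cos(d.x) * d.xp)  # chain rule
Base.log(d::DualSketch) = DualSketch(log(d.x), (1/d.x) * d.xp)   # chain rule
sin(DualSketch(0.0, 1.0)), log(DualSketch(1.0, 1.0))
```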
## Curvature
The curvature of a function will be a topic in a later section on differentiable vector calculus, but the concept of linearization can be used to give an earlier introduction.
The tangent line linearizes the function, it being the best linear approximation to the graph of the function at the point. The slope of the tangent line is the limit of the slopes of different secant lines. Consider now the orthogonal concept, the *normal line* at a point. This is a line perpendicular to the tangent line that goes through the point on the curve.
At a point $(c,f(c))$ the slope of the normal line is $-1/f'(c)$.
Following [Kirby C. Smith](https://doi.org/10.2307/2687102), consider two nearby points on the curve of $f$ and suppose we take the two normal lines at $x=c$ and $x=c+h$. These two lines will intersect if they are not parallel. To ensure this, assume that in some neighborhood of $c$, $f'$ is increasing.
The two normal lines are:
$$
\begin{align*}
y &= f(c) - \frac{1}{f'(c)}(x-c)\\
y &= f(c+h) - \frac{1}{f'(c+h)}(x-(c+h))\\
\end{align*}
$$
Rearranging, we have
$$
\begin{align*}
-f'(c)(y-f(c)) &= x-c\\
-f'(c+h)(y-f(c+h)) &= x-(c+h)
\end{align*}
$$
Call $R$ the intersection point of the two normal lines:
```{julia}
#| echo: false
using Roots
let
gr()
f(x) = x^4
fp(x) = 4x^3
c = 1/4
h = 1/4
nlc(x) = f(c) - 1/fp(c) * (x - c)
nlch(x) = f(c+h) - 1/fp(c+h) * (x-(c+h))
canvas() = plot(axis=([],false), legend=false, aspect_ratio=:equal)
canvas()
plot!(f, 0, 3/4; line=(3,))
plot!(nlc; ylim=(-1/4, 1))
plot!(nlch; ylim=(-1/4, 1))
Rx = find_zero(x -> nlc(x) - nlch(x), (-10, 10))
scatter!([c,c+h], f.([c, c+h]))
scatter!([Rx], [nlc(Rx)])
annotate!([(c, f(c), L"(c,f(c))",:top),
(c+h, f(c+h), L"(c+h, f(c+h))",:bottom),
(Rx, nlc(Rx), L"R",:left)])
end
```
```{julia}
#| echo: false
plotly()
nothing
```
What happens to $R$ as $h \rightarrow 0$?
We can symbolically solve to see:
```{julia}
@syms 𝑓() 𝑓p() 𝑓pp() x y c h
n1 = -𝑓p(c)*(y-𝑓(c)) ~ x - c
n2 = -𝑓p(c+h)*(y-𝑓(c+h)) ~ x - (c+h)
R = solve((n1, n2), (x, y))
```
Taking limits of each term as $h$ goes to zero, we have, after some notation-simplifying substitution:
```{julia}
R = Dict(k => limit(R[k], h => 0) for k in (x, y))
Rx = R[x](limit((𝑓(c+h)-𝑓(c))/h, h => 0) => 𝑓p(c),
          limit((𝑓p(c+h)-𝑓p(c))/h, h => 0) => 𝑓pp(c))
```
and
```{julia}
Ry = R[y](limit((𝑓(c+h)-𝑓(c))/h, h => 0) => 𝑓p(c),
          limit((𝑓p(c+h)-𝑓p(c))/h, h => 0) => 𝑓pp(c))
```
The squared distance, $r^2$, of $R$ to $(c,f(c))$ is then:
```{julia}
simplify((Rx-c)^2 + (Ry-𝑓(c))^2)
```
Or
$$
r^2 = \frac{(f'(c)^2 + 1)^3}{f''(c)^2}.
$$
Taking a square root gives $r$, known as the *radius of curvature* of $f$ -- the radius of the *circle* that best approximates the function at the point. That is, this value reflects the curvature of $f$, supplementing the tangent line, the best *linear* approximation to the graph of $f$ at the point.
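As a sanity check of the formula, consider the upper half of the unit circle, where the best approximating circle is the circle itself, so the radius of curvature should be $1$ at every point (the point $c=0.3$ below is arbitrary):
```{julia}
let
    f(x)   = sqrt(1 - x^2)             # upper half of the unit circle
    fp(x)  = -x / sqrt(1 - x^2)
    fpp(x) = -1 / (1 - x^2)^(3/2)
    c = 0.3
    (fp(c)^2 + 1)^(3/2) / abs(fpp(c))  # ≈ 1.0, as expected
end
```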
```{julia}
#| echo: false
let
gr()
f(x) = x^4
fp(x) = 4x^3
fpp(x) = 12x^2
c = 1/4
h = 1/4
nlc(x) = f(c) - 1/fp(c) * (x - c)
nlch(x) = f(c+h) - 1/fp(c+h) * (x-(c+h))
canvas() = plot(axis=([],false), legend=false, aspect_ratio=:equal)
canvas()
plot!(f, -1/4, 3/4; line=(3,))
tl(x) = f(c) + f'(c)*(x-c)
plot!(tl, ylim=(-1/4, 3/2); line=(2, :dot))
Rx, Ry = c - fp(c)^3 / fpp(c) - fp(c)/fpp(c), f(c) + (fp(c)^2+1)/fpp(c)
r = (fp(c)^2 + 1)^(3/2) / abs(fpp(c))
scatter!([c], f.([c]))
scatter!([Rx], [nlc(Rx)])
annotate!([(c, f(c), L"(c,f(c))",:top),
(Rx, nlc(Rx), L"R",:left)])
Delta = pi/10
theta = range(3pi/2 - Delta, 2pi - 3Delta, length=100)
xs, ys = cos.(theta), sin.(theta)
plot!(Rx .+ r.*xs, Ry .+ r.*ys)
x0s, y0s = [Rx,Rx .+ r * first(xs)],[Ry,Ry .+ r * first(ys)]
xns, yns = [Rx,Rx .+ r * last(xs)],[Ry,Ry .+ r * last(ys)]
xcs, ycs = [Rx,c],[Ry,f(c)]
sty = (2, :0.25, :dash)
plot!(x0s, y0s; line=sty);
plot!(xcs, ycs; line=sty);
plot!(xns, yns; line=sty)
end
```
```{julia}
#| echo: false
plotly()
nothing
```
## Questions
@@ -803,13 +1002,14 @@ numericq(abs(answ))
The [Birthday problem](https://en.wikipedia.org/wiki/Birthday_problem) computes the probability that in a group of $n$ people, under some assumptions, no two share a birthday. Without trying to spoil the problem, we focus on the calculus-specific part of the problem below:
$$
\begin{align*}
p
&= \frac{365 \cdot 364 \cdot \cdots (365-n+1)}{365^n} \\
&= \frac{365(1 - 0/365) \cdot 365(1 - 1/365) \cdot 365(1-2/365) \cdot \cdots \cdot 365(1-(n-1)/365)}{365^n}\\
&= (1 - \frac{0}{365})\cdot(1 -\frac{1}{365})\cdot \cdots \cdot (1-\frac{n-1}{365}).
\end{align*}
$$
Taking logarithms, we have $\log(p)$ is

View File

@@ -1,4 +1,4 @@
# The mean value theorem for differentiable functions.
# The mean value theorem for differentiable functions
{{< include ../_common_code.qmd >}}
@@ -92,9 +92,12 @@ Lest you think that continuous functions always have derivatives except perhaps
We have defined an *absolute maximum* of $f(x)$ over an interval to be a value $f(c)$ for a point $c$ in the interval that is as large as any other value in the interval. Just specifying a function and an interval does not guarantee an absolute maximum, but specifying a *continuous* function and a *closed* interval does, by the extreme value theorem.
> *A relative maximum*: We say $f(x)$ has a *relative maximum* at $c$ if there exists *some* interval $I=(a,b)$ with $a < c < b$ for which $f(c)$ is an absolute maximum for $f$ and $I$.
::: {.callout-note icon=false}
## A relative maximum
We say $f(x)$ has a *relative maximum* at $c$ if there exists *some* interval $I=(a,b)$ with $a < c < b$ for which $f(c)$ is an absolute maximum for $f$ and $I$.
:::
The difference is a bit subtle: for an absolute maximum the interval must also be specified; for a relative maximum there just needs to exist *some* interval, possibly really small, though it must be bigger than a point.
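A quick illustration of the distinction (the function and interval are chosen just for this purpose): $f(x) = x^3 - x$ has a *relative* maximum at $c=-1/\sqrt{3}$, yet its *absolute* maximum over $[-2,2]$ occurs at the endpoint $x=2$:
```{julia}
let
    f(x) = x^3 - x
    c = -1/sqrt(3)   # f'(c) = 0 and f is locally largest here
    f(c), f(2)       # ≈ 0.385 versus 6: the endpoint wins globally
end
```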
@@ -139,12 +142,16 @@ For a continuous function $f(x)$, call a point $c$ in the domain of $f$ where ei
We can combine Bolzano's extreme value theorem with Fermat's insight to get the following:
::: {.callout-note icon=false}
## Absolute maxima characterization
> A continuous function on $[a,b]$ has an absolute maximum that occurs at a critical point $c$, $a < c < b$, or an endpoint, $a$ or $b$.
A continuous function on $[a,b]$ has an absolute maximum that occurs at a critical point $c$, $a < c < b$, or an endpoint, $a$ or $b$.
A similar statement holds for an absolute minimum.
:::
A similar statement holds for an absolute minimum. This gives a restricted set of places to look for absolute maximum and minimum values - all the critical points and the endpoints.
The above gives a restricted set of places to look for absolute maximum and minimum values - all the critical points and the endpoints.
It is also the case that all relative extrema occur at a critical point; *however*, not all critical points correspond to relative extrema. We will see *derivative tests* that help characterize when that occurs.
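The canonical example of a critical point that is *not* a relative extremum is $f(x)=x^3$ at $0$, as this quick check suggests:
```{julia}
let
    f(x)  = x^3
    fp(x) = 3x^2
    fp(0), f(-0.1) < f(0) < f(0.1)   # derivative is 0, yet f increases through 0
end
```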
@@ -263,10 +270,19 @@ Here the maximum occurs at an endpoint. The critical point $c=0.67\dots$ does no
Let $f(x)$ be differentiable on $(a,b)$ and continuous on $[a,b]$. Then the absolute maximum occurs at an endpoint or where the derivative is $0$ (as the derivative is always defined). This gives rise to:
::: {.callout-note icon=false}
## [Rolle's](http://en.wikipedia.org/wiki/Rolle%27s_theorem) theorem
> *[Rolle's](http://en.wikipedia.org/wiki/Rolle%27s_theorem) theorem*: For $f$ differentiable on $(a,b)$ and continuous on $[a,b]$, if $f(a)=f(b)$, then there exists some $c$ in $(a,b)$ with $f'(c) = 0$.
For $f$ differentiable on $(a,b)$ and continuous on $[a,b]$, if $f(a)=f(b)$, then there exists some $c$ in $(a,b)$ with $f'(c) = 0$.
:::
::: {#fig-l-hospital-144}
![Figure from L'Hospital's calculus book](figures/lhopital-144.png)
Figure from L'Hospital's calculus book showing Rolle's theorem where $c=E$ in the labeling.
:::
This modest observation opens the door to many relationships between a function and its derivative, as it ties the two together in one statement.
@@ -311,38 +327,74 @@ We are driving south and in one hour cover 70 miles. If the speed limit is 65 mi
The mean value theorem is a direct generalization of Rolle's theorem.
::: {.callout-note icon=false}
## Mean value theorem
> *Mean value theorem*: Let $f(x)$ be differentiable on $(a,b)$ and continuous on $[a,b]$. Then there exists a value $c$ in $(a,b)$ where $f'(c) = (f(b) - f(a)) / (b - a)$.
Let $f(x)$ be differentiable on $(a,b)$ and continuous on $[a,b]$. Then there exists a value $c$ in $(a,b)$ where
$$
f'(c) = (f(b) - f(a)) / (b - a).
$$
:::
This says for any secant line between $a < b$ there will be a parallel tangent line at some $c$ with $a < c < b$ (all provided $f$ is differentiable on $(a,b)$ and continuous on $[a,b]$).
This graph illustrates the theorem. The orange line is the secant line. A parallel line tangent to the graph is guaranteed by the mean value theorem. In this figure, there are two such lines, rendered using red.
@fig-mean-value-theorem illustrates the theorem. The secant line between $a$ and $b$ is dashed. For this function there are two values of $c$ where the slope of the tangent line is seen to be the same as the slope of this secant line. At least one is guaranteed by the theorem.
```{julia}
#| hold: true
#| echo: false
f(x) = x^3 - x
a, b = -2, 1.75
m = (f(b) - f(a)) / (b-a)
cps = find_zeros(x -> f'(x) - m, a, b)
#| label: fig-mean-value-theorem
let
# mean value theorem
gr()
f(x) = x^3 -4x^2 + 3x - 1
a, b = -3/4, 3+3/4
plot(; axis=([], nothing),
legend=false,
xlims=(-1.1,4),
framestyle=:none)
y₀ = 0.3 + f(-1)
plot!(f, -1, 4; line=(:black, 2))
plot!([-1.1, 4], y₀*[1,1]; line=(:black, 1), arrow=true, head=:top)
p,q = (a,f(a)), (b, f(b))
scatter!([p,q]; marker=(:circle, 4, :red))
plot!([p,q]; line=(:gray, 2, :dash))
m = (f(b) - f(a))/(b-a)
c₁, c₂ = find_zeros(x -> f'(x) - m, (a,b))
Δ = 2/3
for c ∈ (c₁, c₂)
plot!(tangent(f,c), c-Δ, c+Δ; line=(:gray, 2))
plot!([(c, y₀), (c, f(c))]; line=(:gray, 1, :dash))
end
for c ∈ (a,b)
plot!([(c, y₀), (c, f(c))]; line=(:gray, 1))
end
annotate!([
(a, y₀, text(L"a", :top)),
(b, y₀, text(L"b", :top)),
(c₁, y₀, text(L"c_1", :top)),
(c₂, y₀, text(L"c_2", :top)),
])
current()
p = plot(f, a-1, b+1, linewidth=3, legend=false)
plot!(x -> f(a) + m*(x-a), a-1, b+1, linewidth=3, color=:orange)
scatter!([a,b], [f(a), f(b)])
annotate!([(a, f(a), text("a", :bottom)),
(b, f(b), text("b", :bottom))])
for cp in cps
plot!(x -> f(cp) + f'(cp)*(x-cp), a-1, b+1, color=:red)
end
scatter!(cps, f.(cps))
subscripts = collect("₀₁₂₃₄₅₆₇₈₉")
annotate!([(cp, f(cp), text("c"*subscripts[i], :bottom)) for (i,cp) ∈ enumerate(cps)])
p
```
```{julia}
#| echo: false
plotly()
nothing
```
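Numerically, though, the guaranteed values can be hunted down with `find_zeros` from the `Roots` package (assumed loaded, as elsewhere in these notes); a sketch with an illustrative cubic:
```{julia}
let
    f(x)  = x^3 - x
    fp(x) = 3x^2 - 1
    a, b = -2, 1.75
    m = (f(b) - f(a)) / (b - a)        # slope of the secant line
    find_zeros(x -> fp(x) - m, a, b)   # the c's with f'(c) equal to m
end
```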
Like Rolle's theorem this is a guarantee that something exists, not a recipe to find it. In fact, the mean value theorem is just Rolle's theorem applied to:
@@ -425,13 +477,20 @@ Suppose it is known that $f'(x)=0$ on some interval $I$ and we take any $a < b$
### The Cauchy mean value theorem
[Cauchy](http://en.wikipedia.org/wiki/Mean_value_theorem#Cauchy.27s_mean_value_theorem) offered an extension to the mean value theorem above. Suppose both $f$ and $g$ satisfy the conditions of the mean value theorem on $[a,b]$ with $g(b)-g(a) \neq 0$, then there exists at least one $c$ with $a < c < b$ such that
[Cauchy](http://en.wikipedia.org/wiki/Mean_value_theorem#Cauchy.27s_mean_value_theorem) offered an extension to the mean value theorem above.
::: {.callout-note icon=false}
## Cauchy mean value theorem
Suppose both $f$ and $g$ satisfy the conditions of the mean value theorem on $[a,b]$ with $g(b)-g(a) \neq 0$, then there exists at least one $c$ with $a < c < b$ such that
$$
f'(c) = g'(c) \cdot \frac{f(b) - f(a)}{g(b) - g(a)}.
$$
:::
The proof follows by considering $h(x) = f(x) - r\cdot g(x)$, with $r$ chosen so that $h(a)=h(b)$. Then Rolle's theorem applies so that there is a $c$ with $h'(c)=0$, so $f'(c) = r g'(c)$, but $r$ can be seen to be $(f(b)-f(a))/(g(b)-g(a))$, which proves the theorem.
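A numeric illustration of the guarantee (the functions and interval below are chosen only for illustration; `find_zero` is from the `Roots` package):
```{julia}
let
    f(x)  = x^3; fp(x) = 3x^2
    g(x)  = x^2; gp(x) = 2x
    a, b = 1, 2
    r = (f(b) - f(a)) / (g(b) - g(a))
    c = find_zero(x -> fp(x) - r * gp(x), (a, b))   # bracketed; here c = 14/9
    fp(c), r * gp(c)                                # equal, as guaranteed
end
```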
@@ -479,7 +538,7 @@ function parametric_fns_graph(n)
xlim=(-1.1,1.1), ylim=(-pi/2-.1, pi/2+.1))
scatter!(plt, [f(ts[end])], [g(ts[end])], color=:orange, markersize=5)
val = @sprintf("% 0.2f", ts[end])
annotate!(plt, [(0, 1, "t = $val")])
annotate!(plt, [(0, 1, L"t = %$val")])
end
caption = L"""

View File

@@ -127,12 +127,13 @@ Though the derivative is related to the slope of the secant line, that is in the
Let $\epsilon_{n+1} = x_{n+1}-\alpha$, where $\alpha$ is assumed to be the *simple* zero of $f(x)$ that the secant method converges to. A [calculation](https://math.okstate.edu/people/binegar/4513-F98/4513-l08.pdf) shows that
$$
\begin{align*}
\epsilon_{n+1} &\approx \frac{x_n-x_{n-1}}{f(x_n)-f(x_{n-1})} \frac{(1/2)f''(\alpha)(\epsilon_n-\epsilon_{n-1})}{x_n-x_{n-1}} \epsilon_n \epsilon_{n-1}\\
& \approx \frac{f''(\alpha)}{2f'(\alpha)} \epsilon_n \epsilon_{n-1}\\
&= C \epsilon_n \epsilon_{n-1}.
\end{align*}
$$
The constant `C` is similar to that for Newton's method, and it reveals potential troubles for the secant method similar to those of Newton's method: a poor initial guess (the initial error is too big), a second derivative that is too large, or a first derivative that is too flat near the answer.
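This error relation can be watched numerically. A sketch for $f(x) = x^2 - 2$, whose zero $\sqrt{2}$ is simple:
```{julia}
let
    f(x) = x^2 - 2
    α = sqrt(2)
    x0, x1 = 1.0, 2.0
    errs = Float64[]
    for _ in 1:6   # secant-method updates
        x0, x1 = x1, x1 - f(x1) * (x1 - x0) / (f(x1) - f(x0))
        push!(errs, abs(x1 - α))
    end
    errs   # each error is roughly C times the product of the prior two
end
```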
@@ -185,7 +186,7 @@ Here we use `SymPy` to identify the degree-$2$ polynomial as a function of $y$,
@syms y hs[0:2] xs[0:2] fs[0:2]
H(y) = sum(hᵢ*(y - fs[end])^i for (hᵢ,i) ∈ zip(hs, 0:2))
eqs = [H(fᵢ) ~ xᵢ for (xᵢ, fᵢ) ∈ zip(xs, fs)]
eqs = tuple((H(fᵢ) ~ xᵢ for (xᵢ, fᵢ) ∈ zip(xs, fs))...)
ϕ = solve(eqs, hs)
hy = subs(H(y), ϕ)
```
@@ -279,41 +280,6 @@ We can see it in action on the sine function. Here we pass in $\lambda$, but i
chandrapatla(sin, 3, 4, λ3, verbose=true)
```
```{julia}
#| output: false
#=
The condition `Φ^2 < ξ < 1 - (1-Φ)^2` can be visualized. Assume `a,b=0,1` and `fa,fb=-1/2,1`. Then `c < a < b`, and `fc` has the same sign as `fa`, but what values of `fc` will satisfy the inequality?
XX```{julia}
ξ(c,fc) = (a-b)/(c-b)
Φ(c,fc) = (fa-fb)/(fc-fb)
Φl(c,fc) = Φ(c,fc)^2
Φr(c,fc) = 1 - (1-Φ(c,fc))^2
a,b = 0, 1
fa,fb = -1/2, 1
region = Lt(Φl, ξ) & Lt(ξ,Φr)
plot(region, xlims=(-2,a), ylims=(-3,0))
XX```
When `(c,fc)` is in the shaded area, the inverse quadratic step is chosen. We can see that `fc < fa` is needed.
For these values, this area is within the area where an inverse quadratic step will result in a value between `a` and `b`:
XX```{julia}
l(c,fc) = λ3(fa,fb,fc,a,b,c)
region₃ = ImplicitEquations.Lt(l,b) & ImplicitEquations.Gt(l,a)
plot(region₃, xlims=(-2,0), ylims=(-3,0))
XX```
There are values in the parameter space where this does not occur.
=#
nothing
```
## Tolerances
@@ -349,10 +315,10 @@ One way to think about this is the difference between `x` and the next largest f
For the specific example, `abs(b-a) <= 2eps(m)` means that the gap between `a` and `b` is essentially 2 floating point values from the $x$ value with the smallest $f(x)$ value.
For bracketing methods that is about as good as you can get. However, once floating values are understood, the absolute best you can get for a bracketing interval would be
For bracketing methods that is about as good as you can get. However, once floating point values are understood, the absolute best you can get for a bracketing interval would be
* along the way, a value `f(c)` is found which is *exactly* `0.0`
* along the way, a value `f(c)` is found which evaluates *exactly* to `0.0`
* the endpoints of the bracketing interval are *adjacent* floating point values, meaning the interval can not be bisected and `f` changes sign between the two values.
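A sketch of what "adjacent floating point values" means in practice, using $f(x)=x^2-2$, whose zero $\sqrt{2}$ is not exactly representable:
```{julia}
let
    f(x) = x^2 - 2
    a = sqrt(2)                  # the floating point value nearest the zero
    f(prevfloat(a)) < 0 < f(a)   # f changes sign across adjacent floats
end
```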
@@ -368,6 +334,8 @@ chandrapatla(fu, -9, 1, λ3)
Here the issue is `abs(b-a)` is tiny (of the order `1e-119`) but `eps(m)` is even smaller.
> For checking if $x_n \approx x_{n+1}$ both a relative and absolute error should be used unless something else is known.
For non-bracketing methods, like Newton's method or the secant method, different criteria are useful. There may not be a bracketing interval for `f` (for example `f(x) = (x-1)^2`) so the second criteria above might need to be restated in terms of the last two iterates, $x_n$ and $x_{n-1}$. Calling this difference $\Delta = |x_n - x_{n-1}|$, we might stop if $\Delta$ is small enough. As there are scenarios where this can happen, but the function is not at a zero, a check on the size of $f$ is needed.
@@ -381,7 +349,7 @@ First if `f(x_n)` is `0.0` then it makes sense to call `x_n` an *exact zero* of
However, there may never be a value with `f(x_n)` exactly `0.0`. (The value of `sin(1pi)` is not zero, for example, as `1pi` is an approximation to $\pi$; as well, the `sin` of values adjacent to `float(pi)` does not produce `0.0` exactly.)
Suppose `x_n` is the closest floating number to $\alpha$, the zero. Then the relative rounding error, $($ `x_n` $- \alpha)/\alpha$, will be a value $\delta$ with $\delta$ less than `eps()`.
Suppose `x_n` is the closest floating point number to $\alpha$, the zero. Then the relative rounding error, $($ `x_n` $- \alpha)/\alpha$, will be a value $\delta$ with $\delta$ less than `eps()`.
How far then can `f(x_n)` be from $0 = f(\alpha)$?
@@ -398,10 +366,11 @@ $$
f(x_n) \approx f(\alpha) + f'(\alpha) \cdot (\alpha\delta) = f'(\alpha) \cdot \alpha \delta
$$
So we should consider `f(x_n)` an *approximate zero* when it is on the scale of $f'(\alpha) \cdot \alpha \delta$.
So we should consider `f(x_n)` an *approximate zero* when it is on the scale of $f'(\alpha) \cdot \alpha \delta$. That $\alpha$ factor means we consider a *relative* tolerance for `f`.
> For checking if $f(x_n) \approx 0$ both a relative and absolute error should be used---the relative error involving the size of $x_n$.
That $\alpha$ factor means we consider a *relative* tolerance for `f`. Also important when `x_n` is close to `0`, is the need for an *absolute* tolerance, one not dependent on the size of `x`. So a good condition to check if `f(x_n)` is small is
A good condition to check if `f(x_n)` is small is
`abs(f(x_n)) <= abs(x_n) * rtol + atol`, or `abs(f(x_n)) <= max(abs(x_n) * rtol, atol)`
@@ -426,6 +395,96 @@ So a modified criteria for convergence might look like:
It is not uncommon to assign `rtol` to have a value like `sqrt(eps())` to account for accumulated floating point errors and the factor of $f'(\alpha)$, though in the `Roots` package it is set smaller by default.
### Conditioning and stability
In Part III of @doi:10.1137/1.9781611977165 we find the language of numerical analysis useful for formally describing the zero-finding problem. Key concepts are errors, conditioning, and stability. These give some theoretical justification for the tolerances above.
Abstractly a *problem* is a mapping, $F$, from a domain $X$ of data to a range $Y$ of solutions. Both $X$ and $Y$ have a sense of distance given by a *norm*. A norm (denoted with $\lVert\cdot\rVert$) is a generalization of the absolute value and gives quantitative meaning to terms like small and large.
> A *well-conditioned* problem is one with the property that all small perturbations of $x$ lead to only small changes in $F(x)$.
This sense of "small" is measured through a *condition number*.
If we let $\delta_x$ be a small perturbation of $x$ then $\delta_F = F(x + \delta_x) - F(x)$.
The *forward error* is $\lVert\delta_F\rVert = \lVert F(x+\delta_x) - F(x)\rVert$, the *relative forward error* is $\lVert\delta_F\rVert/\lVert F\rVert = \lVert F(x+\delta_x) - F(x)\rVert/ \lVert F(x)\rVert$.
The *backward error* is $\lVert\delta_x\rVert$, the *relative backward error* is $\lVert\delta_x\rVert / \lVert x\rVert$.
The *absolute condition number* $\hat{\kappa}$ is worst case of this ratio $\lVert\delta_F\rVert/ \lVert\delta_x\rVert$ as the perturbation size shrinks to $0$.
The relative condition number $\kappa$ divides $\lVert\delta_F\rVert$ by $\lVert F(x)\rVert$ and $\lVert\delta_x\rVert$ by $\lVert x\rVert$ before taking the ratio.
A *problem* is a mathematical concept, an *algorithm* the computational version. Algorithms may differ for many reasons, such as floating point errors, tolerances, etc. We use notation $\tilde{F}$ to indicate the algorithm.
The absolute error in the algorithm is $\lVert\tilde{F}(x) - F(x)\rVert$, the relative error divides by $\lVert F(x)\rVert$. A good algorithm would have smaller relative errors.
An algorithm is called *stable* if
$$
\frac{\lVert\tilde{F}(x) - F(\tilde{x})\rVert}{\lVert F(\tilde{x})\rVert}
$$
is *small* for *some* $\tilde{x}$ relatively near $x$, meaning $\lVert\tilde{x}-x\rVert/\lVert x\rVert$ is small.
> A *stable* algorithm gives nearly the right answer to nearly the right question.
(The answer it gives is $\tilde{F}(x)$, the nearly right question: what is $F(\tilde{x})$?)
A related concept is an algorithm $\tilde{F}$ for a problem $F$ is *backward stable* if for each $x \in X$,
$$
\tilde{F}(x) = F(\tilde{x})
$$
for some $\tilde{x}$ where $\lVert\tilde{x} - x\rVert/\lVert x\rVert$ is small.
> "A backward stable algorithm gives exactly the right answer to nearly the right question."
The concepts are related by Trefethen and Bau's Theorem 15.1, which says for a backward stable algorithm the relative error $\lVert\tilde{F}(x) - F(x)\rVert/\lVert F(x)\rVert$ is small in a manner proportional to the relative condition number.
Applying this to the zero-finding we follow @doi:10.1137/1.9781611975086.
To be specific, the problem, $F$, is finding a zero of a function $f$ starting at an initial point $x_0$. The data is $(f, x_0)$, the solution is $r$ a zero of $f$.
Take the algorithm as Newton's method. Any implementation must incorporate tolerances, so this is a computational approximation to the problem. The data is the same, but technically we use $\tilde{f}$ for the function, as any computation is dependent on machine implementations. The output is $\tilde{r}$ an *approximate* zero.
Suppose for sake of argument that $\tilde{f}(x) = f(x) + \epsilon$, $f$ has a continuous derivative, and $r$ is a root of $f$ and $\tilde{r}$ is a root of $\tilde{f}$. Then by linearization:
$$
\begin{align*}
0 &= \tilde{f}(\tilde r) \\
&= f(r + \delta) + \epsilon\\
&\approx f(r) + f'(r)\delta + \epsilon\\
&= 0 + f'(r)\delta + \epsilon
\end{align*}
$$
Rearranging gives $\lVert\delta/\epsilon\rVert \approx 1/\lVert f'(r)\rVert$. But the $|\delta|/|\epsilon|$ ratio is related to the condition number:
> The absolute condition number is $\hat{\kappa}_r = |f'(r)|^{-1}$.
The error formula in Newton's method measuring the distance between the actual root and an approximation includes the derivative in the denominator, so we see large condition numbers are tied into possibly larger errors.
Now consider $g(x) = f(x) - f(\tilde{r})$. Call $f(\tilde{r})$ the residual. We have $g$ is near $f$ if the residual is small. The algorithm will solve $(g, x_0)$ with $\tilde{r}$, so with a small residual an exact solution to an approximate question will be found. Driscoll and Braun state
> The backward error in a root estimate is equal to the residual.
Practically these two observations lead to
* If there is a large condition number, it may not be possible to find an approximate root near the real root.
* A tolerance in an algorithm should consider both the size of $x_{n} - x_{n-1}$ and the residual $f(x_n)$.
For the first observation, the example of Wilkinson's polynomial is often used, where $f(x) = (x-1)\cdot(x-2)\cdot \cdots\cdot(x-20)$. When expanded, the coefficients of this function cannot all be represented exactly in floating point; the condition number is large, and some of the roots found are quite different from the mathematical values.
The second observation follows from $f(x_n)$ monitoring the backward error and the product of the condition number and the backward error monitoring the forward error. This product is on the order of $|f(x_n)/f'(x_n)|$ or $|x_{n+1} - x_n|$.
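The first observation can be felt with a tiny experiment (a sketch; the function is chosen so that $f'(r)=0$ at the zero, making the condition number unbounded): a perturbation of the problem of size $10^{-10}$ moves the zero by about $10^{-2}$.
```{julia}
let
    f(x) = (x - 1)^5            # zero r = 1 with f'(r) = 0
    f̃(x) = f(x) + 1e-10         # a tiny perturbation of the problem
    find_zero(f̃, (0, 2)) - 1    # ≈ -0.01 = -(1e-10)^(1/5)
end
```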
## Questions

View File

@@ -69,7 +69,7 @@ x₃ = (babylon ∘ babylon ∘ babylon)(2//1)
x₃, x₃^2.0
```
This is now accurate to the sixth decimal point. That is about as far as we, or the Bablyonians, would want to go by hand. Using rational numbers quickly grows out of hand. The next step shows the explosion.
This is now accurate to the sixth decimal point. That is about as far as we, or the Babylonians, would want to go by hand. Using rational numbers quickly grows out of hand. The next step shows the explosion.
```{julia}
@@ -178,15 +178,18 @@ x4, f(x4), f(x3)
We see now that $f(x_4)$ is within machine tolerance of $0$, so we call $x_4$ an *approximate zero* of $f(x)$.
::: {.callout-note icon=false}
## Newton's method
> **Newton's method:** Let $x_0$ be an initial guess for a zero of $f(x)$. Iteratively define $x_{i+1}$ in terms of the just generated $x_i$ by:
>
> $$
> x_{i+1} = x_i - f(x_i) / f'(x_i).
> $$
>
> Then for reasonable functions and reasonable initial guesses, the sequence of points converges to a zero of $f$.
Let $x_0$ be an initial guess for a zero of $f(x)$. Iteratively define $x_{i+1}$ in terms of the just generated $x_i$ by:
$$
x_{i+1} = x_i - f(x_i) / f'(x_i).
$$
Then for reasonable functions and reasonable initial guesses, the sequence of points converges to a zero of $f$.
:::
On the computer, we know that actual convergence will likely never occur, but accuracy to a certain tolerance can often be achieved.
@@ -206,7 +209,12 @@ In practice, the algorithm is implemented not by repeating the update step a fix
:::{.callout-note}
## Note
Newton looked at this same example in 1699 (B.T. Polyak, *Newton's method and its use in optimization*, European Journal of Operational Research. 02/2007; 181(3):1086-1096.) though his technique was slightly different as he did not use the derivative, *per se*, but rather an approximation based on the fact that his function was a polynomial (though identical to the derivative). Raphson (1690) proposed the general form, hence the usual name of the Newton-Raphson method.
Newton looked at this same example in 1699 (B.T. Polyak, *Newton's method and its use in optimization*, European Journal of Operational Research. 02/2007; 181(3):1086-1096; and Deuflhard, *Newton Methods for Nonlinear Problems: Affine Invariance and Adaptive Algorithms*), though his technique was slightly different, as he did not use the derivative, *per se*, but rather an approximation based on the fact that his function was a polynomial.
We can read that he guessed the answer was ``2 + p``, as there is a sign change between $2$ and $3$. Newton put this guess into the polynomial to get, after simplification, ``p^3 + 6p^2 + 10p - 1``. This has an **approximate** zero found by solving the linear part ``10p - 1 = 0``. Taking ``p = 0.1`` he then can say the answer looks like ``2 + p + q`` and repeat to get ``q^3 + 6.3q^2 + 11.23q + 0.061 = 0``. Again, taking just the linear part gives the estimate `q = -0.005431...`. After two steps the estimate is `2.094568...`. This can be continued by expressing the answer as ``2 + p + q + r`` and then solving for an estimate for ``r``.
Raphson (1690) proposed a simplification avoiding the computation of new polynomials, hence the usual name of the Newton-Raphson method. Simpson introduced derivatives into the formulation and systems of equations.
:::
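The expansions in the note can be double-checked symbolically (a sketch; Newton's cubic $x^3 - 2x - 5$ is inferred from the $p$-polynomial quoted above):
```{julia}
#| hold: true
@syms p q
ex₁ = expand((2 + p)^3 - 2*(2 + p) - 5)   # p^3 + 6p^2 + 10p - 1, as in the note
ex₂ = expand(ex₁(p => 1//10 + q))         # q^3 + (63/10)q^2 + (1123/100)q + 61/1000
ex₁, ex₂
```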
@@ -216,40 +224,49 @@ Newton looked at this same example in 1699 (B.T. Polyak, *Newton's method and it
##### Example: visualizing convergence
This graphic demonstrates the method and the rapid convergence:
@fig-newtons-method demonstrates the method and the rapid convergence:
```{julia}
#| echo: false
function newtons_method_graph(n, f, a, b, c)
nothing
function newtons_method_graph(n, f, a, b, c; label=false)
xstars = [c]
xs = [c]
ys = [0.0]
plt = plot(f, a, b, legend=false, size=fig_size)
plt = plot(f, a, b, legend=false, size=fig_size,
line = (:royalblue, 3),
axis = ([], false)
)
plot!(plt, [a, b], [0,0], color=:black)
ts = range(a, stop=b, length=50)
for i in 1:n
x0 = xs[end]
x1 = x0 - f(x0)/D(f)(x0)
x1 = x0 - f(x0)/f'(x0)
push!(xstars, x1)
append!(xs, [x0, x1])
append!(ys, [f(x0), 0])
end
plot!(plt, xs, ys, color=:orange)
scatter!(plt, xstars, 0*xstars, color=:orange, markersize=5)
if label
subs = collect("₁₂₃₄₅₆₇₈₉")
labs = ["x$(subs[i])" for i in eachindex(xstars)]
annotate!(collect(zip(xstars, 0*xstars, labs,[:bottom for _ in xstars])))
end
plt
end
nothing
```
```{julia}
#| hold: true
#| echo: false
#| cache: true
#| label: fig-newtons-method
### {{{newtons_method_example}}}
gr()
caption = """
@@ -262,7 +279,7 @@ n = 6
fn, a, b, c = x->log(x), .15, 2, .2
anim = @animate for i=1:n
newtons_method_graph(i-1, fn, a, b, c)
newtons_method_graph(i-1, fn, a, b, c; label=true)
end
imgfile = tempname() * ".gif"
@@ -392,6 +409,24 @@ x, f(x)
To machine tolerance the answer is a zero, even though the exact answer is irrational and all finite floating point values can be represented as rational numbers.
##### Example non-polynomial
The first example by Newton of applying the method to a non-polynomial function was solving an equation from astronomy: $x - e \sin(x) = M$, where $e$ is the eccentricity, $M$ the mean anomaly, and $x$ the eccentric anomaly. Newton used polynomial approximations for the trigonometric functions; here we can solve directly.
Let $e = 1/2$ and $M = 3/4$. With $f(x) = x - e\sin(x) - M$ then $f'(x) = 1 - e \cos(x)$. Starting at 1, Newton's method for 3 steps becomes:
```{julia}
ec, M = 0.5, 0.75
f(x) = x - ec * sin(x) - M
fp(x) = 1 - ec * cos(x)
x = 1
x = x - f(x) / fp(x)
x = x - f(x) / fp(x)
x = x - f(x) / fp(x)
x, f(x)
```
##### Example
@@ -429,7 +464,6 @@ end
So it takes $8$ steps to get an increment that small and about `10` steps to get to full convergence.
##### Example division as multiplication
@@ -456,7 +490,7 @@ $$
x_{i+1} = x_i - (1/x_i - q)/(-1/x_i^2) = -qx^2_i + 2x_i.
$$
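Before turning to the initial guess, the update can simply be run (a sketch; $q = 0.8$ and the crude start $x_0 = 1$ are chosen only for illustration):
```{julia}
let
    q = 0.8
    x = 1.0                  # a crude starting value
    for _ in 1:6
        x = -q * x^2 + 2x    # no division used
    end
    x, 1/q
end
```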
Now for $q$ in the interval $[1/2, 1]$ we want to get a *good* initial guess. Here is a claim. We can use $x_0=48/17 - 32/17 \cdot q$. Let's check graphically that this is a reasonable initial approximation to $1/q$:
Now for $q$ in the interval $[1/2, 1]$ we want to get a *good* initial guess. Here is a claim: we can use $x_0=48/17 - 32/17 \cdot q$. Let's check graphically that this is a reasonable initial approximation to $1/q$:
```{julia}
@@ -662,7 +696,7 @@ If $M$ were just a constant and we suppose $e_0 = 10^{-1}$ then $e_1$ would be l
To identify $M$, let $\alpha$ be the zero of $f$ to be approximated. Assume
* The function $f$ has at continuous second derivative in a neighborhood of $\alpha$.
* The function $f$ has a continuous second derivative in a neighborhood of $\alpha$.
* The value $f'(\alpha)$ is *non-zero* in the neighborhood of $\alpha$.
@@ -686,7 +720,7 @@ $$
For this value, we have
$$
\begin{align*}
x_{i+1} - \alpha
&= \left(x_i - \frac{f(x_i)}{f'(x_i)}\right) - \alpha\\
@@ -696,6 +730,7 @@ x_{i+1} - \alpha
\right)\\
&= \frac{1}{2}\frac{f''(\xi)}{f'(x_i)} \cdot(x_i - \alpha)^2.
\end{align*}
$$
That is
@@ -830,7 +865,7 @@ The function $f(x) = x^{20} - 1$ has two bad behaviours for Newton's
method: for $x < 1$ the derivative is nearly $0$ and for $x>1$ the
second derivative is very big. In this illustration, we have an
initial guess of $x_0=8/9$. As the tangent line is fairly flat, the
next approximation is far away, $x_1 = 1.313\dots$. As this guess is
next approximation is far away, $x_1 = 1.313\dots$. As this guess
is much bigger than $1$, the ratio $f(x)/f'(x) \approx
x^{20}/(20x^{19}) = x/20$, so $x_i - f(x_i)/f'(x_i) \approx (19/20)x_i$
yielding slow, linear convergence until $f''(x_i)$ is moderate. For
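The slow phase described here is easy to observe numerically (a sketch reproducing the stated setup):
```{julia}
let
    f(x)  = x^20 - 1
    fp(x) = 20x^19
    x = 8/9
    xs = Float64[]
    for _ in 1:10
        x = x - f(x)/fp(x)   # Newton update
        push!(xs, x)
    end
    xs   # x₁ ≈ 1.313, then shrinkage by ≈ 19/20 per step toward 1
end
```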
@@ -998,7 +1033,7 @@ Let $f(x) = x^2 - 3^x$. This has derivative $2x - 3^x \cdot \log(3)$. Starting w
f(x) = x^2 - 3^x;
fp(x) = 2x - 3^x*log(3);
val = Roots.newton(f, fp, 0);
numericq(val, 1e-14)
numericq(val, 1e-1)
```
###### Question
@@ -1389,7 +1424,7 @@ yesnoq("no")
###### Question
Quadratic convergence of Newton's method only applies to *simple* roots. For example, we can see (using the `verbose=true` argument to the `Roots` package's `newton` method, that it only takes $4$ steps to find a zero to $f(x) = \cos(x) - x$ starting at $x_0 = 1$. But it takes many more steps to find the same zero for $f(x) = (\cos(x) - x)^2$.
Quadratic convergence of Newton's method only applies to *simple* roots. For example, we can see (using the `verbose=true` argument to the `Roots` package's `newton` method), that it only takes $4$ steps to find a zero to $f(x) = \cos(x) - x$ starting at $x_0 = 1$. But it takes many more steps to find the same zero for $f(x) = (\cos(x) - x)^2$.
How many?
@@ -1419,7 +1454,7 @@ implicit_plot(f, xlims=(-2,2), ylims=(-2,2), legend=false)
Can we find which point on its graph has the largest $y$ value?
This would be straightforward *if* we could write $y(x) = \dots$, for then we would simply find the critical points and investiate. But we can't so easily solve for $y$ interms of $x$. However, we can use Newton's method to do so:
This would be straightforward *if* we could write $y(x) = \dots$, for then we would simply find the critical points and investigate. But we can't so easily solve for $y$ in terms of $x$. However, we can use Newton's method to do so:
```{julia}

View File

@@ -70,7 +70,7 @@ function perimeter_area_graphic_graph(n)
size=fig_size,
xlim=(0,10), ylim=(0,10))
scatter!(plt, [w], [h], color=:orange, markersize=5)
annotate!(plt, [(w/2, h/2, "Area=$(round(w*h,digits=1))")])
annotate!(plt, [(w/2, h/2, L"Area$=\; %$(round(w*h,digits=1))$")])
plt
end
@@ -79,7 +79,7 @@ caption = """
Some possible rectangles that satisfy the constraint on the perimeter and their area.
"""
n = 6
n = 5
anim = @animate for i=1:n
perimeter_area_graphic_graph(i-1)
end
@@ -187,8 +187,11 @@ ts = range(0, stop=pi, length=50)
x1,y1 = 4, 4.85840
x2,y2 = 3, 6.1438
delta = 4
p = plot(delta .+ x1*[0, 1,1,0], y1*[0,0,1,1], linetype=:polygon, fillcolor=:blue, legend=false)
plot!(p, x2*[0, 1,1,0], y2*[0,0,1,1], linetype=:polygon, fillcolor=:blue)
p = plot(delta .+ x1*[0, 1,1,0], y1*[0,0,1,1];
linetype=:polygon, fillcolor=:blue, legend=false,
aspect_ratio=:equal)
plot!(p, x2*[0, 1,1,0], y2*[0,0,1,1];
linetype=:polygon, fillcolor=:blue)
plot!(p, delta .+ x1/2 .+ x1/2*cos.(ts), y1.+x1/2*sin.(ts), linetype=:polygon, fillcolor=:red)
plot!(p, x2/2 .+ x2/2*cos.(ts), y2 .+ x2/2*sin.(ts), linetype=:polygon, fillcolor=:red)
@@ -302,20 +305,20 @@ We could also do the above problem symbolically with the aid of `SymPy`. Here ar
```{julia}
@syms 𝐰::real 𝐡::real
@syms w₀::real h₀::real
𝐀₀ = 𝐰 * 𝐡 + pi * (𝐰/2)^2 / 2
𝐏erim = 2*𝐡 + 𝐰 + pi * 𝐰/2
𝐡₀ = solve(𝐏erim - 20, 𝐡)[1]
𝐀₁ = 𝐀₀(𝐡 => 𝐡₀)
𝐰₀ = solve(diff(𝐀₁,𝐰), 𝐰)[1]
A₀ = w₀ * h₀ + pi * (w₀/2)^2 / 2
Perim = 2*h₀ + w₀ + pi * w₀/2
h₁ = solve(Perim - 20, h₀)[1]
A₁ = A₀(h₀ => h₁)
w₁ = solve(diff(A₁,w₀) ~ 0, w₀)[1]
```
We know that `𝐰₀` is the maximum in this example from our previous work. We shall see soon, that just knowing that the second derivative is negative at `𝐰₀` would suffice to know this. Here we check that condition:
We know that `w₀` is the maximum in this example from our previous work. We shall see soon, that just knowing that the second derivative is negative at `w₀` would suffice to know this. Here we check that condition:
```{julia}
diff(𝐀₁, 𝐰, 𝐰)(𝐰 => 𝐰₀)
diff(A₁, w₀, w₀)(w₀ => w₁)
```
As an aside, compare the steps involved above for a symbolic solution to those of previous work for a numeric solution:
@@ -392,14 +395,29 @@ The figure shows a ladder of length $l_1 + l_2$ that got stuck - it was too long
```{julia}
#| hold: true
#| echo: false
p = plot([0, 0, 15], [15, 0, 0], color=:blue, legend=false)
plot!(p, [5, 5, 15], [15, 8, 8], color=:blue)
plot!(p, [0,14.53402874075368], [12.1954981558864, 0], linewidth=3)
plot!(p, [0,5], [8,8], color=:orange)
plot!(p, [5,5], [0,8], color=:orange)
annotate!(p, [(13, 1/2, "θ"),
(2.5, 11, "l₂"), (10, 5, "l₁"), (2.5, 7.0, "l₂ ⋅ cos(θ)"),
(5.1, 4, "l₁ ⋅ sin(θ)")])
let
gr()
p = plot([0, 0, 15], [15, 0, 0],
xticks = [0,5, 15],
yticks = [0,8, 12],
line=(:blue, 2),
legend=false)
plot!(p, [5, 5, 15], [15, 8, 8]; line=(:blue,2))
plot!(p, [0,14.53402874075368], [12.1954981558864, 0], linewidth=3)
plot!(p, [0,5], [8,8], color=:orange)
plot!(p, [5,5], [0,8], color=:orange)
annotate!(p, [(13, 1/2, L"\theta"),
(2.5, 11, L"l_2"),
(10, 5, L"l_1"),
(2.5, 7.0, L"l_2 \cos(\theta)"),
(5.1, 4, text(L"l_1 \sin(\theta)", :top,rotation=90))])
end
```
```{julia}
#| echo: false
plotly()
nothing
```
We approach this problem in reverse. It is easy to see when a ladder is too long. It gets stuck at some angle $\theta$. So for each $\theta$ we find that ladder length that is just too long. Then we find the minimum length of all these ladders that are too long. If a ladder is this length or more it will get stuck for some angle. However, if it is less than this length it will not get stuck. So to maximize a ladder length, we minimize a different function. Neat.
@@ -614,7 +632,7 @@ We see two terms: one with $x=L$ and another quadratic. For the simple case $r_0
```{julia}
solve(q(r1=>r0), x)
solve(q(r1=>r0) ~ 0, x)
```
Well, not so fast. We need to check the other endpoint, $x=0$:
@@ -632,28 +650,28 @@ Now, if, say, travel above the line is half as slow as travel along, then $2r_0
```{julia}
out = solve(q(r1 => 2r0), x)
out = solve(q(r1 => 2r0) ~ 0, x)
```
It is hard to tell which would minimize time without more work. To check a case ($a=1, L=2, r_0=1$) we might have
```{julia}
x_straight = t(r1 =>2r0, b=>0, x=>out[1], a=>1, L=>2, r0 => 1) # for x=L
x_straight = subs(t, r1 =>2r0, b=>0, x=>out[1], a=>1, L=>2, r0 => 1) # for x=L
```
Compared to the smaller ($x=\sqrt{3}a/3$):
```{julia}
x_angle = t(r1 =>2r0, b=>0, x=>out[2], a=>1, L=>2, r0 => 1)
x_angle = subs(t, r1 =>2r0, b=>0, x=>out[2], a=>1, L=>2, r0 => 1)
```
What about $x=0$?
```{julia}
x_bent = t(r1 =>2r0, b=>0, x=>0, a=>1, L=>2, r0 => 1)
x_bent = subs(t, r1 =>2r0, b=>0, x=>0, a=>1, L=>2, r0 => 1)
```
The value of $x=\sqrt{3}a/3$ minimizes time:
@@ -671,7 +689,7 @@ Will this approach always be true? Consider different parameters, say we switch
```{julia}
pts = [0, out...]
m,i = findmin([t(r1 =>2r0, b=>0, x=>u, a=>2, L=>1, r0 => 1) for u in pts]) # min, index
m,i = findmin([subs(t, r1 =>2r0, b=>0, x=>u, a=>2, L=>1, r0 => 1) for u in pts]) # min, index
m, pts[i]
```
@@ -681,7 +699,7 @@ Here traveling directly to the point $(L,0)$ is fastest. Though travel is slower
## Unbounded domains
Maximize the function $xe^{-(1/2) x^2}$ over the interval $[0, \infty)$.
Maximize the function $xe^{-x^2}$ over the interval $[0, \infty)$.
Here the extreme value theorem doesn't technically apply, as we don't have a closed interval. However, **if** we can eliminate the endpoints as candidates, then we should be able to convince ourselves the maximum must occur at a critical point of $f(x)$. (If not, then convince yourself for all sufficiently large $M$ the maximum over $[0,M]$ occurs at a critical point, not an endpoint. Then let $M$ go to infinity.)
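Carrying this out (a sketch; the derivative is computed by hand with the product and chain rules, and `find_zero` is from `Roots`):
```{julia}
let
    f(x)  = x * exp(-x^2)
    fp(x) = exp(-x^2) * (1 - 2x^2)   # product and chain rules
    c = find_zero(fp, 1)             # ≈ 1/sqrt(2)
    c, f(c)                          # the critical point and the maximum value
end
```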
@@ -834,10 +852,12 @@ A rancher with $10$ meters of fence wishes to make a pen adjacent to an existing
```{julia}
#| hold: true
#| echo: false
p = plot(; legend=false, aspect_ratio=:equal, axis=nothing, border=:none)
plot!([0,10, 10, 0, 0], [0,0,10,10,0]; linewidth=3)
plot!(p, [10,14,14,10], [2, 2, 8,8]; linewidth = 1)
annotate!(p, [(15, 5, "x"), (12,1, "y")])
annotate!(p, [(14-0.1, 5, text("x", :right)), (12,2, text("y",:bottom))])
p
```
@@ -997,7 +1017,7 @@ A rain gutter is constructed from a 30" wide sheet of tin by bending it into thi
2 * (1/2 * 10*cos(pi/4) * 10 * sin(pi/4)) + 10*sin(pi/4) * 10
```
Find a value in degrees that gives the maximum. (The first task is to write the area in terms of $\theta$.
Find a value in degrees that gives the maximum. (The first task is to write the area in terms of $\theta$.)
```{julia}
@@ -1049,7 +1069,7 @@ plot!(p, [0, 30,30,0], [0,10,30,0], color=:orange)
annotate!(p, [(x,y,l) for (x,y,l) in zip([15, 5, 31, 31], [1.5, 3.5, 5, 20], ["x=30", "θ", "10", "20"])])
```
What value of $x$ gives the largest angle $\theta$? (In degrees.)
What is the value of the largest angle $\theta$ that $x$ gives? (In degrees.)
```{julia}
@@ -1094,7 +1114,7 @@ radioq(choices, answ)
##### Question
Let $x_1$, $x_2$, $x_n$ be a set of unspecified numbers in a data set. Form the expression $s(x) = (x-x_1)^2 + \cdots (x-x_n)^2$. What is the smallest this can be (in $x$)?
Let $x_1$, $x_2$, $\dots, x_n$ be a set of unspecified numbers in a data set. Form the expression $s(x) = (x-x_1)^2 + \cdots + (x-x_n)^2$. What is the smallest this can be (in $x$)?
We approach this using `SymPy` and $n=10$
@@ -1108,7 +1128,7 @@ s(x) = sum((x-xi)^2 for xi in xs)
cps = solve(diff(s(x), x), x)
```
Run the above code. Baseed on the critical points found, what do you guess will be the minimum value in terms of the values $x_1$, $x_2, \dots$?
Run the above code. Based on the critical points found, what do you guess will be the minimum value in terms of the values $x_1$, $x_2, \dots$?
```{julia}
@@ -1117,7 +1137,7 @@ Run the above code. Baseed on the critical points found, what do you guess will
choices=[
"The mean, or average, of the values",
"The median, or middle number, of the values",
L"The square roots of the values squared, $(x_1^2 + \cdots x_n^2)^2$"
L"The square roots of the values squared, $(x_1^2 + \cdots + x_n^2)^2$"
]
answ = 1
radioq(choices, answ)
@@ -1126,7 +1146,7 @@ radioq(choices, answ)
###### Question
Minimize the function $f(x) = 2x + 3/x$ over $(0, \infty)$.
Find $x$ to minimize the function $f(x) = 2x + 3/x$ over $(0, \infty)$.
```{julia}
@@ -1190,7 +1210,7 @@ The width is:
w(h) = 12_000 / h
S(w, h) = (w- 2*8) * (h - 2*32)
S(h) = S(w(h), h)
hstar =find_zero(D(S), 500)
hstar = find_zero(D(S), 200)
wstar = w(hstar)
numericq(wstar)
```
@@ -1204,7 +1224,7 @@ The height is?
w(h) = 12_000 / h
S(w, h) = (w- 2*8) * (h - 2*32)
S(h) = S(w(h), h)
hstar =find_zero(D(S), 500)
hstar = find_zero(D(S), 200)
numericq(hstar)
```
@@ -1353,7 +1373,12 @@ p = 1/2
x = a/p
plot!(plt, [0, b*(1+p), 0, 0], [0, 0, a+x, 0])
plot!(plt, [b,b,0,0],[0,a,a,0])
annotate!(plt, [(b/2,0, "b"), (0,a/2,"a"), (0,a+x/2,"x"), (b+b*p/2,0,"bp")])
annotate!(plt, [
(b/2,0, text("b",:top)),
(0,a/2, text("a",:right)),
(0,a+x/2, text("x",:right)),
(b+b*p/2,0, text("bp",:top))
])
plt
```
@@ -1372,11 +1397,12 @@ solve(x/b ~ (x+a)/(b + b*p), x)
With $x = a/p$, we get by the Pythagorean theorem that
$$
\begin{align*}
c^2 &= (a + a/p)^2 + (b + bp)^2 \\
&= a^2(1 + \frac{1}{p})^2 + b^2(1+p)^2.
\end{align*}
$$
The ladder problem minimizes $c$ or equivalently $c^2$.
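Carrying out the minimization for illustrative values $a=3$, $b=4$ (a sketch; the derivative in $p$ is computed by hand, and the critical point solves $p^3 = (a/b)^2$):
```{julia}
let
    a, b = 3, 4
    c2(p)  = a^2*(1 + 1/p)^2 + b^2*(1 + p)^2
    dc2(p) = -2a^2*(1 + 1/p)/p^2 + 2b^2*(1 + p)   # d(c^2)/dp
    pstar = find_zero(dc2, 1)                     # pstar = (a/b)^(2/3)
    sqrt(c2(pstar)), (a^(2/3) + b^(2/3))^(3/2)    # agree with the classic answer
end
```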
@@ -1481,7 +1507,7 @@ a = find_zero(y', 1)
numericq(a)
```
Numerically find the value of $a$ that minimizes the length of the line seqment $PQ$.
Numerically find the value of $a$ that minimizes the length of the line segment $PQ$.
```{julia}

View File

@@ -18,7 +18,7 @@ using SymPy
---
Related rates problems involve two (or more) unknown quantities that are related through an equation. As the two variables depend on each other, also so do their rates - change with respect to some variable which is often time, though exactly how remains to be discovered. Hence the name "related rates."
Related rates problems involve two (or more) unknown quantities that are related through an equation. As the two variables depend on each other, so do their rates of change with respect to some other variable, which is often time. Exactly how remains to be discovered. Hence the name "related rates."
#### Examples
@@ -27,7 +27,7 @@ Related rates problems involve two (or more) unknown quantities that are related
The following is a typical "book" problem:
> A screen saver displays the outline of a $3$ cm by $2$ cm rectangle and then expands the rectangle in such a way that the $2$ cm side is expanding at the rate of $4$ cm/sec and the proportions of the rectangle never change. How fast is the area of the rectangle increasing when its dimensions are $12$ cm by $8$ cm? [Source.](http://oregonstate.edu/instruct/mth251/cq/Stage9/Practice/ratesProblems.html)
> A *vintage* screen saver displays the outline of a $3$ cm by $2$ cm rectangle and then expands the rectangle in such a way that the $2$ cm side is expanding at the rate of $4$ cm/sec and the proportions of the rectangle never change. How fast is the area of the rectangle increasing when its dimensions are $12$ cm by $8$ cm? [Source.](http://oregonstate.edu/instruct/mth251/cq/Stage9/Practice/ratesProblems.html)
@@ -125,7 +125,7 @@ w(t) = 2 + 4*t
```{julia}
h(t) = 3/2 * w(t)
h(t) = 3 * w(t) / 2
```
This means again that area depends on $t$ through this formula:
@@ -198,6 +198,50 @@ A ladder, with length $l$, is leaning against a wall. We parameterize this probl
If the ladder starts to slip away at the base, but remains in contact with the wall, express the rate of change of $h$ with respect to $t$ in terms of $db/dt$.
```{julia}
#| echo: false
let
gr()
l = 12
b = 6
h = sqrt(l^2 - b^2)
plot(;
axis=([],false),
legend=false,
aspect_ratio=:equal)
P,Q = (0,h),(b,0)
w = 0.2
S = Shape([-w,0,0,-w],[0,0,h+1,h+1])
plot!(S; fillstyle=:/, fillcolor=:gray80, fillalpha=0.5)
R = Shape([-w,b+2,b+2,-w],[-w,-w,0,0])
plot!(R, fill=(:gray, 0.25))
plot!([P,Q]; line=(:black, 2))
scatter!([P,Q])
    b0, h0 = b, h      # remember the original contact points
    b = b + 3/2
    h = sqrt(l^2 - b^2)
    plot!([b0, b], [0,0]; arrow=true, side=:head, line=(:blue, 3))
    plot!([0,0], [h0, h]; arrow=true, side=:head, line=(:blue, 3))
annotate!([
(b,-w,text(L"(b(t),0)",:top)),
(-w, h, text(L"(0,h(t))", :bottom, rotation=90)),
(b/2, h/2, text(L"L", rotation = -atand(h,b), :bottom))
])
current()
end
```
```{julia}
#| echo: false
plotly()
nothing
```
We have from implicitly differentiating in $t$ the equation $l^2 = h^2 + b^2$, noting that $l$ is a constant, that:
@@ -236,7 +280,7 @@ As $b$ goes to $l$, $h$ goes to $0$, so $b/h$ blows up. Unless $db/dt$ goes to $
:::{.callout-note}
## Note
Often, this problem is presented with $db/dt$ having a constant rate. In this case, the ladder problem defies physics, as $dh/dt$ eventually is faster than the speed of light as $h \rightarrow 0+$. In practice, were $db/dt$ kept at a constant, the ladder would necessarily come away from the wall. The trajectory would follow that of a tractrix were there no gravity to account for.
Often, this problem is presented with $db/dt$ having a constant rate. In this case, the ladder problem defies physics, as $dh/dt$ eventually is faster than the speed of light as $h \rightarrow 0+$. In practice, were $db/dt$ kept at a constant, the ladder would necessarily come away from the wall.
:::
@@ -247,12 +291,15 @@ Often, this problem is presented with $db/dt$ having a constant rate. In this ca
```{julia}
#| hold: true
#| echo: false
#| eval: false
caption = "A man and woman walk towards the light."
imgfile = "figures/long-shadow-noir.png"
ImageFile(:derivatives, imgfile, caption)
```
![A man and woman walk towards the light](./figures/long-shadow-noir.png)
Shadows are a staple of film noir. In the photo, suppose a man and a woman walk towards a street light. As they approach the light the length of their shadow changes.
@@ -340,7 +387,7 @@ This can be solved for the unknown: $dx/dt = 50/20$.
A batter hits a ball toward third base at $75$ ft/sec and runs toward first base at a rate of $24$ ft/sec. At what rate does the distance between the ball and the batter change when $2$ seconds have passed?
We will answer this with `SymPy`. First we create some symbols for the movement of the ball towards third base, `b(t)`, the runner toward first base, `r(t)`, and the two velocities. We use symbolic functions for the movements, as we will be differentiating them in time:
We will answer this symbolically. First we create some symbols for the movement of the ball towards third base, `b(t)`, the runner toward first base, `r(t)`, and the two velocities. We use symbolic functions for the movements, as we will be differentiating them in time:
```{julia}

View File

@@ -10,15 +10,6 @@ This section uses the `TermInterface` add-on package.
using TermInterface
```
```{julia}
#| echo: false
const frontmatter = (
title = "Symbolic derivatives",
description = "Calculus with Julia: Symbolic derivatives",
tags = ["CalculusWithJulia", "derivatives", "symbolic derivatives"],
);
```
---

View File

@@ -1,4 +1,4 @@
# Taylor Polynomials and other Approximating Polynomials
# Taylor polynomials, series, and approximating polynomials
{{< include ../_common_code.qmd >}}
@@ -42,12 +42,14 @@ gr()
taylor(f, x, c, n) = series(f, x, c, n+1).removeO()
function make_taylor_plot(u, a, b, k)
k = 2k
plot(u, a, b, title="plot of T_$k", linewidth=5, legend=false, size=fig_size, ylim=(-2,2.5))
if k == 1
plot!(zero, range(a, stop=b, length=100))
else
plot!(taylor(u, x, 0, k), range(a, stop=b, length=100))
end
plot(u, a, b;
title = L"plot of $T_{%$k}$",
line = (:black, 3),
legend = false,
size = fig_size,
ylim = (-2,2.5))
fn = k == 1 ? zero : taylor(u, x, 0, k)
plot!(fn, range(a, stop=b, length=100); line=(:red,2))
end
@@ -76,7 +78,7 @@ ImageFile(imgfile, caption)
## The secant line and the tangent line
We approach this general problem **much** more indirectly than is needed. We introducing notations that are attributed to Newton and proceed from there. By leveraging `SymPy` we avoid tedious computations and *hopefully* gain some insight.
Heads up: we approach this general problem **much** more indirectly than is needed, by introducing notations attributed to Newton and proceeding from there. By leveraging `SymPy` we avoid tedious computations and *hopefully* gain some insight.
Suppose $f(x)$ is a function which is defined in a neighborhood of $c$ and has as many continuous derivatives as we care to take at $c$.
@@ -102,7 +104,10 @@ $$
tl(x) = f(c) + f'(c) \cdot(x - c).
$$
The key is the term multiplying $(x-c)$ for the secant line is an approximation to the related term for the tangent line. That is, the secant line approximates the tangent line, which is the linear function that best approximates the function at the point $(c, f(c))$. This is quantified by the *mean value theorem* which states under our assumptions on $f(x)$ that there exists some $\xi$ between $x$ and $c$ for which:
The key is the term multiplying $(x-c)$---for the secant line this is an approximation to the related term for the tangent line. That is, the secant line approximates the tangent line, which is the linear function that best approximates the function at the point $(c, f(c))$.
This is quantified by the *mean value theorem* which states under our assumptions on $f(x)$ that there exists some $\xi$ between $x$ and $c$ for which:
$$
@@ -115,15 +120,16 @@ The term "best" is deserved, as any other straight line will differ at least in
(This is a consequence of Cauchy's mean value theorem with $F(c) = f(c) - f'(c)\cdot(c-x)$ and $G(c) = (c-x)^2$
$$
\begin{align*}
\frac{F'(\xi)}{G'(\xi)} &=
\frac{f'(\xi) - f''(\xi)(\xi-x) - f(\xi)\cdot 1}{2(\xi-x)} \\
\frac{f'(\xi) - f''(\xi)(\xi-x) - f'(\xi)\cdot 1}{2(\xi-x)} \\
&= -f''(\xi)/2\\
&= \frac{F(c) - F(x)}{G(c) - G(x)}\\
&= \frac{f(c) - f'(c)(c-x) - (f(x) - f'(x)(x-x))}{(c-x)^2 - (x-x)^2} \\
&= \frac{f(c) + f'(c)(x-c) - f(x)}{(x-c)^2}
\end{align*}
$$
That is, $f(x) = f(c) + f'(c)(x-c) + f''(\xi)/2\cdot(x-c)^2$, or $f(x)-tl(x)$ is as described.)
@@ -152,15 +158,16 @@ As in the linear case, there is flexibility in the exact points chosen for the i
---
Now, we take a small detour to define some notation. Instead of writing our two points as $c$ and $c+h,$ we use $x_0$ and $x_1$. For any set of points $x_0, x_1, \dots, x_n$, define the **divided differences** of $f$ inductively, as follows:
Now, we take a small detour to define some notation. Instead of writing our two points as $c$ and $c+h,$ we use $x_0$ and $x_1$. For any set of points $x_0, x_1, \dots, x_n$, define the Newton **divided differences** of $f$ recursively, as follows:
$$
\begin{align*}
f[x_0] &= f(x_0) \\
f[x_0, x_1] &= \frac{f[x_1] - f[x_0]}{x_1 - x_0}\\
\cdots &\\
f[x_0, x_1, x_2, \dots, x_n] &= \frac{f[x_1, \dots, x_n] - f[x_0, x_1, x_2, \dots, x_{n-1}]}{x_n - x_0}.
\end{align*}
$$
We see the first two values look familiar, and to generate more we just take certain ratios akin to those formed when finding a secant line.
@@ -187,7 +194,7 @@ function divided_differences(f, x, xs...)
end
```
In the following, by adding a `getindex` method, we enable the `[]` notation of Newton to work with symbolic functions, like `u()` defined below, which is used in place of $f$:
In the following---even though it is *type piracy*---by adding a `getindex` method, we enable the `[]` notation of Newton to work with symbolic functions, like `u()` defined below, which is used in place of $f$:
```{julia}
@@ -197,66 +204,56 @@ Base.getindex(u::SymFunction, xs...) = divided_differences(u, xs...)
ex = u[c, c+h]
```
We can take a limit and see the familiar (yet differently represented) value of $u'(c)$:
A limit as $h\rightarrow 0$ would show a value of $u'(c)$.
```{julia}
limit(ex, h => 0)
```
The choice of points is flexible. Here we use $c-h$ and $c+h$:
```{julia}
limit(u[c-h, c+h], h=>0)
```
Now, let's look at:
```{julia}
ex₂ = u[c, c+h, c+2h]
simplify(ex₂)
```
Not so bad after simplification. The limit shows this to be an approximation to the second derivative divided by $2$:
If we multiply by $2$ and simplify, a discrete approximation for the second derivative---the second order forward [difference equation](http://tinyurl.com/n4235xy)---is seen:
```{julia}
limit(ex₂, h => 0)
simplify(2ex₂)
```
(The expression is, up to a divisor of $2$, the second order forward [difference equation](http://tinyurl.com/n4235xy), a well-known approximation to $f''$.)
This relationship between higher-order divided differences and higher-order derivatives generalizes. This is expressed in this [theorem](http://tinyurl.com/zjogv83):
> Suppose $m=x_0 < x_1 < x_2 < \dots < x_n=M$ are distinct points. If $f$ has $n$ continuous derivatives then there exists a value $\xi$, where $m < \xi < M$, satisfying:
:::{.callout-note}
## Mean value theorem for Divided differences
Suppose $m=x_0 < x_1 < x_2 < \dots < x_n=M$ are distinct points. If $f$ has $n$ continuous derivatives then there exists a value $\xi$, where $m < \xi < M$, satisfying:
$$
f[x_0, x_1, \dots, x_n] = \frac{1}{n!} \cdot f^{(n)}(\xi).
$$
:::
This immediately applies to the above, where we parameterized by $h$: $x_0=c, x_1=c+h, x_2 = c+2h$. For then, as $h$ goes to $0$, it must be that $m, M \rightarrow c$, and so the limit of the divided differences must converge to $(1/2!) \cdot f^{(2)}(c)$, as $f^{(2)}(\xi)$ converges to $f^{(2)}(c)$.
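A numeric confirmation (a sketch, with $f=\sin$ and $c=1/2$ chosen for illustration): as $h$ shrinks, the divided difference approaches $f''(c)/2!$.
```{julia}
let
    f, c = sin, 0.5
    dd(h) = (f(c) - 2f(c + h) + f(c + 2h)) / (2h^2)   # f[c, c+h, c+2h]
    dd(0.1), dd(0.001), -sin(c)/2                     # the last is f''(c)/2!
end
```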
A proof based on Rolle's theorem appears in the appendix.
## Quadratic approximations; interpolating polynomials
Why the fuss? The answer comes from a result of Newton on *interpolating* polynomials. Consider a function $f$ and $n+1$ points $x_0$, $x_1, \dots, x_n$. Then an interpolating polynomial is a polynomial of least degree that goes through each point $(x_i, f(x_i))$. The [Newton form](https://en.wikipedia.org/wiki/Newton_polynomial) of such a polynomial can be written as:
$$
\begin{align*}
f[x_0] &+ f[x_0,x_1] \cdot (x-x_0) + f[x_0, x_1, x_2] \cdot (x-x_0) \cdot (x-x_1) + \\
& \cdots + f[x_0, x_1, \dots, x_n] \cdot (x-x_0)\cdot \cdots \cdot (x-x_{n-1}).
\end{align*}
$$
The case $n=0$ gives the value $f[x_0] = f(c)$, which can be interpreted as the slope-$0$ line that goes through the point $(c,f(c))$.
@@ -442,7 +439,7 @@ This can be solved to give this relationship:
$$
\frac{d^2\theta}{dt^2} = - \frac{g}{R}\theta.
$$
The solution to this "equation" can be written (in some parameterization) as $\theta(t)=A\cos \left(\omega t+\phi \right)$. This motion is the well-studied simple [harmonic oscillator](https://en.wikipedia.org/wiki/Harmonic_oscillator), a model for a simple pendulum.
@@ -485,21 +482,28 @@ On inspection, it is seen that this is Newton's method applied to $f'(x)$. This
Starting with the Newton form of the interpolating polynomial of smallest degree:
$$
\begin{align*}
f[x_0] &+ f[x_0,x_1] \cdot (x - x_0) + f[x_0, x_1, x_2] \cdot (x - x_0)\cdot(x-x_1) + \\
& \cdots + f[x_0, x_1, \dots, x_n] \cdot (x-x_0) \cdot \cdots \cdot (x-x_{n-1}).
\end{align*}
$$
and taking $x_i = c + i\cdot h$, for a given $n$, we have in the limit as $h > 0$ goes to zero that coefficients of this polynomial converge to the coefficients of the *Taylor Polynomial of degree n*:
$$
f(c) + f'(c)\cdot(x-c) + \frac{f''(c)}{2!}(x-c)^2 + \cdots + \frac{f^{(n)}(c)}{n!} (x-c)^n.
$$
This polynomial will be the best approximation of degree $n$ or less to the function $f$, near $c$. The error will be given - again by an application of the Cauchy mean value theorem:
and taking $x_i = c + i\cdot h$, for a given $n$, we have in the limit as $h > 0$ goes to zero that coefficients of this polynomial converge:
:::{.callout-note}
## Taylor polynomial of degree $n$
Suppose $f(x)$ has $n+1$ derivatives (continuous on $c$ and $x$), then
$$
T_n(x) = f(c) + f'(c)\cdot(x-c) + \frac{f''(c)}{2!}(x-c)^2 + \cdots + \frac{f^{(n)}(c)}{n!} (x-c)^n,
$$
will be the best approximation of degree $n$ or less to $f$, near $c$.
The error will be given, again by an application of the Cauchy mean value theorem:
$$
@@ -507,9 +511,10 @@ $$
$$
for some $\xi$ between $c$ and $x$.
:::
The Taylor polynomial for $f$ about $c$ of degree $n$ can be computed by taking $n$ derivatives. For such a task, the computer is very helpful. In `SymPy` the `series` function will compute the Taylor polynomial for a given $n$. For example, here is the series expansion to 10 terms of the function $\log(1+x)$ about $c=0$:
The Taylor polynomial for $f$ about $c$ of degree $n$ can be computed by taking $n$ derivatives. For such a task, the computer is very helpful. In `SymPy` the `series` function will compute the Taylor polynomial for a given $n$. For example, here is the series expansion to $10$ terms of the function $\log(1+x)$ about $c=0$:
```{julia}
@@ -548,7 +553,7 @@ The output of `series` includes a big "Oh" term, which identifies the scale of t
:::{.callout-note}
## Note
A Taylor polynomial of degree $n$ consists of $n+1$ terms and an error term. The "Taylor series" is an *infinite* collection of terms, the first $n+1$ matching the Taylor polynomial of degree $n$. The fact that series are *infinite* means care must be taken when even talking about their existence, unlike a Tyalor polynomial, which is just a polynomial and exists as long as a sufficient number of derivatives are available.
A Taylor polynomial of degree $n$ consists of $n+1$ terms and an error term. The "Taylor series" (below) is an *infinite* collection of terms, the first $n+1$ matching the Taylor polynomial of degree $n$. The fact that series are *infinite* means care must be taken when even talking about their existence, unlike a Taylor polynomial, which is just a polynomial and exists as long as a sufficient number of derivatives are available.
:::
@@ -715,7 +720,7 @@ The height of a [GPS satellite](http://www.gps.gov/systems/gps/space/) is about
```{julia}
Hₛ = 12250 * 1609.34 # 1609 meters per mile
Hₛ = 12550 * 1609.34 # 1609 meters per mile
HRₛ = Hₛ/R
Prealₛ = P0 * (1 + HRₛ)^(3/2)
@@ -747,7 +752,7 @@ Finally, we show how to use the `Unitful` package. This package allows us to def
m, mi, kg, s, hr = u"m", u"mi", u"kg", u"s", u"hr"
G = 6.67408e-11 * m^3 / kg / s^2
H = uconvert(m, 12250 * mi) # unit convert miles to meter
H = uconvert(m, 12550 * mi) # unit convert miles to meter
R = uconvert(m, 3959 * mi)
M = 5.972e24 * kg
@@ -788,7 +793,7 @@ This is re-expressed as $2s + s \cdot p$ with $p$ given by:
```{julia}
cancel((a_b - 2s)/s)
p = cancel((a_b - 2s)/s)
```
Now, $2s = m - s\cdot m$, so the above can be reworked to be $\log(1+m) = m - s\cdot(m-p)$.
@@ -797,36 +802,28 @@ Now, $2s = m - s\cdot m$, so the above can be reworked to be $\log(1+m) = m - s\
(For larger values of $m$, a similar, but different approximation, can be used to minimize floating point errors.)
How big can the error be between this *approximations* and $\log(1+m)$? We plot to see how big $s$ can be:
How big can the error be between this *approximation* and $\log(1+m)$? The expression $m/(2+m)$ increases for $m > 0$, so, on this interval, $s$ is as big as
```{julia}
@syms v
plot(v/(2+v), sqrt(2)/2 - 1, sqrt(2)-1)
```
This shows, $s$ is as big as
```{julia}
Max = (v/(2+v))(v => sqrt(2) - 1)
Max = (x/(2+x))(x => sqrt(2) - 1)
```
The error term is like $(2/19) \cdot \xi^{19}$, which is largest at this value, `Max`. Large is relative; it is really small:
```{julia}
(2/19)*Max^19
(2/19) * Max^19
```
Basically that is machine precision, which means that, as far as can be told on the computer, the value produced by $2s + s \cdot p$ is about as accurate as can be done.
To try this out to compute $\log(5)$. We have $5 = 2^2(1+0.25)$, so $k=2$ and $m=0.25$.
We try this out to compute $\log(5)$. We have $5 = 2^2(1+ 1/4)$, so $k=2$ and $m=1/4$.
```{julia}
k, m = 2, 0.25
k, m = 2, 1/4
s = m / (2+m)
pₗ = 2 * sum(s^(2i)/(2i+1) for i in 1:8) # where the polynomial approximates the logarithm...
@@ -850,23 +847,25 @@ The actual code is different, as the Taylor polynomial isn't used. The Taylor p
For notational purposes, let $g(x)$ be the inverse function for $f(x)$. Assume *both* functions have a Taylor polynomial expansion:
$$
\begin{align*}
f(x_0 + \Delta_x) &= f(x_0) + a_1 \Delta_x + a_2 (\Delta_x)^2 + \cdots a_n (\Delta_x)^n + \dots\\
g(y_0 + \Delta_y) &= g(y_0) + b_1 \Delta_y + b_2 (\Delta_y)^2 + \cdots b_n (\Delta_y)^n + \dots
f(x_0 + \Delta_x) &= f(x_0) + a_1 \Delta_x + a_2 (\Delta_x)^2 + \cdots + a_n (\Delta_x)^n + \dots\\
g(y_0 + \Delta_y) &= g(y_0) + b_1 \Delta_y + b_2 (\Delta_y)^2 + \cdots + b_n (\Delta_y)^n + \dots
\end{align*}
$$
Then, using $x = g(f(x))$, we have, expanding the terms and using $\approx$ to drop the $\dots$:
$$
\begin{align*}
x_0 + \Delta_x &= g(f(x_0 + \Delta_x)) \\
&\approx g(f(x_0) + \sum_{j=1}^n a_j (\Delta_x)^j) \\
&\approx g(f(x_0)) + \sum_{i=1}^n b_i \left(\sum_{j=1}^n a_j (\Delta_x)^j \right)^i \\
&\approx x_0 + \sum_{i=1}^{n-1} b_i \left(\sum_{j=1}^n a_j (\Delta_x)^j\right)^i + b_n \left(\sum_{j=1}^n a_j (\Delta_x)^j\right)^n
\end{align*}
$$
That is:
@@ -889,7 +888,7 @@ $$
(This is following [Liptaj](https://vixra.org/pdf/1703.0295v1.pdf)).
We will use `SymPy` to take this limit for the first `4` derivatives. Here is some code that expands $x + \Delta_x = g(f(x_0 + \Delta_x))$ and then uses `SymPy` to solve:
We will use `SymPy` to take this limit for the first `4` derivatives. Here is some code that expands $x_0 + \Delta_x = g(f(x_0 + \Delta_x))$ and then uses `SymPy` to solve:
```{julia}
@@ -936,6 +935,46 @@ eqns[2]
The `solve` function is used to identify $g^{(n)}$ represented in terms of lower-order derivatives of $g$. These values have been computed and stored and are then substituted into `ϕ`. Afterwards a limit is taken and the answer recorded.
## Taylor series
Recall, a *power series* has the form $\sum_{n=0}^\infty a_n (x-c)^n$. A power series has a radius of convergence, $r$: the series converges for $|x - c| < r$ and diverges when $|x-c| > r$.
The Taylor polynomial formula can be extended to a formal power series through
$$
a_n = \frac{f^{(n)}(c)}{n!}.
$$
If $f(x)$ is equal to the power series within the radius of convergence, then derivatives of $f(x)$ can be computed by term-by-term differentiation of the power series. The resulting power series will have the same radius of convergence.
Consider the Taylor series for $\sin(x)$ and $\cos(x)$ about $0$:
$$
\begin{align*}
\sin(x) &= x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots + (-1)^k\frac{x^{2k+1}}{(2k+1)!} + \cdots\\
\cos(x) &= 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots + (-1)^k\frac{x^{2k}}{(2k)!} + \cdots
\end{align*}
$$
These both have infinite radius of convergence. Differentiating the power series of $\sin(x)$ term by term gives the power series for $\cos(x)$ as
$$
\left[(-1)^k \frac{x^{2k+1}}{(2k+1)!} \right]' =
(-1)^k \frac{(2k+1) x^{2k}}{(2k+1)!} =
(-1)^k \frac{x^{2k}}{(2k)!}.
$$
Similarly, as the power series for $\sinh(x)$ and $\cosh(x)$ are the same as the above without the alternating signs produced by the $(-1)^k$ term, the term-by-term differentiation of the power series of $\sinh(x)$ produces $\cosh(x)$ and, in this case, vice versa.
The power series for $e^x$ about $0$ has terms $a_k=x^k/k!$. Differentiating gives $kx^{k-1}/k! = x^{k-1}/(k-1)!$. The equivalence of the power series for $e^x$ with its term-by-term differentiation requires a simple shift of indices.
The power series for $1/(1-x)$ has terms $a_i = x^i$ for $i \geq 0$. The radius of convergence is $1$. Differentiating term-by-term yields a power series for $1/(1-x)^2$ with terms $a_i = (i+1)x^i$ for $i \geq 0$, which will have a radius of convergence of $1$ as well.
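These term-by-term computations can be checked symbolically. A minimal sketch of ours, assuming `SymPy` is loaded as elsewhere; the `removeO` method strips the big "Oh" term from a series:

```{julia}
@syms x::real
sin_trunc = series(sin(x), x, 0, 10).removeO()  # a Taylor polynomial for sin
cos_trunc = series(cos(x), x, 0, 9).removeO()   # the matching one for cos
simplify(diff(sin_trunc, x) - cos_trunc)        # expect 0
```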
There are examples (the typical one being $f(x) = e^{-1/x^2}$, defined at $0$ to be $0$) where the function has infinitely many derivatives but the power series and the function are not equal beyond a point. In this example, the function is so flat at $0$ that all its derivatives at $0$ are $0$.
## Questions
@@ -1201,13 +1240,16 @@ $$
$$
h(x)=b_0 + b_1 (x-x_n) + b_2(x-x_n)(x-x_{n-1}) + \cdots + b_n (x-x_n)(x-x_{n-1})\cdot\cdots\cdot(x-x_1).
\begin{align*}
h(x)&=b_0 + b_1 (x-x_n) + b_2(x-x_n)(x-x_{n-1}) + \cdots \\
&+ b_n (x-x_n)(x-x_{n-1})\cdot\cdots\cdot(x-x_1).
\end{align*}
$$
These two polynomials are of degree $n$ or less and have $u(x) = h(x)-g(x)=0$, by uniqueness. So the coefficients of $u(x)$ are $0$. We have that the coefficient of $x^n$ must be $a_n-b_n$ so $a_n=b_n$. Our goal is to express $a_n$ in terms of $a_{n-1}$ and $b_{n-1}$. Focusing on the $x^{n-1}$ term, we have:
$$
\begin{align*}
b_n(x-x_n)(x-x_{n-1})\cdot\cdots\cdot(x-x_1)
&- a_n\cdot(x-x_0)\cdot\cdots\cdot(x-x_{n-1}) \\
@@ -1215,6 +1257,7 @@ b_n(x-x_n)(x-x_{n-1})\cdot\cdots\cdot(x-x_1)
a_n [(x-x_1)\cdot\cdots\cdot(x-x_{n-1})] [(x- x_n)-(x-x_0)] \\
&= -a_n \cdot(x_n - x_0) x^{n-1} + p_{n-2},
\end{align*}
$$
where $p_{n-2}$ is a polynomial of at most degree $n-2$. (The expansion of $(x-x_1)\cdot\cdots\cdot(x-x_{n-1})$ leaves $x^{n-1}$ plus some lower degree polynomial.) Similarly, we have $a_{n-1}(x-x_0)\cdot\cdots\cdot(x-x_{n-2}) = a_{n-1}x^{n-1} + q_{n-2}$ and $b_{n-1}(x-x_n)\cdot\cdots\cdot(x-x_2) = b_{n-1}x^{n-1}+r_{n-2}$. Combining, we get that the $x^{n-1}$ term of $u(x)$ is

View File

@@ -1,3 +1,7 @@
---
engine: julia
---
# Differential vector calculus
This section discusses generalizations of the derivative to functions which have more than one input and/or one output.

View File

@@ -1,4 +1,5 @@
[deps]
BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
CalculusWithJulia = "a2e0e22d-7d4c-5312-9169-8b992201a882"
Contour = "d38c429a-6771-53c6-b99e-75d170b6e991"
@@ -7,10 +8,17 @@ DifferentialEquations = "0c46a032-eb83-5123-abaf-570d42b7fbaa"
ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
IJulia = "7073ff75-c697-5162-941a-fcdaad2a7d2a"
JSON = "682c06a0-de6a-54ab-a142-c8b1cf79cde6"
LaTeXStrings = "b964fa9f-0449-5b57-a5c2-d3ea65f4040f"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
Mustache = "ffc61752-8dc7-55ee-8c37-f3e9cdd09e70"
PlotlyBase = "a03496cd-edff-5a9b-9e67-9cda94a718b5"
PlotlyKaleido = "f2990250-8cf9-495f-b13a-cce12b45703c"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
QuadGK = "1fd47b50-473d-5c70-9696-f719f8f3bcdc"
QuizQuestions = "612c44de-1021-4a21-84fb-7261cf5eb2d4"
Roots = "f2b01f46-fcfa-551c-844a-d8ac1e96c665"
ScatteredInterpolation = "3f865c0f-6dca-5f4d-999b-29fe1e7e3c92"
SplitApplyCombine = "03a91e81-4c3e-53e1-a0a4-9c0c8f19dd66"
SymPy = "24249f21-da20-56a4-8eb1-6a02cf4ae2e6"
Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
TextWrap = "b718987f-49a8-5099-9789-dcd902bef87d"

View File

@@ -9,6 +9,7 @@ files = (
"scalar_functions",
"scalar_functions_applications",
"vector_fields",
"matrix_calculus_notes.qmd",
"plots_plotting",
)

View File

@@ -0,0 +1,931 @@
# Matrix calculus
This section illustrates a more general setting for taking derivatives, one that unifies the different expositions taken previously.
::: {.callout-note appearance="minimal"}
## Based on Bright, Edelman, and Johnson's notes
This section has essentially no original contribution, as it basically samples material from the notes [Matrix Calculus (for Machine Learning and Beyond)](https://arxiv.org/abs/2501.14787) by Paige Bright, Alan Edelman, and Steven G. Johnson. Their notes cover material taught in a course at MIT. Support materials for their course in `Julia` are available at [https://github.com/mitmath/matrixcalc/tree/main](https://github.com/mitmath/matrixcalc/tree/main). For more details and examples, please refer to the source.
:::
## Review
We have seen several "derivatives" of a function, based on the number of inputs and outputs. The first one was for functions $f: R \rightarrow R$.
In this case, we saw that $f$ has a derivative at $c$ if this limit exists:
$$
\lim_{h \rightarrow 0}\frac{f(c + h) - f(c)}{h}.
$$
The derivative as a function of $x$ uses this rule for any $x$ in the domain.
Common notation is:
$$
f'(x) = \frac{dy}{dx} = \lim_{h \rightarrow 0}\frac{f(x + h) - f(x)}{h}
$$
(when the limit exists).
This limit gets re-expressed in different ways:
* Linearization writes $f(x+\Delta x) - f(x) \approx f'(x)\Delta x$, where $\Delta x$ is a small displacement from $x$. The reason there isn't equality is the unwritten higher-order terms that vanish in the limit.
* Alternate limits. Another way of writing this is in terms of explicit smaller order terms:
$$
(f(x+h) - f(x)) - f'(x)h = \mathscr{o}(h),
$$
which means if we divide both sides by $h$ and take the limit, we will get $0$ on the right and the relationship on the left.
* Differential notation simply writes this as $dy = f'(x)dx$. Focusing on $f$ and not $y=f(x)$, we might write
$$
df = f(x+dx) - f(x) = f'(x) dx.
$$
In the above, $df$ and $dx$ are differentials, made rigorous by a limit, which hides the higher order terms.
We will see that all the derivatives encountered so far can be similarly expressed through this last characterization.
### Univariate, vector-valued
For example, when $f: R \rightarrow R^m$ was a vector-valued function the derivative was defined similarly through a limit of $(f(t + \Delta t) - f(t))/{\Delta t}$, where each component needed to have a limit. This can be rewritten through $f(t + dt) - f(t) = f'(t) dt$, again using differentials to avoid the higher order terms.
### Multivariate, scalar-valued
When $f: R^n \rightarrow R$ is a scalar-valued function with vector inputs, differentiability was defined by a gradient existing with $f(c+h) - f(c) - \nabla{f}(c) \cdot h$ being $\mathscr{o}(\|h\|)$. In other words $df = f(c + dh) - f(c) = \nabla{f}(c) \cdot dh$. The gradient has the same shape as $c$, a column vector. If we take the row vector (e.g. $f'(c) = \nabla{f}(c)^T$) then again we see $df = f(c+dh) - f(c) = f'(c) dh$, where the last term uses matrix multiplication of a row vector times a column vector.
### Multivariate, vector-valued
Finally, when $f:R^n \rightarrow R^m$, the Jacobian was defined and characterized by
$\| f(x + dx) - f(x) - J_f(x)dx \|$ being $\mathscr{o}(\|dx\|)$. Again, we can express this through $df = f(x + dx) - f(x) = f'(x)dx$ where $f'(x) = J_f(x)$.
### Vector spaces
The generalization of the derivative involves linear operators which are defined for vector spaces.
A [vector space](https://en.wikipedia.org/wiki/Vector_space) is a set of mathematical objects which can be added together and also multiplied by a scalar. Vectors of similar size, as previously discussed, are the typical example, with vector addition and scalar multiplication already defined. Matrices of similar size (and some subclasses) also form a vector space.
Additionally, many other sets of objects form vector spaces. Certain families of functions provide examples, such as polynomial functions of degree $n$ or less, continuous functions, or functions with a certain number of derivatives. The last two are infinite dimensional; our focus here is on finite dimensional vector spaces.
Let's take differentiable functions as an example. These form a vector space as the derivative of a linear combination of differentiable functions is defined through the simplest derivative rule: $[af(x) + bg(x)]' = a[f(x)]' + b[g(x)]'$. If $f$ and $g$ are differentiable, then so is $af(x)+bg(x)$.
A finite dimensional vector space is described by a *basis*---a minimal set of vectors needed to describe the space, after consideration of linear combinations. For some typical vector spaces, this is the set of special vectors with $1$ as one of the entries, and $0$ otherwise.
A key fact about a basis for a finite dimensional vector space is that every vector in the vector space can be expressed *uniquely* as a linear combination of the basis vectors. The set of numbers used in the linear combination, along with an order to the basis, means an element in a finite dimensional vector space can be associated with a unique coordinate vector.
Vectors and matrices have properties that are generalizations of the real numbers. As vectors and matrices form vector spaces, the concept of addition of vectors and matrices is defined, as is scalar multiplication. Additionally, we have seen:
* The dot product between two vectors of the same length is defined easily ($v\cdot w = \Sigma_i v_i w_i$). It is coupled with the length as $\|v\|^2 = v\cdot v$.
* Matrix multiplication is defined for two properly sized matrices. If $A$ is $m \times k$ and $B$ is $k \times n$ then $AB$ is a $m\times n$ matrix with $(i,j)$ term given by the dot product of the $i$th row of $A$ (viewed as a vector) and the $j$th column of $B$ (viewed as a vector). Matrix multiplication is associative but *not* commutative. (E.g. $(AB)C = A(BC)$ but $AB$ and $BA$ need not be equal, or even defined, as the shapes may not match up.)
* A square matrix $A$ has an *inverse* $A^{-1}$ if $AA^{-1} = A^{-1}A = I$, where $I$ is the identity matrix (a matrix which is zero except on its diagonal entries, which are all $1$). Square matrices may or may not have an inverse. A matrix without an inverse is called *singular*.
* Viewing a vector as a matrix is possible. The association chosen here is common and is through a *column* vector.
* The *transpose* of a matrix comes by permuting the rows and columns. The transpose of a column vector is a row vector, so $v\cdot w = v^T w$, where we use a superscript $T$ for the transpose. The transpose of a product is the product of the transposes---reversed: $(AB)^T = B^T A^T$; the transpose of a transpose is an identity operation: $(A^T)^T = A$; the inverse of a transpose is the transpose of the inverse: $(A^{-1})^T = (A^T)^{-1}$.
* Matrices for which $A = A^T$ are called symmetric.
* The *adjoint* of a matrix is related to the transpose, only complex conjugates are also taken. When a matrix has real components, the adjoint and transpose are identical operations.
* The trace of a square matrix is just the sum of its diagonal terms.
* The determinant of a square matrix is more involved to compute, but was previously seen to have a relationship to the volume of a certain parallelepiped.
These operations have different inputs and outputs: the determinant and trace take a (square) matrix and return a scalar; the inverse takes a square matrix and returns a square matrix (when defined); the transpose and adjoint take a rectangular matrix and return a rectangular matrix.
In addition to these, there are a few other key operations on matrices described in the following.
### Linear operators
The @BrightEdelmanJohnson notes cover differentiation of functions in this uniform manner, extending the form by treating derivatives more generally as *linear operators*.
A [linear operator](https://en.wikipedia.org/wiki/Operator_(mathematics)) is a mathematical object which satisfies
$$
f[\alpha v + \beta w] = \alpha f[v] + \beta f[w],
$$
where the $\alpha$ and $\beta$ are scalars, and $v$ and $w$ come from a *vector space*.
Taking the real numbers as a vector space, then regular multiplication is a linear operation, as $c \cdot (ax + by) = a\cdot(cx) + b\cdot(cy)$ using the distributive and commutative properties.
Taking $n$-dimensional vectors as vector space, matrix multiplication by an $n \times n$ matrix on the left will be a linear operator as $M(av + bw) = a(Mv) + b(Mw)$, using distribution and the commutative properties of scalar multiplication.
We saw that differentiable functions form a vector space; the derivative is a linear operator on this space, as $[af(x) + bg(x)]' = af'(x) + bg'(x)$.
::: {.callout-note appearance="minimal"}
## The use of `[]`
The referenced notes identify $f'(x) dx$ as $f'(x)[dx]$, the latter emphasizing that $f'(x)$ acts on $dx$ and that the notation is not commutative (e.g., it is not $dx f'(x)$). The use of $[]$ is to indicate that $f'(x)$ "acts" on $dx$ in a linear manner. It may be multiplication, matrix multiplication, or something else. Parentheses are not used, as they might imply function application or multiplication.
:::
## The derivative as a linear operator
We take the view that a derivative is a linear operator where $df = f(x+dx) - f(x) = f'(x)[dx]$.
In writing $df = f(x + dx) - f(x) = f'(x)[dx]$ generically, some underlying facts are left implicit: $dx$ has the same shape as $x$ (so can be added) and there is an underlying concept of distance and size that allows the above to be made rigorous. This may be an absolute value or a norm.
##### Example: directional derivatives
Suppose $f: R^n \rightarrow R$, a scalar-valued function of a vector. Then the directional derivative at $x$ in the direction $v$ was defined for a scalar $\alpha$ by:
$$
\frac{\partial}{\partial \alpha}f(x + \alpha v) \mid_{\alpha = 0} =
\lim_{\Delta\alpha \rightarrow 0} \frac{f(x + \Delta\alpha v) - f(x)}{\Delta\alpha}.
$$
This rate of change in the direction of $v$ can be expressed through the linear operator $f'(x)$ via
$$
df = f(x + d\alpha v) - f(x) = f'(x) [d\alpha v] = d\alpha f'(x)[v],
$$
using linearity to move the scalar multiplication by $d\alpha$ outside the action of the linear operator. This connects the partial derivative at $x$ in the direction of $v$ with $f'(x)$:
$$
\frac{\partial}{\partial \alpha}f(x + \alpha v) \mid_{\alpha = 0} =
f'(x)[v].
$$
Not only does this give a connection in notation with the derivative, it naturally illustrates how the derivative as a linear operator can act on non-infinitesimal values, in this case on $v$.
Previously, we wrote $\nabla f \cdot v$ for the directional derivative, where the gradient is a column vector.
The above uses the identification $f' = (\nabla f)^T$.
For $f: R^n \rightarrow R$ we have $df = f(x + dx) - f(x) = f'(x) [dx]$ is a scalar, so if $dx$ is a column vector, $f'(x)$ is a row vector with the same number of components (just as $\nabla f$ is a column vector with the same number of components). The operation $f'(x)[dx]$ is just matrix multiplication, which is a linear operation.
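As a quick numeric illustration of ours (using the `ForwardDiff` package from this project's dependencies), the directional derivative computed as a univariate derivative in $\alpha$ agrees with $\nabla f \cdot v$:

```{julia}
using ForwardDiff, LinearAlgebra
f(x) = x[1]^2 * x[2] + sin(x[3])                   # an arbitrary scalar-valued example
x₀, v = [1.0, 2.0, 3.0], [1.0, -1.0, 2.0]
lhs = ForwardDiff.derivative(α -> f(x₀ + α*v), 0)  # ∂/∂α f(x₀ + αv) at α = 0
rhs = ForwardDiff.gradient(f, x₀) ⋅ v              # f'(x₀)[v] = ∇f(x₀) ⋅ v
lhs ≈ rhs
```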
##### Example: derivative of a matrix expression
@BrightEdelmanJohnson include this example to show that the computation of derivatives using components can be avoided. Consider $f(x) = x^T A x$ where $x$ is a vector in $R^n$ and $A$ is an $n\times n$ matrix. This type of expression is common.
Then $f: R^n \rightarrow R$ and its derivative can be computed:
$$
\begin{align*}
df &= f(x + dx) - f(x)\\
&= (x + dx)^T A (x + dx) - x^TAx \\
&= \textcolor{blue}{x^TAx} + dx^TA x + x^TAdx + \textcolor{red}{dx^T A dx} - \textcolor{blue}{x^TAx}\\
&= dx^TA x + x^TAdx \\
&= (dx^TAx)^T + x^TAdx \\
&= x^T A^T dx + x^T A dx\\
&= x^T(A^T + A) dx
\end{align*}
$$
The term $dx^T A dx$ is dropped, as it is higher order (it goes to zero faster), containing two $dx$ terms.
In the second to last step, an identity operation (taking the transpose of the scalar quantity) is taken to simplify the algebra. Finally, as $df = f'(x)[dx]$, the identification $f'(x) = x^T(A^T+A)$ is made, or, taking transposes, $\nabla f(x) = (A + A^T)x$.
Compare the elegance above with the component version which, even simplified, still requires a specification of the size to carry out the following:
```{julia}
using SymPy
@syms x[1:3]::real A[1:3, 1:3]::real
u = x' * A * x
grad_u = [diff(u, xi) for xi in x]
```
Compare to the formula for the gradient just derived:
```{julia}
grad_u_1 = (A + A')*x
```
The two are, of course, equal:
```{julia}
all(a == b for (a,b) ∈ zip(grad_u, grad_u_1))
```
##### Example: derivative of matrix application
For $f: R^n \rightarrow R^m$, @BrightEdelmanJohnson give an example of computing the Jacobian without resorting to component-wise computations. Let $f(x) = Ax$ with $A$ an $m \times n$ matrix; it follows that
$$
\begin{align*}
df &= f(x + dx) - f(x)\\
&= A(x + dx) - Ax\\
&= Adx\\
&= f'(x)[dx].
\end{align*}
$$
The Jacobian is the linear operator $A$ acting on $dx$. (Seeing that $Adx = f'(x)[dx]$ implies $f'(x)=A$ comes from this action holding for *any* $dx$; hence the two operators must be the same.)
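A quick symbolic confirmation, in the style of the `jacobian` computations used later in this section (a sketch of ours):

```{julia}
@syms A[1:2, 1:3]::real x[1:3]::real
J = (A*x).jacobian(x)                # the Jacobian of f(x) = Ax ...
all(j == a for (j, a) in zip(J, A))  # ... is just A
```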
## Differentiation rules
Various differentiation rules are still available such as the sum, product, and chain rules.
### Sum and product rules for the derivative
Using the differential notation---which implicitly ignores higher order terms as they vanish in a limit---the sum and product rules can be derived.
For the sum rule, let $f(x) = g(x) + h(x)$. Then
$$
\begin{align*}
df &= f(x + dx) - f(x) \\
&= f'(x)[dx]\\
&= \left(g(x+dx) + h(x+dx)\right) - \left(g(x) + h(x)\right)\\
&= \left(g(x + dx) - g(x)\right) + \left(h(x + dx) - h(x)\right)\\
&= g'(x)[dx] + h'(x)[dx]\\
&= \left(g'(x) + h'(x)\right)[dx]
\end{align*}
$$
Comparing, we get $f'(x)[dx] = (g'(x) + h'(x))[dx]$, or $f'(x) = g'(x) + h'(x)$. (The last two lines above show how the new linear operator $g'(x) + h'(x)$ is defined on a value, by adding the application of each.)
The sum rule has the same derivation as was done with univariate, scalar functions. Similarly for the product rule.
The product rule for $f(x) = g(x)h(x)$ comes as:
$$
\begin{align*}
df &= f(x + dx) - f(x) \\
&= g(x+dx)h(x + dx) - g(x) h(x)\\
&= \left(g(x) + g'(x)[dx]\right)\left(h(x) + h'(x) [dx]\right) - g(x) h(x) \\
&= \textcolor{blue}{g(x)h(x)} + g'(x) [dx] h(x) + g(x) h'(x) [dx] + \textcolor{red}{g'(x)[dx] h'(x) [dx]} - \textcolor{blue}{g(x) h(x)}\\
&= \left(g'(x)[dx]\right)h(x) + g(x)\left(h'(x) [dx]\right)\\
&= dg h + g dh
\end{align*}
$$
**after** dropping the higher order term and cancelling $gh$ terms of opposite signs in the fourth row.
##### Example
These two rules can be used to directly show the last two examples.
First, if $f(x) = Ax$ and $A$ is a constant, then:
$$
df = (dA)x + A(dx) = 0x + A dx = A dx.
$$
Next, to differentiate $f(x) = x^TAx$:
$$
\begin{align*}
df &= dx^T (Ax) + x^T d(Ax) \\
&= (dx^T (Ax))^T + x^T A dx \\
&= x^T A^T dx + x^T A dx \\
&= x^T(A^T + A) dx
\end{align*}
$$
In the second line the transpose of the scalar quantity $dx^T A x$ is taken to simplify the expression, and the first calculation is used.
When $A^T = A$ ($A$ is symmetric) this simplifies to a more familiar looking $2x^TA$, but we see that this requires assumptions not needed in the scalar case.
##### Example
@BrightEdelmanJohnson consider what in `Julia` is `.*`. That is the operation:
$$
v .* w =
\begin{bmatrix}
v_1w_1 \\
v_2w_2 \\
\vdots\\
v_nw_n
\end{bmatrix}
=
\begin{bmatrix}
v_1 & 0 & \cdots & 0 \\
0 & v_2 & \cdots & 0 \\
& & \vdots & \\
0 & 0 & \cdots & v_n
\end{bmatrix}
\begin{bmatrix}
w_1 \\
w_2 \\
\vdots\\
w_n
\end{bmatrix}
= \text{diag}(v) w.
$$
They compute the derivative of $f(x) = A(x .* x)$ for some fixed matrix $A$ of the proper size.
We can see by the product rule that $d (\text{diag}(v)w) = d(\text{diag}(v)) w + \text{diag}(v) dw = (dv) .* w + v .* dw$. So
$df = A(dx .* x + x .* dx) = 2A(x .* dx)$, as $.*$ is commutative by its definition. Writing this as $df = 2A(x .* dx) = 2A(\text{diag}(x) dx) = (2A\text{diag}(x)) dx$, we identify $f'(x) = 2A\text{diag}(x)$.
This operation is called the [Hadamard product](https://en.wikipedia.org/wiki/Hadamard_product_(matrices)) and it extends to matrices and arrays.
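The identification $f'(x) = 2A\,\text{diag}(x)$ can be checked numerically; a minimal sketch of ours, using `ForwardDiff` and an arbitrary $2\times 3$ matrix:

```{julia}
using ForwardDiff, LinearAlgebra
A₀ = rand(2, 3); x₀ = rand(3)
J = ForwardDiff.jacobian(x -> A₀ * (x .* x), x₀)
J ≈ 2A₀ * Diagonal(x₀)    # f'(x) = 2A diag(x)
```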
::: {.callout-note appearance="minimal"}
## Numerator layout
The Wikipedia page on [matrix calculus](https://en.wikipedia.org/wiki/Matrix_calculus#Layout_conventions) has numerous such "identities" for derivatives of different common matrix/vector expressions. As vectors are viewed as column vectors, the "numerator layout" identities apply.
:::
### The chain rule
Like the product rule, the chain rule is shown by @BrightEdelmanJohnson in this notation with $f(x) = g(h(x))$:
$$
\begin{align*}
df &= f(x + dx) - f(x)\\
&= g(h(x + dx)) - g(h(x))\\
&= g(h(x) + h'(x)[dx]) - g(h(x))\\
&= g(h(x)) + g'(h(x))[h'(x)[dx]] - g(h(x))\\
&= g'(h(x)) [h'(x) [dx]]\\
&= (g'(h(x)) h'(x)) [dx]
\end{align*}
$$
The operator $f'(x)= g'(h(x)) h'(x)$ is a product of matrices.
### Computational differences with expressions from the chain rule
Of note here is the application of the chain rule to three (or more) compositions, where $c:R^n \rightarrow R^j$, $b:R^j \rightarrow R^k$, and $a:R^k \rightarrow R^m$:
If $f(x) = a(b(c(x)))$ then the derivative is:
$$
f'(x) = a'(b(c(x))) b'(c(x)) c'(x),
$$
which can be expressed as three matrix multiplications two ways:
$$
f' = (a'b')c' \text{ or } f' = a'(b'c')
$$
Multiplying left to right (the first) is called reverse mode; multiplying right to left (the second) is called forward mode. The distinction becomes important when considering the computational cost of the multiplications.
* If $f: R^n \rightarrow R^m$ has $n$ much bigger than $1$ and $m=1$, then it is much faster to do left-to-right multiplication (many more inputs than outputs).
* If $f:R^n \rightarrow R^m$ has $n=1$ and $m$ much bigger than one, then it is faster to do right-to-left multiplication (many more outputs than inputs).
The reason comes down to the shape of the matrices. To see this, we need to know that matrix multiplication of an $m \times q$ matrix times a $q \times n$ matrix takes on the order of $mqn$ operations.
When $m=1$, the derivative is a product of matrices of size $n\times j$, $j\times k$, and $k \times 1$ (the transposed, or gradient, form of the product above), yielding a matrix of size $n \times 1$ matching the function dimension.
The operations involved in multiplication from left to right can be quantified. The first operation takes $njk$ operations, leaving an $n\times k$ matrix; the next multiplication then takes another $nk1$ operations, or $njk + nk$ together.
Whereas computing from the right to left is first $jk1$ operations, leaving a $j \times 1$ matrix. The next operation would take another $nj1$ operations. In total:
* left to right is $njk + nk$ = $nk \cdot (j + 1)$.
* right to left is $jk + jn = j\cdot (k+n)$.
When $j=k$, say, the comparison is $nj(j+1)$ versus $j(j+n)$ operations; the second is smaller by a factor of about $n(j+1)/(n+j)$. This can be quite significant in higher dimensions.
##### Example
Using the `BenchmarkTools` package, we can check the time to compute various products:
```{julia}
using BenchmarkTools
n,j,k,m = 20,15,10,1
@btime A*(B*C) setup=(A=rand(n,j); B=rand(j,k); C=rand(k,m));
@btime (A*B)*C setup=(A=rand(n,j); B=rand(j,k); C=rand(k,m));
```
The latter computation is about 1.5 times slower.
The relationship is reversed when the first matrix is skinny and the last is not:
```{julia}
@btime A*(B*C) setup=(A=rand(m,k); B=rand(k,j); C=rand(j,n));
@btime (A*B)*C setup=(A=rand(m,k); B=rand(k,j); C=rand(j,n));
```
----
In calculus, we typically have $n$ and $m$ being $1$, $2$, or $3$. But that need not be the case, especially if differentiation is over a parameter space.
## Derivatives of matrix functions
What is the derivative of $f(A) = A^2$?
The function $f$ takes an $n\times n$ matrix and returns a matrix of the same size.
This derivative can be derived directly from the *product rule*:
$$
\begin{align*}
df &= d(A^2) = d(AA)\\
&= dA A + A dA
\end{align*}
$$
That is $f'(A)$ is the operator $f'(A)[\delta A] = A \delta A + \delta A A$. (This is not $2A\delta A$, as $A$ may not commute with $\delta A$.)
### Vectorization of a matrix
Alternatively, we can identify $A$ through its
components, as a vector in $R^{n^2}$ and then leverage the Jacobian.
One such identification is vectorization---consecutively stacking the
column vectors into a single vector. In `Julia` the `vec` function does this
operation:
```{julia}
@syms A[1:2, 1:2]
vec(A)
```
The stacking by column follows how `Julia` stores matrices and how `Julia` references entries in a matrix by linear index:
```{julia}
vec(A) == [A[i] for i in eachindex(A)]
```
With this vectorization operation, $f$ may be viewed as
$\tilde{f}:R^{n^2} \rightarrow R^{n^2}$ through:
$$
\tilde{f}(\text{vec}(A)) = \text{vec}(f(A))
$$
We use `SymPy` to compute the Jacobian of this vector valued function.
```{julia}
@syms A[1:3, 1:3]::real
f(x) = x^2
J = vec(f(A)).jacobian(vec(A)) # jacobian of f̃
```
We do this via linear algebra first, then see a more elegant manner following the notes.
A course in linear algebra shows that any linear operator on a finite dimensional vector space can be represented as a matrix. The basic idea is to represent what the operator does to each *basis* element and put these values as columns of the matrix.
In this $3 \times 3$ case, the linear operator works on an object with $9$ slots and returns an object with $9$ slots, so the matrix will be $9 \times 9$.
The basis elements are simply the matrices with a $1$ in spot $(i,j)$ and zero elsewhere. Here we generate them through a function:
```{julia}
basis(i,j,A) = (b=zeros(Int, size(A)...); b[i,j] = 1; b)
JJ = [vec(basis(i,j,A)*A + A*basis(i,j,A)) for j in 1:3 for i in 1:3]
```
The elements of `JJ` show the representation of each of the $9$ basis elements under the linear transformation.
To construct the matrix representing the linear operator, we need to concatenate these horizontally as column vectors
```{julia}
JJ = hcat(JJ...)
```
The matrix $JJ$ is identical to $J$, above:
```{julia}
all(j == jj for (j, jj) in zip(J, JJ))
```
### Kronecker products
But how can we see the Jacobian, $J$, from the linear operator $f'(A)[\delta A] = \delta A A + A \delta A$?
To make this less magical, a related operation to `vec` is defined.
The $\text{vec}$ function takes a matrix and stacks its columns, turning a matrix into a vector; this is what allowed the Jacobian to be found, as above. However, the shape of the matrix is lost, as are the fundamental matrix operations, like multiplication.
The [Kronecker product](https://en.wikipedia.org/wiki/Kronecker_product) replicates values making a bigger matrix. That is, if $A$ and $B$ are matrices, the Kronecker product replaces each value in $A$ with that value times $B$, making a bigger matrix, as each entry in $A$ is replaced by an entry with size $B$.
Formally,
$$
A \otimes B =
\begin{bmatrix}
a_{11}B & a_{12}B & \cdots & a_{1n}B \\
a_{21}B & a_{22}B & \cdots & a_{2n}B \\
&\vdots & & \\
a_{m1}B & a_{m2}B & \cdots & a_{mn}B
\end{bmatrix}
$$
The function `kron` forms this product:
```{julia}
@syms A[1:2, 1:3] B[1:3, 1:4]
kron(A, B) # same as hcat((vcat((A[i,j]*B for i in 1:2)...) for j in 1:3)...)
```
The $m\times n$ matrix $A$ and $j \times k$ matrix $B$ have a Kronecker product of size $mj \times nk$.
The Kronecker product has a certain algebra, including:
* transposes: $(A \otimes B)^T = A^T \otimes B^T$
* symmetry and orthogonality: $A\otimes B$ is symmetric (or orthogonal) if both $A$ and $B$ have that property
* trace (sum of diagonal): $\text{tr}(A \otimes B) = \text{tr}(A)\text{tr}(B)$
* determinants: $\det(A\otimes B) = \det(A)^m \det(B)^n$, where $A$ is $n\times n$, $B$ is $m \times m$
* inverses: $(A \otimes B)^{-1} = (A^{-1}) \otimes (B^{-1})$
* multiplication: $(A\otimes B)(C \otimes D) = (AC) \otimes (BD)$
The main equation coupling `vec` and `kron` is the fact that if $A$, $B$, and $C$ have appropriate sizes, then:
$$
(A \otimes B) \text{vec}(C) = \text{vec}(B C A^T).
$$
Appropriate sizes for $A$, $B$, and $C$ are determined by the various products in $BCA^T$.
If $A$ is $m \times n$ and $B$ is $r \times s$, then since $BC$ is defined, $C$ has $s$ rows, and since $CA^T$ is defined, $C$ must have $n$ columns, as $A^T$ is $n \times m$; so $C$ must be $s\times n$. Checking this is correct on the other side, $A \otimes B$ would be size $mr \times ns$ and $\text{vec}(C)$ would have length $sn$, so that product works, size wise.
The referenced notes have an explanation for this formula, but we only confirm it with an example using $m=n=2$ and $r=s=3$:
```{julia}
@syms A[1:2, 1:2]::real B[1:3, 1:3]::real C[1:3, 1:2]::real
L, R = kron(A,B)*vec(C), vec(B*C*A')
all(l == r for (l, r) ∈ zip(L, R))
```
----
Now to use this relationship to recognize $df = A dA + dA A$ with the Jacobian computed from $\text{vec}(f(A))$.
We have $\text{vec}(A dA + dA A) = \text{vec}(A dA) + \text{vec}(dA A)$, by obvious linearity of $\text{vec}$. Now inserting an identity matrix, $I$, which is symmetric, in a useful spot we have:
$$
\text{vec}(A dA) = \text{vec}(A dA I^T) = (I \otimes A) \text{vec}(dA),
$$
and
$$
\text{vec}(dA A) = \text{vec}(I dA (A^T)^T) = (A^T \otimes I) \text{vec}(dA).
$$
This leaves
$$
\text{vec}(A dA + dA A) =
\left((I \otimes A) + (A^T \otimes I)\right) \text{vec}(dA)
$$
We should then get the Jacobian we computed from the following:
```{julia}
@syms A[1:3, 1:3]::real
using LinearAlgebra: I
J = vec(A^2).jacobian(vec(A))
JJ = kron(I(3), A) + kron(A', I(3))
all(j == jj for (j,jj) in zip(J,JJ))
```
This technique can also be used with other powers, say $f(A) = A^3$, where the resulting $df = A^2 dA + A dA A + dA A^2$ is one answer that can be compared to a Jacobian through
$$
\begin{align*}
\text{vec}(df) &= \text{vec}(A^2 dA I^T) + \text{vec}(A dA A) + \text{vec}(I dA A^2)\\
&= (I \otimes A^2)\text{vec}(dA) + (A^T \otimes A) \text{vec}(dA) + ((A^T)^2 \otimes I) \text{vec}(dA)
\end{align*}
$$
The above shows how to relate the derivative of a matrix function to
the Jacobian of a vectorized function, but only for illustration. It
is certainly not necessary to express the derivative of $f$ in terms of
the derivative of its vectorized counterpart.
##### Example: derivative of the matrix inverse
What is the derivative of $f(A) = A^{-1}$? The same technique used to find the derivative of the inverse of a univariate, scalar-valued function is useful.
Starting with $I = AA^{-1}$ and noting $dI$ is $0$ we have
$$
\begin{align*}
0 &= d(AA^{-1})\\
&= dAA^{-1} + A d(A^{-1})
\end{align*}
$$
So, $d(A^{-1}) = -A^{-1} dA A^{-1}$.
This could be re-expressed as a linear operator through
$$
\text{vec}(dA^{-1}) =
\left((A^{-1})^T \otimes A^{-1}\right) \text{vec}(dA)
= \left((A^T)^{-1} \otimes A^{-1}\right) \text{vec}(dA).
$$
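A crude numerical check of $d(A^{-1}) = -A^{-1}\, dA\, A^{-1}$ is possible; a sketch of ours, with an arbitrary invertible matrix and a small perturbation:

```{julia}
using LinearAlgebra
A = [2.0 1.0 0.0; 0.0 3.0 1.0; 1.0 0.0 4.0]   # an arbitrary invertible matrix
dA = 1e-6 * rand(3, 3)                        # a small perturbation
lhs = inv(A + dA) - inv(A)                    # d(A⁻¹), up to higher-order terms
rhs = -inv(A) * dA * inv(A)
norm(lhs - rhs)                               # of order ‖dA‖², essentially 0
```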
##### Example: derivative of the matrix determinant
Let $f(A) = \text{det}(A)$. What is the derivative?
First, the determinant of a square, $n\times n$, matrix $A$ is a scalar summary of $A$. There are different means to compute the determinant, but this recursive one in particular is helpful here:
$$
\text{det}(A) = a_{1j}C_{1j} + a_{2j}C_{2j} + \cdots + a_{nj}C_{nj}
$$
for any $j$. The *cofactor* $C_{ij}$ is the determinant of the $(n-1)\times(n-1)$ matrix with the $i$th row and $j$th column deleted times $(-1)^{i+j}$.
To find the *gradient* of $f$, we differentiate by each of the $A_{ij}$ variables, and so
$$
\frac{\partial\text{det}(A)}{\partial A_{ij}} =
\frac{\partial (a_{1j}C_{1j} + a_{2j}C_{2j} + \cdots + a_{nj}C_{nj})}{\partial A_{ij}} =
C_{ij},
$$
as each cofactor in the expansion has no dependence on $A_{ij}$, since the cofactor removes the $i$th row and $j$th column.
So the gradient is the matrix of cofactors.
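For a $2\times 2$ matrix this is easy to verify symbolically. A small sketch of ours, with the cofactor matrix written out by hand:

```{julia}
@syms A[1:2, 1:2]::real
G = [diff(det(A), A[i,j]) for i in 1:2, j in 1:2]  # entry-wise gradient of det
C = [A[2,2] -A[2,1]; -A[1,2] A[1,1]]               # the 2×2 cofactor matrix
all(g == c for (g, c) in zip(G, C))
```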
@BrightEdelmanJohnson also give a different proof, starting with this observation:
$$
\text{det}(I + dA) - \text{det}(I) = \text{tr}(dA).
$$
Assuming that, then by the fact $\text{det}(AB) = \text{det}(A)\text{det}(B)$:
$$
\begin{align*}
\text{det}(A + A(A^{-1}dA)) - \text{det}(A) &= \text{det}(A)\cdot(\text{det}(I+ A^{-1}dA) - \text{det}(I)) \\
&= \text{det}(A) \text{tr}(A^{-1}dA)\\
&= \text{tr}(\text{det}(A)A^{-1}dA).
\end{align*}
$$
This agrees with the cofactor computation above, by the formula computing the inverse of a matrix through its cofactor matrix (transposed) divided by its determinant.
That the trace gets involved can be seen from this computation, which shows the only first-order terms are from the diagonal sum:
```{julia}
using LinearAlgebra
@syms dA[1:2, 1:2]
det(I + dA) - det(I)
```
## The adjoint method
The chain rule brings about a series of products. The adjoint method, illustrated by @BrightEdelmanJohnson and summarized below, shows how to approach the computation of the series in a direction that minimizes the computational cost, illustrating why reverse mode is preferred to forward mode when a scalar function of several variables is considered.
@BrightEdelmanJohnson consider the derivative of
$$
g(p) = f(A(p)^{-1} b)
$$
This might arise from applying a scalar-valued $f$ to the solution of $Ax = b$, where $A$ is parameterized by $p$. The number of parameters might be quite large, so how the resulting computation is organized might affect the computational costs.
The chain rule gives the following computation to find the derivative (or gradient):
$$
\begin{align*}
dg
&= f'(x)[dx]\\
&= f'(x) [d(A(p)^{-1} b)]\\
&= f'(x)[-A(p)^{-1} dA A(p)^{-1} b + 0]\\
&= -\textcolor{red}{f'(x) A(p)^{-1}} dA\textcolor{blue}{A(p)^{-1}[b]}.
\end{align*}
$$
By setting $v^T = f'(x)A(p)^{-1}$ and writing $x = A(p)^{-1}[b]$ this becomes
$$
dg = -v^T dA x.
$$
This product of three terms can be computed in two directions:
*From left to right:*
First $v$ is found by solving $v^T = f'(x) A^{-1}$, that is, $v = (A^{-1})^T (f'(x))^T = (A^T)^{-1} \nabla f$, or by solving $A^T v = \nabla f$. This is called the *adjoint* equation.
The partial derivatives in $p$ of $g$ are related to the partial derivatives of $A$ through:
$$
\frac{\partial g}{\partial p_k} = -v^T\frac{\partial A}{\partial p_k} x,
$$
as the scalar factor commutes through. With $v$ and $x$ solved for (via the adjoint equation and from solving $Ax=b$) the partials in $p_k$ are computed with dot products. There are just two costly operations.
*From right to left:*
The value of $x$ can be solved for, as above, but computing the value of
$$
\frac{\partial g}{\partial p_k} =
-f'(x) \left(A^{-1} \frac{\partial A}{\partial p_k} x \right)
$$
requires a costly solve of $A^{-1}\frac{\partial A}{\partial p_k} x$ for each $p_k$, and $p$ may have many components. This is the difference: left to right only has the solve of the one adjoint equation.
As mentioned above, the reverse mode offers advantages when there are many input parameters ($p$) and a single output parameter.
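Here is a small numeric sketch of the adjoint computation (ours, with a hypothetical two-parameter family $A(p) = A_0 + p_1 B_1 + p_2 B_2$); the adjoint gradient is compared against automatic differentiation:

```{julia}
using ForwardDiff, LinearAlgebra
A₀ = [4.0 1.0; 0.0 3.0]; B₁ = [1.0 0.0; 0.0 0.0]; B₂ = [0.0 1.0; 1.0 0.0]
b = [1.0, 2.0]
A(p) = A₀ + p[1]*B₁ + p[2]*B₂            # A depends (linearly) on the parameters
f(x) = sum(abs2, x)                      # a scalar function of the solution
g(p) = f(A(p) \ b)

p = [0.5, 0.25]
x = A(p) \ b                             # one solve for x
v = A(p)' \ ForwardDiff.gradient(f, x)   # one solve of the adjoint equation Aᵀv = ∇f
[-v' * B₁ * x, -v' * B₂ * x] ≈ ForwardDiff.gradient(g, p)
```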
##### Example
Suppose $x(p)$ solves some system of equations $h(x(p),p) = 0$ in $R^n$ ($n$ possibly just $1$) and $g(p) = f(x(p))$ is some non-linear transformation of $x$. What is the derivative of $g$ in $p$?
Suppose the *implicit function theorem* applies to $h(x,p) = 0$, that is *locally* the response $x(p)$ has a derivative, and moreover by the chain rule
$$
0 = \frac{\partial h}{\partial p} dp + \frac{\partial h}{\partial x} dx.
$$
Solving the above for $dx$ gives:
$$
dx = -\left(\frac{\partial h}{\partial x}\right)^{-1} \frac{\partial h}{\partial p} dp.
$$
The chain rule applied to $g(p) = f(x(p))$ then yields
$$
dg = f'(x) dx = - f'(x) \left(\frac{\partial h}{\partial x}\right)^{-1} \frac{\partial h}{\partial p} dp = -v^T\frac{\partial h}{\partial p} dp,
$$
by setting
$$
v^T = f'(x) \left(\frac{\partial h}{\partial x}\right)^{-1}.
$$
Here $v$ can be solved for by taking adjoints (as before). Let $A = \partial h/\partial x$; then $v^T = f'(x) A^{-1}$ or $v = (A^{-1})^T (f'(x))^T = (A^T)^{-1} \nabla f$. That is, $v$ solves $A^Tv=\nabla f$. As before, it would take two solves to get both $g$ and its gradient.
## Second derivatives, Hessian
We reference a theorem presented by @CarlssonNikitinTroedssonWendt for exposition, with some modification.
::: {.callout-note appearance="minimal"}
Theorem 1. Let $f:X \rightarrow Y$, where $X,Y$ are finite dimensional *inner product* spaces with elements in $R$. Suppose $f$ is smooth (a certain number of derivatives). Then for each $x$ in $X$ there exists a unique linear operator, $f'(x)$, and a unique *bilinear* *symmetric* operator $f'': X \oplus X \rightarrow Y$ such that
$$
f(x + \delta x) = f(x) + f'(x)[\delta x] + \frac{1}{2}f''(x)[\delta x, \delta x] + \mathscr{o}(\|\delta x\|^2).
$$
:::
New terms include *bilinear*, *symmetric*, and *inner product*. An operator ($X\oplus X \rightarrow Y$) is bilinear if it is a linear operator in each of its two arguments. Such an operator is *symmetric* if interchanging its two arguments makes no difference in its output. Finally, an *inner product* space is one with a generalization of the dot product. An inner product takes two vectors $x$ and $y$ and returns a scalar; it is denoted $\langle x,y\rangle$; and it has properties of symmetry, linearity, and non-negativity ($\langle x,x\rangle \geq 0$, equaling $0$ only if $x$ is the zero vector). Inner products can be used to form a norm (or length) for a vector through $\|x\|^2 = \langle x,x\rangle$.
We reference this, as the values denoted $f'$ and $f''$ are *unique*. So if we identify them one way, we have identified them.
Specializing to $X=R^n$ and $Y=R^1$, we have $f' = (\nabla f)^T$ and $f''$ is the Hessian.
Take $n=2$. Previously we wrote a formula for Taylor's theorem for $f:R^n \rightarrow R$ which, with $n=2$ and $x=\langle x_1,x_2\rangle$, reads:
$$
\begin{align*}
f(x + dx) &= f(x) +
\frac{\partial f}{\partial x_1} dx_1 + \frac{\partial f}{\partial x_2} dx_2\\
&{+} \frac{1}{2}\left(
\frac{\partial^2 f}{\partial x_1^2}dx_1^2 +
2\frac{\partial^2 f}{\partial x_1 \partial x_2}dx_1dx_2 +
\frac{\partial^2 f}{\partial x_2^2}dx_2^2
\right) + \mathscr{o}(\|dx\|^2).
\end{align*}
$$
We can see that $\nabla{f} \cdot dx = f'(x) dx$ tidies up part of the first line, and moreover the second line can be seen to be a matrix product:
$$
\begin{bmatrix} dx_1 & dx_2 \end{bmatrix}
\begin{bmatrix}
\frac{\partial^2 f}{\partial x_1^2} &
\frac{\partial^2 f}{\partial x_1 \partial x_2}\\
\frac{\partial^2 f}{\partial x_2 \partial x_1} &
\frac{\partial^2 f}{\partial x_2^2}
\end{bmatrix}
\begin{bmatrix}
dx_1\\
dx_2
\end{bmatrix}
= dx^T H dx,
$$
$H$ being the *Hessian* with entries $H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}$.
This formula---$f(x+dx)-f(x) \approx f'(x)dx + \frac{1}{2}dx^T H dx$---is valid for any $n$; the case $n=2$ was just for ease of notation when expressing things in coordinates and not as matrices.
By uniqueness, we have under these assumptions that the Hessian is *symmetric* and the expression $dx^T H dx$ is a *bilinear* form, which we can identify as $f''(x)[dx,dx]$.
That the Hessian is symmetric could also be derived under these assumptions by directly computing that the mixed partials can have their order exchanged. But in this framework, as explained by @BrightEdelmanJohnson (and shown later) it is a result of the underlying vector space having an addition that is commutative (e.g. $u+v = v+u$).
The mapping $(u,v) \rightarrow u^T A v$ for a matrix $A$ is bilinear. For a fixed $u$, it is linear as it can be viewed as $(u^TA)[v]$ and matrix multiplication is linear. Similarly for a fixed $v$.
@BrightEdelmanJohnson extend this characterization to a broader setting.
We have for some function $f$
$$
df = f(x + dx) - f(x) = f'(x)[dx]
$$
Then if $d\tilde{x}$ is another differential change with the same shape as $x$ we can look at the differential of $f'(x)$:
$$
d(f') = f'(x + d\tilde{x}) - f'(x) = f''(x)[d\tilde{x}]
$$
Now, $d(f')$ has the same shape as $f'$, a linear operator, hence $d(f')$ is also a linear operator. Acting on $dx$, we have
$$
d(f')[dx] = f''(x)[d\tilde{x}][dx] = f''(x)[d\tilde{x}, dx].
$$
The last equality is a definition. As $f''$ is linear in the application to $d\tilde{x}$ and also linear in application to $dx$, $f''(x)$ is a bilinear operator.
Moreover, the following shows it is *symmetric*:
$$
\begin{align*}
f''(x)[d\tilde{x}][dx] &= (f'(x + d\tilde{x}) - f'(x))[dx]\\
&= f'(x + d\tilde{x})[dx] - f'(x)[dx]\\
&= (f(x + d\tilde{x} + dx) - f(x + d\tilde{x})) - (f(x+dx) - f(x))\\
&= (f(x + dx + d\tilde{x}) - f(x + dx)) - (f(x + d\tilde{x}) - f(x))\\
&= f'(x + dx)[d\tilde{x}] - f'(x)[d\tilde{x}]\\
&= f''(x)[dx][d\tilde{x}]
\end{align*}
$$
So $f''(x)[d\tilde{x},dx] = f''(x)[dx, d\tilde{x}]$. The key is the commutativity of vector addition, used to say $dx + d\tilde{x} = d\tilde{x} + dx$ in the third line.
##### Example: Hessian is symmetric
As mentioned earlier, the Hessian is the matrix arising from finding the second derivative of a multivariate, scalar-valued function $f:R^n \rightarrow R$. As a bilinear form on a finite dimensional vector space, it can be written as $\tilde{x}^T A x$. As this second derivative is symmetric, and the value is a scalar (so equal to its own transpose), it follows that $\tilde{x}^T A x = x^T A \tilde{x} = \tilde{x}^T A^T x$. That is, $H = A$ must also be symmetric from general principles.
##### Example: second derivative of $x^TAx$
Consider an expression from earlier $f(x) = x^T A x$ for some constant $A$.
We have seen that $f' = (\nabla f)^T = x^T(A+A^T)$. That is $\nabla f = (A^T+A)x$ is linear in $x$. The Jacobian of $\nabla f$ is the Hessian, $H = f'' = A + A^T$.
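This can be confirmed with `SymPy`, computing the Jacobian of the gradient found earlier (a sketch of ours):

```{julia}
@syms x[1:3]::real A[1:3, 1:3]::real
u = x' * A * x
∇u = [diff(u, xi) for xi in x]      # the gradient, as before
H = ∇u.jacobian(x)                  # the Hessian is the Jacobian of the gradient
all(h == s for (h, s) in zip(H, A + A'))
```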
##### Example: second derivative of $\text{det}(A)$
Consider $f(A) = \text{det}(A)$. We saw previously that:
$$
\begin{align*}
\text{tr}(A + B) &= \text{tr}(A) + \text{tr}(B)\\
\text{det}(A + dA') &= \text{det}(A) + \text{det}(A)\text{tr}(A^{-1}dA')\\
(A + dA')^{-1} &= A^{-1} - A^{-1} dA' A^{-1}
\end{align*}
$$
These are all used to simplify:
$$
\begin{align*}
\text{det}(A+dA')&\text{tr}((A + dA')^{-1} dA) - \text{det}(A) \text{tr}(A^{-1}dA) \\
&= \left(
\text{det}(A) + \text{det}(A)\text{tr}(A^{-1}dA')
\right)
\text{tr}((A^{-1} - A^{-1}dA' A^{-1})dA)\\
&\quad{-} \text{det}(A) \text{tr}(A^{-1}dA) \\
&=
\textcolor{blue}{\text{det}(A) \text{tr}(A^{-1}dA)}\\
&\quad{+} \text{det}(A)\text{tr}(A^{-1}dA')\text{tr}(A^{-1}dA) \\
&\quad{-} \text{det}(A)\text{tr}(A^{-1}dA' A^{-1}dA)\\
&\quad{-} \textcolor{red}{\text{det}(A)\text{tr}(A^{-1}dA')\text{tr}(A^{-1}dA' A^{-1}dA)}\\
&\quad{-} \textcolor{blue}{\text{det}(A) \text{tr}(A^{-1}dA)} \\
&= \text{det}(A)\text{tr}(A^{-1}dA')\text{tr}(A^{-1}dA) - \text{det}(A)\text{tr}(A^{-1}dA' A^{-1}dA)\\
&\quad{+} \textcolor{red}{\text{third order term}}
\end{align*}
$$
So, after dropping the third-order term, we see:
$$
f''(A)[dA,dA']
= \text{det}(A)\text{tr}(A^{-1}dA')\text{tr}(A^{-1}dA) -
\text{det}(A)\text{tr}(A^{-1}dA' A^{-1}dA).
$$

View File

@@ -1,4 +1,4 @@
# Polar Coordinates and Curves
# Polar coordinates and curves
{{< include ../_common_code.qmd >}}
@@ -199,7 +199,7 @@ r(theta) = a * sin(k * theta)
plot_polar(0..pi, r)
```
This graph has radius $0$ whenever $\sin(k\theta) = 0$ or $k\theta =n\pi$. Solving means that it is $0$ at integer multiples of $\pi/k$. In the above, with $k=5$, there will $5$ zeroes in $[0,\pi]$. The entire curve is traced out over this interval, the values from $\pi$ to $2\pi$ yield negative value of $r$, so are related to values within $0$ to $\pi$ via the relation $(r,\pi +\theta) = (-r, \theta)$.
This graph has radius $0$ whenever $\sin(k\theta) = 0$ or $k\theta =n\pi$. Solving means that it is $0$ at integer multiples of $\pi/k$. In the above, with $k=5$, there will be $6$ zeroes in $[0,\pi]$. The entire curve is traced out over this interval; the values from $\pi$ to $2\pi$ yield negative values of $r$, so are related to values within $0$ to $\pi$ via the relation $(r,\pi +\theta) = (-r, \theta)$.
##### Example
@@ -226,7 +226,7 @@ The folium has radial part $0$ when $\cos(\theta) = 0$ or $\sin(2\theta) = b/4a$
plot_polar(𝒂0..(pi/2-𝒂0), 𝒓)
```
The second - which is too small to appear in the initial plot without zooming in - with
The second---which is too small to appear in the initial plot without zooming in---with
```{julia}
@@ -321,10 +321,15 @@ As well, see this part of a [Wikipedia](http://en.wikipedia.org/wiki/Polar_coord
Imagine we have $a < b$ and a partition $a=t_0 < t_1 < \cdots < t_n = b$. Let $\phi_i = (1/2)(t_{i-1} + t_{i})$ be the midpoint. Then the wedge of radius $r(\phi_i)$ with angle between $t_{i-1}$ and $t_i$ will have area $\pi r(\phi_i)^2 (t_i-t_{i-1}) / (2\pi) = (1/2) r(\phi_i)^2(t_i-t_{i-1})$, the ratio $(t_i-t_{i-1}) / (2\pi)$ being the angle relative to the total angle of a circle. Summing the area of these wedges over the partition gives a Riemann sum approximation for the integral $(1/2)\int_a^b r(\theta)^2 d\theta$. The limit of this sum defines the area in polar coordinates.
::: {.callout-note icon=false}
## Area of polar regions
> *Area of polar regions*. Let $R$ denote the region bounded by the curve $r(\theta)$ and bounded by the rays $\theta=a$ and $\theta=b$ with $b-a \leq 2\pi$, then the area of $R$ is given by:
>
> $A = \frac{1}{2}\int_a^b r(\theta)^2 d\theta.$
Let $R$ denote the region bounded by the curve $r(\theta)$ and bounded by the rays $\theta=a$ and $\theta=b$ with $b-a \leq 2\pi$, then the area of $R$ is given by:
$$
A = \frac{1}{2}\int_a^b r(\theta)^2 d\theta.
$$
:::
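For instance, the area enclosed by the cardioid $r(\theta) = 1 + \cos(\theta)$ over $[0, 2\pi]$ is $3\pi/2$; a quick numeric check of ours with `QuadGK`:

```{julia}
using QuadGK
r(theta) = 1 + cos(theta)
A, _ = quadgk(t -> r(t)^2 / 2, 0, 2pi)  # (1/2)∫ r(θ)² dθ
A, 3pi/2                                # the two agree
```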
@@ -412,18 +417,19 @@ The answer is the difference:
The length of the arc traced by a polar graph can also be expressed using an integral. Again, we partition the interval $[a,b]$ and consider the wedge from $(r(t_{i-1}), t_{i-1})$ to $(r(t_i), t_i)$. The curve this wedge approximates will have its arc length approximated by the line segment connecting the points. Expressing the points in Cartesian coordinates and simplifying gives the distance squared as:
$$
\begin{align*}
d_i^2 &= (r(t_i) \cos(t_i) - r(t_{i-1})\cos(t_{i-1}))^2 + (r(t_i) \sin(t_i) - r(t_{i-1})\sin(t_{i-1}))^2\\
&= r(t_i)^2 - 2r(t_i)r(t_{i-1}) \cos(t_i - t_{i-1}) + r(t_{i-1})^2 \\
&\approx r(t_i)^2 - 2r(t_i)r(t_{i-1}) \left(1 - \frac{(t_i - t_{i-1})^2}{2}\right)+ r(t_{i-1})^2 \quad(\text{as } \cos(x) \approx 1 - x^2/2)\\
&= (r(t_i) - r(t_{i-1}))^2 + r(t_i)r(t_{i-1}) (t_i - t_{i-1})^2.
\end{align*}
$$
As was done with arc length we multiply $d_i$ by $(t_i - t_{i-1})/(t_i - t_{i-1})$ and move the bottom factor under the square root:
$$
\begin{align*}
d_i
&= d_i \frac{t_i - t_{i-1}}{t_i - t_{i-1}} \\
@@ -431,13 +437,19 @@ d_i
\frac{r(t_i)r(t_{i-1}) (t_i - t_{i-1})^2}{(t_i - t_{i-1})^2}} \cdot (t_i - t_{i-1})\\
&= \sqrt{(r'(\xi_i))^2 + r(t_i)r(t_{i-1})} \cdot (t_i - t_{i-1}).\quad(\text{the mean value theorem})
\end{align*}
$$
Adding the approximations for the $d_i$ gives a Riemann sum approximation to the integral $\int_a^b \sqrt{r'(\theta)^2 + r(\theta)^2} d\theta$ (with the extension to the Riemann sum formula needed to derive the arc length for a parameterized curve). That is:
::: {.callout-note icon=false}
## Arc length of a polar curve
> *Arc length of a polar curve*. The arc length of the curve described in polar coordinates by $r(\theta)$ for $a \leq \theta \leq b$ is given by:
>
> $\int_a^b \sqrt{r'(\theta)^2 + r(\theta)^2} d\theta.$
The arc length of the curve described in polar coordinates by $r(\theta)$ for $a \leq \theta \leq b$ is given by:
$$
\int_a^b \sqrt{r'(\theta)^2 + r(\theta)^2} d\theta.
$$
:::
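As a check of ours, this formula gives the known arc length of $8$ for the cardioid $r(\theta) = 1 + \cos(\theta)$:

```{julia}
using QuadGK
r(theta) = 1 + cos(theta)
rp(theta) = -sin(theta)      # r'(θ)
L, _ = quadgk(t -> sqrt(rp(t)^2 + r(t)^2), 0, 2pi)
L                            # ≈ 8, the exact value
```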

View File

@@ -15,6 +15,7 @@ using SymPy
using Roots
using QuadGK
using JSON
using ScatteredInterpolation
```
Also, these methods from the `Contour` package:
@@ -36,12 +37,13 @@ nothing
Consider a function $f: R^n \rightarrow R$. It has multiple arguments for its input (an $x_1, x_2, \dots, x_n$) and only one, *scalar*, value for an output. Some simple examples might be:
$$
\begin{align*}
f(x,y) &= x^2 + y^2\\
g(x,y) &= x \cdot y\\
h(x,y) &= \sin(x) \cdot \sin(y)
\end{align*}
$$
For two examples from real life, consider first the elevation Point Query Service (of the [USGS](https://nationalmap.gov/epqs/)), which returns the elevation in international feet or meters for a specific latitude/longitude within the United States. The longitude can be associated to an $x$ coordinate, the latitude to a $y$ coordinate, and the elevation a $z$ coordinate, and as long as the region is small enough, the $x$-$y$ coordinates can be thought to lie on a plane. (A flat earth assumption.)
@@ -123,7 +125,7 @@ Then we can define an alternative method with just a single variable and use spl
j(v) = j(v...)
```
The we can call `j` with a vector or point:
Then we can call `j` with a vector or point:
```{julia}
@@ -172,7 +174,7 @@ surface(xs, ys, 𝒇)
The `surface` function will generate the surface.
:::{.callout-note}
::: {.callout-note}
## Note
Using `surface` as a function name is equivalent to `plot(xs, ys, f, seriestype=:surface)`.
@@ -386,7 +388,7 @@ For a scalar function, Define a *level curve* as the solutions to the equations
contour(xsₛ, ysₛ, zzsₛ)
```
Were one to walk along one of the contour lines, then there would be no change in elevation. The areas of greatest change in elevation - basically the hills - occur where the different contour lines are closest. In this particular area, there is a river that runs from the upper right through to the lower left and this is flanked by hills.
Were one to walk along one of the contour lines, then there would be no change in elevation. The areas of greatest change in elevation---basically the hills---occur where the different contour lines are closest. In this particular area, there is a river that runs from the upper right through to the lower left and this is flanked by hills.
The $c$ values for the levels drawn may be specified through the `levels` argument:
@@ -523,12 +525,97 @@ The filled contour layers on the contour lines to a heatmap:
```{julia}
#| hold: true
f(x,y) = exp(-(x^2 + y^2)/5) * sin(x) * cos(y)
xs= ys = range(-pi, pi, length=100)
xs = ys = range(-pi, pi, length=100)
contourf(xs, ys, f)
```
This function has a prominent peak and a prominent valley, around the middle of the viewing window. The nested contour lines indicate this, and the color key can be used to identify which is the peak and which the valley.
##### Example
The description of a function for the contour function is in terms of a grid of $x$-$y$ values and a function $f$ which gives the height, $z$. In other situations, it might make more sense to have a stream of $x$-$y$-$z$ values describing a surface. This might be the case, say, when trying to piece together a topography using a series of GPS tracks. One way to do so is to take a regular grid of points and then *interpolate* $z$ values from the existing values.
The `ScatteredInterpolation.jl` package can be used to create a structure that can be used to interpolate points. The necessary pieces are the points, the sampled heights, and a method for the interpolation.
A simple example follows (inspired by a [discourse post](https://discourse.julialang.org/t/plots-contourf-and-plotlyjs-contour-behaviour-with-regard-to-the-x-y-z-input/122897/2)) where the true surface is known so that a comparison can be made. The two figures show that the contour plot described by only $4$ paths through the space is not as detailed, but captures the general shape reasonably well.
```{julia}
f(x,y) = 3*(1-x)^2*exp(-(x^2) - (y+1)^2) -
10*(x/5 - x^3 - y^5)*exp(-x^2-y^2) -
1/3*exp(-(x+1)^2 - y^2)
r(t, a=2) = [a*cbrt(sinpi(t)), a * cbrt(cospi(t))]
ts = range(0, 2, 30)[1:end-1]
pts = vcat([[r(t,a) for t in ts] for a in [1/2, 1, 3/2, 2]]...)
samples = [f(pt...) for pt in pts]
first(zip(pts, samples),5)
```
```{julia}
using ScatteredInterpolation
itp = interpolate(Multiquadratic(), stack(pts), samples)
# make a grid
(xm,xM), (ym,yM) = extrema.(eachrow(stack(pts)))
n, m = 25, 40
xg, yg = range(xm,xM,n), range(ym, yM, m)
X = [s for t in yg, s in xg]  # size(X) is (m,n)
Y = [t for t in yg, s in xg]  # size(Y) is also (m,n)
gridP = stack(vec([[x, y] for (x,y) in zip(X, Y)]))  # 2 x m*n matrix; each column is a grid point
interpolated = evaluate(itp, gridP)
zg = reshape(interpolated, m, n)
p = Plots.contourf(xg, yg, zg; levels=6)
q = Plots.contourf(xg, yg, f)
plot(p,q)
```
##### Example
```{julia}
using Plots
```
The arrangement of the data in a heatmap or contour plot depends on the underlying plotting package. A [discourse post](https://discourse.julialang.org/t/wrong-heatmap-orientation-with-plots-jl/124822/9) used this example to illustrate the difference.
The data is a matrix:
```{julia}
xy = [1 2; 3 4]
```
which has these colors mapped to their values:
```{julia}
cmap = [:red, :green, :blue, :orange]
cmap[xy]
```
@fig-plots-makie-heatmap shows on the left the image created by this command in `Plots`, and on the right the image created with the same command using `Makie`:
```{julia}
#| eval: false
heatmap(xy; colormap = cmap, title="Plots", legend=false)
```
```{julia}
#| echo: false
#| layout-ncol: 2
#| label: fig-plots-makie-heatmap
#| fig-cap: "Orientation of heatmap may vary by plotting package."
p = heatmap(xy; colormap = cmap, title="Plots",
legend=false)
q = heatmap(xy'; colormap = cmap, title="Makie",
legend=false)
plot(p,q,layout=(1,2))
```
`Makie` uses the first dimension of the matrix for `x` and the second for `y` (the first dimension runs down the columns, the second across); `Plots` follows the matrix convention, placing the columns along the `x` axis and the rows along the `y` axis, with values increasing upwards and to the right.
## Limits
@@ -549,7 +636,7 @@ This says, informally, for any scale about $L$ there is a "ball" about $C$ (not
In the univariate case, it can be useful to characterize a limit at $x=c$ existing if *both* the left and right limits exist and the two are equal. Generalizing to getting close in $R^m$ leads to the intuitive idea of a limit existing in terms of paths: any continuous "path" that approaches $C$ in the $x$-$y$ plane should yield a limit, and all such limits should be equal. Let $\gamma$ describe the path, and $\lim_{s \rightarrow t}\gamma(s) = C$. Then $f \circ \gamma$ will be a univariate function. If there is a limit, $L$, then this composition will also have the same limit as $s \rightarrow t$. Conversely, if for *every* path this composition has the *same* limit, then $f$ will have a limit.
The "two path corollary" is a trick to show a limit does not exist - just find two paths where there is a limit, but they differ, then a limit does not exist in general.
The "two path corollary" is a trick to show a limit does not exist---just find two paths where there is a limit, but they differ, then a limit does not exist in general.
### Continuity of scalar functions
@@ -631,23 +718,25 @@ Before answering this, we discuss *directional* derivatives along the simplified
If we compose $f \circ \vec\gamma_x$, we can visualize this as a curve on the surface from $f$ that moves in the $x$-$y$ plane along the line $y=c$. The derivative of this curve will satisfy:
$$
\begin{align*}
(f \circ \vec\gamma_x)'(x) &=
\lim_{t \rightarrow x} \frac{(f\circ\vec\gamma_x)(t) - (f\circ\vec\gamma_x)(x)}{t-x}\\
&= \lim_{t\rightarrow x} \frac{f(t, c) - f(x,c)}{t-x}\\
&= \lim_{h \rightarrow 0} \frac{f(x+h, c) - f(x, c)}{h}.
\end{align*}
$$
The latter expresses this to be the derivative of the function that holds the $y$ value fixed, but lets the $x$ value vary. It is the rate of change in the $x$ direction. There is special notation for this:
$$
\begin{align*}
\frac{\partial f(x,y)}{\partial x} &=
\lim_{h \rightarrow 0} \frac{f(x+h, y) - f(x, y)}{h},\quad\text{and analogously}\\
\frac{\partial f(x,y)}{\partial y} &=
\lim_{h \rightarrow 0} \frac{f(x, y+h) - f(x, y)}{h}.
\end{align*}
$$
These are called the *partial* derivatives of $f$. The symbol $\partial$, read as "partial", is reminiscent of "$d$", but indicates the derivative is only in a given direction. Other notations exist for this:
@@ -685,11 +774,12 @@ Let $f(x,y) = x^2 - 2xy$, then to compute the partials, we just treat the other
Then
$$
\begin{align*}
\frac{\partial (x^2 - 2xy)}{\partial x} &= 2x - 2y\\
\frac{\partial (x^2 - 2xy)}{\partial y} &= 0 - 2x = -2x.
\end{align*}
$$
Combining gives $\nabla{f} = \langle 2x -2y, -2x \rangle$.
@@ -697,12 +787,13 @@ Combining, gives $\nabla{f} = \langle 2x -2y, -2x \rangle$.
If $g(x,y,z) = \sin(x) + z\cos(y)$, then
$$
\begin{align*}
\frac{\partial g }{\partial x} &= \cos(x) + 0 = \cos(x),\\
\frac{\partial g }{\partial y} &= 0 + z(-\sin(y)) = -z\sin(y),\\
\frac{\partial g }{\partial z} &= 0 + \cos(y) = \cos(y).
\end{align*}
$$
Combining gives $\nabla{g} = \langle \cos(x), -z\sin(y), \cos(y) \rangle$.
@@ -780,7 +871,7 @@ Another alternative would be to hold one variable constant, and use the `derivat
partial_x(f, y) = x -> ForwardDiff.derivative(u -> f(u,y), x)
```
::: {.callout-note}
## Note
For vector-valued functions, we can override the syntax `'` using `Base.adjoint`, as `'` is treated as a postfix operator in `Julia` for the `adjoint` operation. The symbol `\nabla` is also available in `Julia`, but it is not an operator, so it can't be used as mathematically written, `∇f` (this could be used as a name though). In `CalculusWithJulia` a definition is made so essentially `∇(f) = x -> ForwardDiff.gradient(f, x)`. It does require parentheses to be called, as in `∇(f)`.
@@ -906,7 +997,7 @@ The figure suggests a potential geometric relationship between the gradient and
We see here how the gradient of $f$, $\nabla{f} = \langle f_{x_1}, f_{x_2}, \dots, f_{x_n} \rangle$, plays a similar role as the derivative does for univariate functions.
First, we consider the role of the derivative for univariate functions. The main characterization---the derivative is the slope of the line that best approximates the function at a point---is quantified by Taylor's theorem. For a function $f$ with a continuous second derivative:
$$
@@ -938,12 +1029,17 @@ where $\epsilon(h) \rightarrow 0$ as $h \rightarrow 0$.
It is this characterization of differentiable that is generalized to define when a scalar function is *differentiable*.
::: {.callout-note icon=false}
## Differentiable
Let $f$ be a scalar function. Then $f$ is [differentiable](https://tinyurl.com/qj8qcbb) at a point $C$ **if** the first order partial derivatives exist at $C$ **and** for $\vec{h}$ going to $\vec{0}$:
$$
\|f(C + \vec{h}) - f(C) - \nabla{f}(C) \cdot \vec{h}\| = \mathcal{o}(\|\vec{h}\|),
$$
where $\mathcal{o}(\|\vec{h}\|)$ means that dividing the left hand side by $\|\vec{h}\|$ and taking a limit as $\vec{h}\rightarrow 0$ the limit will be $0$.
:::
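As a quick numeric illustration of this definition, the following sketch (with a hypothetical function; it assumes `ForwardDiff` is available, as elsewhere in these notes) shows the defining ratio shrinking as $\vec{h}$ goes to $\vec{0}$:
```{julia}
using LinearAlgebra: norm, dot
fᵤ(v) = v[1]^2 + v[2]^2          # a hypothetical, differentiable example
C = [1.0, 2.0]
∇fC = ForwardDiff.gradient(fᵤ, C)
# per the definition, this ratio should go to 0 as h → 0
for ε in (0.1, 0.01, 0.001)
    h = ε * [1.0, 1.0]
    @show abs(fᵤ(C + h) - fᵤ(C) - dot(∇fC, h)) / norm(h)
end
```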
@@ -962,8 +1058,12 @@ Later we will see how Taylor's theorem generalizes for scalar functions and inte
In finding a partial derivative, we restricted the surface along a curve in the $x$-$y$ plane, in this case the curve $\vec{\gamma}(t)=\langle t, c\rangle$. In general if we have a curve in the $x$-$y$ plane, $\vec{\gamma}(t)$, we can compose the scalar function $f$ with $\vec{\gamma}$ to create a univariate function. If the functions are "smooth" then this composed function should have a derivative, and some version of a "chain rule" should provide a means to compute the derivative in terms of the "derivative" of $f$ (the gradient) and the derivative of $\vec{\gamma}$ ($\vec{\gamma}'$).
::: {.callout-note icon=false}
## Chain rule
Suppose $f$ is *differentiable* at $C$, and $\vec{\gamma}(t)$ is differentiable at $c$ with $\vec{\gamma}(c) = C$. Then $f\circ\vec{\gamma}$ is differentiable at $c$ with derivative $\nabla f(\vec{\gamma}(c)) \cdot \vec{\gamma}'(c)$.
:::
This is similar to the chain rule for univariate functions $(f\circ g)'(u) = f'(g(u)) g'(u)$ or $df/dx = df/du \cdot du/dx$. However, when we write out in components there are more terms. For example, for $n=2$ we have with $\vec{\gamma} = \langle x(t), y(t) \rangle$:
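Written out, the standard expansion is:
$$
\frac{d}{dt}f(x(t), y(t)) = \frac{\partial f}{\partial x} \frac{dx}{dt} + \frac{\partial f}{\partial y} \frac{dy}{dt}.
$$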
@@ -1074,7 +1174,7 @@ atand(mean(slopes))
This seems about right, as this is a generally uphill trail section.
In the above example, the data is given in terms of a sample, not a functional representation. Suppose instead, the surface was generated by `f` and the path---in the $x$-$y$ plane---by $\gamma$. Then we could estimate the maximum and average steepness by a process like this:
```{julia}
@@ -1217,7 +1317,10 @@ Let $f(x,y) = \sin(x+2y)$ and $\vec{v} = \langle 2, 1\rangle$. The directional d
$$
\nabla{f}\cdot \frac{\vec{v}}{\|\vec{v}\|} =
\langle \cos(x + 2y), 2\cos(x + 2y)\rangle \cdot
\frac{\langle 2, 1 \rangle}{\sqrt{5}} =
\frac{4}{\sqrt{5}} \cos(x + 2y).
$$
##### Example
@@ -1408,17 +1511,18 @@ Let $f(x,y) = x^2 + y^2$ be a scalar function. We have if $G(r, \theta) = \langl
Were this computed through the chain rule, we have:
$$
\begin{align*}
\nabla G_1 &= \langle \frac{\partial r\cos(\theta)}{\partial r}, \frac{\partial r\cos(\theta)}{\partial \theta} \rangle=
\langle \cos(\theta), -r \sin(\theta) \rangle,\\
\nabla G_2 &= \langle \frac{\partial r\sin(\theta)}{\partial r}, \frac{\partial r\sin(\theta)}{\partial \theta} \rangle=
\langle \sin(\theta), r \cos(\theta) \rangle.
\end{align*}
$$
We have $\partial f/\partial x = 2x$ and $\partial f/\partial y = 2y$, which at $G$ are $2r\cos(\theta)$ and $2r\sin(\theta)$, so by the chain rule, we should have
$$
\begin{align*}
\frac{\partial (f\circ G)}{\partial r} &=
\frac{\partial{f}}{\partial{x}}\frac{\partial G_1}{\partial r} +
@@ -1430,6 +1534,7 @@ We have $\partial f/\partial x = 2x$ and $\partial f/\partial y = 2y$, which at
\frac{\partial f}{\partial y}\frac{\partial G_2}{\partial \theta} =
2r\cos(\theta)(-r\sin(\theta)) + 2r\sin(\theta)(r\cos(\theta)) = 0.
\end{align*}
$$
## Higher order partial derivatives
@@ -1467,9 +1572,11 @@ In `SymPy` the variable to differentiate by is taken from left to right, so `dif
We see that `diff(ex, x, y)` and `diff(ex, y, x)` are identical. This is not a coincidence, as by [Schwarz's Theorem](https://tinyurl.com/y7sfw9sx) (also known as Clairaut's theorem) this will always be the case under typical assumptions:
::: {.callout-note icon=false}
## Theorem on mixed partials
If the mixed partials $\partial^2 f/\partial x \partial y$ and $\partial^2 f/\partial y \partial x$ exist and are continuous, then they are equal.
:::
For higher order mixed partials, something similar to Schwarz's theorem still holds. Say $f:R^n \rightarrow R$ is $C^k$ if $f$ is continuous and all partial derivatives of order $j \leq k$ are continuous. If $f$ is $C^k$, and $k=k_1+k_2+\cdots+k_n$ ($k_i \geq 0$) then

View File

@@ -341,11 +341,12 @@ The level curve $f(x,y)=0$ and the level curve $g(x,y)=0$ may intersect. Solving
To elaborate, consider two linear equations written in a general form:
$$
\begin{align*}
ax + by &= u\\
cx + dy &= v
\end{align*}
$$
A method to solve this by hand would be to solve for $y$ from one equation, replace this expression into the second equation and then solve for $x$. From there, $y$ can be found. A more advanced method expresses the problem in a matrix formulation of the form $Mx=b$ and solves that equation. This form of solving is implemented in `Julia`, through the "backslash" operator. Here is the general solution:
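As a minimal sketch (the coefficient matrix and right-hand side here are hypothetical values):
```{julia}
M = [1 2; 3 4]   # the coefficients, [a b; c d]
v = [5, 6]       # the right-hand side, [u, v]
M \ v            # solves M * [x, y] = v for [x, y]
```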
@@ -422,21 +423,23 @@ We look to find the intersection point near $(1,1)$ using Newton's method
We have by linearization:
$$
\begin{align*}
f(x,y) &\approx f(x_n, y_n) + \frac{\partial f}{\partial x}\Delta x + \frac{\partial f}{\partial y}\Delta y \\
g(x,y) &\approx g(x_n, y_n) + \frac{\partial g}{\partial x}\Delta x + \frac{\partial g}{\partial y}\Delta y,
\end{align*}
$$
where $\Delta x = x - x_n$ and $\Delta y = y - y_n$. Setting $f(x,y)=0$ and $g(x,y)=0$ leaves these two linear equations in $\Delta x$ and $\Delta y$:
$$
\begin{align*}
\frac{\partial f}{\partial x} \Delta x + \frac{\partial f}{\partial y} \Delta y &= -f(x_n, y_n)\\
\frac{\partial g}{\partial x} \Delta x + \frac{\partial g}{\partial y} \Delta y &= -g(x_n, y_n).
\end{align*}
$$
One step of Newton's method defines $(x_{n+1}, y_{n+1})$ to be the values $(x,y)$ that make the linearized functions about $(x_n, y_n)$ both equal to $\vec{0}$.
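A sketch of one such step (for a hypothetical system; `ForwardDiff.jacobian` supplies the matrix of partial derivatives and `\` solves the linearized equations):
```{julia}
# hypothetical system: f(x,y) = x^2 + y^2 - 2 and g(x,y) = x - y
F(v) = [v[1]^2 + v[2]^2 - 2, v[1] - v[2]]
newton_step(F, v) = v - ForwardDiff.jacobian(F, v) \ F(v)
foldl((v, _) -> newton_step(F, v), 1:5; init=[2.0, 0.5])  # ≈ [1, 1]
```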
@@ -679,14 +682,18 @@ An *absolute* maximum over $U$, should it exist, would be $f(\vec{a})$ if there
The difference is the same as the one-dimensional case: local is a statement about nearby points only, absolute a statement about all the points in the specified set.
::: {.callout-note icon=false}
## The [Extreme Value Theorem](https://tinyurl.com/yyhgxu8y)
Let $f:R^n \rightarrow R$ be continuous and defined on a *closed* set $V$. Then $f$ has a minimum value $m$ and maximum value $M$ over $V$ and there exist at least two points $\vec{a}$ and $\vec{b}$ with $m = f(\vec{a})$ and $M = f(\vec{b})$.
:::
::: {.callout-note icon=false}
## [Fermat](https://tinyurl.com/nfgz8fz)'s theorem on critical points
Let $f:R^n \rightarrow R$ be a continuous function defined on an *open* set $U$. If $x \in U$ is a point where $f$ has a local extrema *and* $f$ is differentiable, then the gradient of $f$ at $x$ is $\vec{0}$.
:::
Call a point in the domain of $f$ where the function is differentiable and the gradient is zero a *stationary point* and a point in the domain where the function is either not differentiable or is a stationary point a *critical point*. The local extrema can only happen at critical points by Fermat.
@@ -735,16 +742,16 @@ To identify these through formulas, and not graphically, we could try and use th
The generalization of the *second* derivative test is more concrete though. Recall, the second derivative test is about the concavity of the function at the critical point. When the concavity can be determined as non-zero, the test is conclusive; when the concavity is zero, the test is not conclusive. Similarly here:
::: {.callout-note icon=false}
## The [second](https://en.wikipedia.org/wiki/Second_partial_derivative_test) Partial Derivative Test for $f:R^2 \rightarrow R$.
Assume the first and second partial derivatives of $f$ are defined and continuous; let $\vec{a}$ be a critical point of $f$; $H$ is the Hessian matrix, $[f_{xx}\quad f_{xy};f_{xy}\quad f_{yy}]$, and $d = \det(H) = f_{xx} f_{yy} - f_{xy}^2$ is the determinant of the Hessian matrix. Then:
* The function $f$ has a local minimum at $\vec{a}$ if $f_{xx} > 0$ *and* $d>0$,
* The function $f$ has a local maximum at $\vec{a}$ if $f_{xx} < 0$ *and* $d>0$,
* The function $f$ has a saddle point at $\vec{a}$ if $d < 0$,
* Nothing can be said if $d=0$.
:::
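As a sketch of the test in practice (with a hypothetical function having a saddle at the origin; `ForwardDiff.hessian` supplies $H$):
```{julia}
using LinearAlgebra: det
fₕ(v) = v[1]^2 - v[2]^2           # hypothetical; the origin is a critical point
Hₕ = ForwardDiff.hessian(fₕ, [0.0, 0.0])
det(Hₕ)                           # d = -4 < 0, so a saddle point
```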
---
@@ -911,7 +918,7 @@ zs = fₗ.(xs, ys)
scatter3d!(xs, ys, zs)
```
A contour plot also shows that some---and only one---extrema happens on the interior:
```{julia}
@@ -960,10 +967,10 @@ We confirm this by looking at the Hessian and noting $H_{11} > 0$:
Hₛ = subs.(hessian(exₛ, [x,y]), x=>xstarₛ[x], y=>xstarₛ[y])
```
As it occurs at $(\bar{x}, \bar{y})$ where $\bar{x} = (x_1 + x_2 + x_3)/3$ and $\bar{y} = (y_1+y_2+y_3)/3$---the averages of the three values---the critical point is an interior point of the triangle.
As mentioned by Strang, the real problem is to minimize $d_1 + d_2 + d_3$. A direct approach with `SymPy`---just replacing `d2` above with the square root---fails. Consider instead the gradient of $d_1$, say. To avoid square roots, this is taken implicitly from $d_1^2$:
$$
@@ -1009,7 +1016,7 @@ psₛₗ = [a*u for (a,u) in zip(asₛ₁, usₛ)]
plot!(polygon(psₛₗ)...)
```
Let's see where the minimum distance point is by constructing a plot. The minimum must be on the boundary, as the only point where the gradient vanishes is the origin, not in the triangle. The plot of the triangle has a contour plot of the distance function, so we see clearly that the minimum happens at the point `[0.5, -0.866025]`. On this plot, we drew the gradient at some points along the boundary. The gradient points in the direction of greatest increase---away from the minimum. That the gradient vectors have a non-zero projection onto the edges of the triangle in a direction pointing away from the point indicates that the function `d` would increase if moved along the boundary in that direction, as indeed it does.
```{julia}
@@ -1057,7 +1064,7 @@ The smallest value is when $t=0$ or $t=1$, so at one of the points, as `li` is d
##### Example: least squares
We know that two points determine a line. What happens when there are more than two points? This is common in statistics, where a bivariate data set (pairs of points $(x,y)$) is summarized through a linear model $\mu_{y|x} = \alpha + \beta x$. That is, the average value for $y$ given a particular $x$ value is given through the equation of a line. The data is used to identify what the slope and intercept are for this line. We consider a simple case of $3$ points, the case of $n \geq 3$ being similar.
We have a line $l(x) = \alpha + \beta x$ and three points $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$. Unless these three points *happen* to be collinear, they can't possibly all lie on the same line. So to *approximate* a relationship by a line requires some inexactness. One measure of inexactness is the *vertical* distance to the line:
@@ -1069,11 +1076,12 @@ $$
Another might be the vertical squared distance to the line:
$$
\begin{align*}
d2(\alpha, \beta) &= (y_1 - l(x_1))^2 + (y_2 - l(x_2))^2 + (y_3 - l(x_3))^2 \\
&= (y1 - (\alpha + \beta x_1))^2 + (y2 - (\alpha + \beta x_2))^2 + (y3 - (\alpha + \beta x_3))^2
\end{align*}
$$
Another might be the *shortest* distance to the line:
@@ -1110,7 +1118,7 @@ As found, the formulas aren't pretty. If $x_1 + x_2 + x_3 = 0$ they simplify. Fo
subs(outₗₛ[β], sum(xₗₛ) => 0)
```
Let $\vec{x} = \langle x_1, x_2, x_3 \rangle$ and $\vec{y} = \langle y_1, y_2, y_3 \rangle$; then this is simply $(\vec{x} \cdot \vec{y})/(\vec{x}\cdot \vec{x})$, a formula that will generalize to $n > 3$. The assumption is not a restriction---it comes about by subtracting the mean, $\bar{x} = (x_1 + x_2 + x_3)/3$, from each $x$ term (and similarly subtracting $\bar{y}$ from each $y$ term), a process called "centering."
With this observation, the formulas can be re-expressed through:
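For reference, the standard centered form of the estimates is:
$$
\hat{\beta} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \quad
\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}.
$$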
@@ -1407,18 +1415,22 @@ contour!(xs, ys, f, levels = [.7, .85, 1, 1.15, 1.3])
We can still identify the tangent and normal directions. What is different about this point is that local movement on the constraint curve is also local movement on the contour line of $f$, so $f$ doesn't increase or decrease here, as it would if this point were an extrema along the constraint. The key to seeing this is the contour lines of $f$ are *tangent* to the constraint. The respective gradients are *orthogonal* to their tangent lines, and in dimension $2$, this implies they are parallel to each other.
::: {.callout-note icon=false}
## The method of Lagrange multipliers
To optimize $f(x,y)$ subject to a constraint $g(x,y) = k$ we solve for all *simultaneous* solutions to
$$
\begin{align*}
\nabla{f}(x,y) &= \lambda \nabla{g}(x,y), \text{and}\\
g(x,y) &= k.
\end{align*}
$$
These *possible* points are evaluated to see if they are maxima or minima.
:::
The method will not work if $\nabla{g} = \vec{0}$ or if $f$ and $g$ are not differentiable.
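As a small illustration of the method, here is a sketch with a hypothetical objective and constraint, using `SymPy` as elsewhere in these notes:
```{julia}
@syms x::real y::real λ::real
fₒ = x * y                 # hypothetical objective
gₒ = x + y                 # hypothetical constraint, g(x,y) = 1
eqs = (diff(fₒ, x) ~ λ * diff(gₒ, x),
       diff(fₒ, y) ~ λ * diff(gₒ, y),
       gₒ ~ 1)
solve(eqs, [x, y, λ])      # the lone candidate: x = y = 1/2
```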
@@ -1472,12 +1484,13 @@ $$
Then we have
$$
\begin{align*}
\frac{\partial L}{\partial{x}} &= \frac{\partial{f}}{\partial{x}} - \lambda \frac{\partial{g}}{\partial{x}}\\
\frac{\partial L}{\partial{y}} &= \frac{\partial{f}}{\partial{y}} - \lambda \frac{\partial{g}}{\partial{y}}\\
\frac{\partial L}{\partial{\lambda}} &= 0 + (g(x,y) - k).
\end{align*}
$$
But if the Lagrange condition holds, each term is $0$, so Lagrange's method can be seen as solving for point $\nabla{L} = \vec{0}$. The optimization problem in two variables with a constraint becomes a problem of finding and classifying zeros of a function with *three* variables.
@@ -1556,13 +1569,14 @@ The starting point is a *perturbation*: $\hat{y}(x) = y(x) + \epsilon_1 \eta_1(x
With this notation, and fixing $y$ we can re-express the equations in terms of $\epsilon_1$ and $\epsilon_2$:
$$
\begin{align*}
F(\epsilon_1, \epsilon_2) &= \int f(x, \hat{y}, \hat{y}') dx =
\int f(x, y + \epsilon_1 \eta_1 + \epsilon_2 \eta_2, y' + \epsilon_1 \eta_1' + \epsilon_2 \eta_2') dx,\\
G(\epsilon_1, \epsilon_2) &= \int g(x, \hat{y}, \hat{y}') dx =
\int g(x, y + \epsilon_1 \eta_1 + \epsilon_2 \eta_2, y' + \epsilon_1 \eta_1' + \epsilon_2 \eta_2') dx.
\end{align*}
$$
Then our problem is restated as:
@@ -1573,7 +1587,7 @@ $$
G(\epsilon_1, \epsilon_2) = L.
$$
Now, Lagrange's method can be employed. This will be fruitful---even though we know the answer---it being $\epsilon_1 = \epsilon_2 = 0$!
Forging ahead, we compute $\nabla{F}$ and $\lambda \nabla{G}$ and set $\epsilon_1 = \epsilon_2 = 0$ where the two are equal. This will lead to a description of $y$ in terms of $y'$.
@@ -1590,7 +1604,7 @@ $$
Computing just the first one, we have using the chain rule and assuming interchanging the derivative and integral is possible:
$$
\begin{align*}
\frac{\partial{F}}{\partial{\epsilon_1}}
&= \int \frac{\partial}{\partial{\epsilon_1}}(
@@ -1598,6 +1612,7 @@ f(x, y + \epsilon_1 \eta_1 + \epsilon_2 \eta_2, y' + \epsilon_1 \eta_1' + \epsil
&= \int \left(\frac{\partial{f}}{\partial{y}} \eta_1 + \frac{\partial{f}}{\partial{y'}} \eta_1'\right) dx\quad\quad(\text{from }\nabla{f} \cdot \langle 0, \eta_1, \eta_1'\rangle)\\
&=\int \eta_1 \left(\frac{\partial{f}}{\partial{y}} - \frac{d}{dx}\frac{\partial{f}}{\partial{y'}}\right) dx.
\end{align*}
$$
The last line follows by integration by parts:
@@ -1664,11 +1679,12 @@ ex2 = Eq(ex1.lhs()^2 - 1, simplify(ex1.rhs()^2) - 1)
Now $y'$ can be integrated using the substitution $y - C = \lambda \cos\theta$ to give: $-\lambda\int\cos\theta d\theta = x + D$, $D$ some constant. That is:
$$
\begin{align*}
x + D &= - \lambda \sin\theta\\
y - C &= \lambda\cos\theta.
\end{align*}
$$
Squaring gives the equation of a circle: $(x +D)^2 + (y-C)^2 = \lambda^2$.
@@ -1680,11 +1696,12 @@ We center and *rescale* the problem so that $x_0 = -1, x_1 = 1$. Then $L > 2$ as
We have $y=0$ at $x=1$ and $-1$ giving:
$$
\begin{align*}
(-1 + D)^2 + (0 - C)^2 &= \lambda^2\\
(+1 + D)^2 + (0 - C)^2 &= \lambda^2.
\end{align*}
$$
Squaring out and solving gives $D=0$, $1 + C^2 = \lambda^2$. That is, an arc of circle with radius $\sqrt{1+C^2}$ and centered at $(0, C)$.
@@ -1776,7 +1793,7 @@ where $R_k(x) = f^{k+1}(\xi)/(k+1)!(x-a)^{k+1}$ for some $\xi$ between $a$ and $
This theorem can be generalized to scalar functions, but the notation can be cumbersome. Following [Folland](https://sites.math.washington.edu/~folland/Math425/taylor2.pdf) we use *multi-index* notation. Suppose $f:R^n \rightarrow R$, and let $\alpha=(\alpha_1, \alpha_2, \dots, \alpha_n)$. Then define the following notation:
$$
\begin{align*}
|\alpha| &= \alpha_1 + \cdots + \alpha_n, \\
\alpha! &= \alpha_1!\alpha_2!\cdot\cdots\cdot\alpha_n!, \\
@@ -1784,6 +1801,7 @@ This theorem can be generalized to scalar functions, but the notation can be cum
\partial^\alpha f &= \partial_1^{\alpha_1}\partial_2^{\alpha_2}\cdots \partial_n^{\alpha_n} f \\
& = \frac{\partial^{|\alpha|}f}{\partial x_1^{\alpha_1} \partial x_2^{\alpha_2} \cdots \partial x_n^{\alpha_n}}.
\end{align*}
$$
This notation makes many formulas from one dimension carry over to higher dimensions. For example, the binomial theorem says:
@@ -1800,8 +1818,8 @@ $$
(x_1 + x_2 + \cdots + x_n)^k = \sum_{|\alpha|=k} \frac{k!}{\alpha!} \vec{x}^\alpha.
$$
Taylor's theorem then becomes:
::: {.callout-note icon=false}
## Taylor's theorem using multi-index
If $f: R^n \rightarrow R$ is sufficiently smooth ($C^{k+1}$) on an open convex set $S$ about $\vec{a}$ then if $\vec{a}$ and $\vec{a}+\vec{h}$ are in $S$,
@@ -1812,18 +1830,20 @@ $$
where $R_{\vec{a},k} = \sum_{|\alpha|=k+1} \frac{\partial^\alpha f(\vec{a} + c\vec{h})}{\alpha!} \vec{h}^\alpha$ for some $c$ in $(0,1)$.
:::
##### Example
The elegant notation masks what can be complicated expressions. Consider the simple case $f:R^2 \rightarrow R$ and $k=2$. Then this says:
$$
\begin{align*}
f(x + dx, y+dy) &= f(x, y) + \frac{\partial f}{\partial x} dx + \frac{\partial f}{\partial y} dy \\
&+ \frac{\partial^2 f}{\partial x^2} \frac{dx^2}{2} + 2\frac{\partial^2 f}{\partial x\partial y} \frac{dx dy}{2}\\
&+ \frac{\partial^2 f}{\partial y^2} \frac{dy^2}{2} + R_{\langle x, y \rangle, k}(\langle dx, dy \rangle).
\end{align*}
$$
Using $\nabla$ and $H$ for the Hessian and $\vec{x} = \langle x, y \rangle$ and $d\vec{x} = \langle dx, dy \rangle$, this can be expressed as:
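That is, filling in the compact form:
$$
f(\vec{x} + d\vec{x}) = f(\vec{x}) + \nabla{f}(\vec{x}) \cdot d\vec{x} + \frac{1}{2} d\vec{x}^T \, H(\vec{x}) \, d\vec{x} + R_{\vec{x},2}(d\vec{x}).
$$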

View File

@@ -24,7 +24,7 @@ For a scalar function $f: R^n \rightarrow R$, the gradient of $f$, $\nabla{f}$,
| $f: R\rightarrow R$ | univariate | familiar graph of function | $f$ |
| $f: R\rightarrow R^m$ | vector-valued | space curve when n=2 or 3 | $\vec{r}$, $\vec{N}$ |
| $f: R^n\rightarrow R$ | scalar | a surface when n=2 | $f$ |
| $F: R^n\rightarrow R^n$ | vector field | a vector field when n=2, 3 | $F$ |
| $F: R^n\rightarrow R^m$ | multivariable | n=2,m=3 describes a surface | $F$, $\Phi$ |
@@ -34,7 +34,9 @@ After an example where the use of a multivariable function is of necessity, we d
## Vector fields
We have seen that the gradient of a scalar function, $f:R^2 \rightarrow R$, takes a point in $R^2$ and associates a vector in $R^2$. As such $\nabla{f}:R^2 \rightarrow R^2$ is a vector field. A vector field is a vector-valued function from $R^n \rightarrow R^n$ for $n \geq 2$.
An input/output pair can be visualized by identifying the input values as a point, and the output as a vector visualized by anchoring the vector at the point. A vector field is a sampling of such pairs, usually taken over some ordered grid. The details, as previously mentioned, are in the `vectorfieldplot` function of `CalculusWithJulia`.
```{julia}
@@ -78,6 +80,7 @@ Vector fields are also useful for other purposes, such as transformations, examp
For transformations, a useful visualization is to plot curves where one variable is fixed. Consider the transformation from polar coordinates to cartesian coordinates $F(r, \theta) = r \langle\cos(\theta),\sin(\theta)\rangle$. The following plot will show in blue fixed values of $r$ (circles) and in red fixed values of $\theta$ (rays).
::: {#fig-transformation-partial-derivative}
```{julia}
#| hold: true
@@ -95,10 +98,21 @@ pt = [1, pi/4]
J = ForwardDiff.jacobian(F, pt)
arrow!(F(pt...), J[:,1], linewidth=5, color=:red)
arrow!(F(pt...), J[:,2], linewidth=5, color=:blue)
pt = [0.5, pi/8]
J = ForwardDiff.jacobian(F, pt)
arrow!(F(pt...), J[:,1], linewidth=5, color=:red)
arrow!(F(pt...), J[:,2], linewidth=5, color=:blue)
```
Plot of a vector field from $R^2 \rightarrow R^2$ illustrated by drawing curves with fixed $r$ and $\theta$. The partial derivatives are added as layers.
:::
To the plot, we added the partial derivatives with respect to $r$ (in red) and with respect to $\theta$ (in blue). These are found with the soon-to-be discussed Jacobian. From the graph, you can see that these vectors are tangent vectors to the drawn curves.
The curves form a non-rectangular grid. Were the cells exactly parallelograms, the area would be computed taking into account the length of the vectors and the angle between them---the same values that come out of a cross product.
## Parametrically defined surfaces
@@ -136,7 +150,7 @@ When a surface is described as a level curve, $f(x,y,z) = c$, then the gradient
When a surface is described parametrically, there is no "gradient." The *partial* derivatives are of interest, e.g., $\partial{F}/\partial{\theta}$ and $\partial{F}/\partial{\phi}$, vectors defined componentwise. These will lie in the tangent plane of the surface, as they can be viewed as tangent vectors for parametrically defined curves on the surface. Their cross product will be *normal* to the surface. The magnitude of the cross product, which reflects the angle between the two partial derivatives, will be informative as to the surface area.
### Plotting parameterized surfaces in `Julia`
Consider the parametrically described surface above. How would it be plotted? Using the `Plots` package, the process is quite similar to how a surface described by a function is plotted, but the $z$ values must be computed prior to plotting.
@@ -191,11 +205,12 @@ surface(unzip(Phi.(thetas, phis'))...)
The partial derivatives of each component, $\partial{\Phi}/\partial{\theta}$ and $\partial{\Phi}/\partial{\phi}$, can be computed directly:
$$
\begin{align*}
\partial{\Phi}/\partial{\theta} &= \langle -\sin(\phi)\sin(\theta), \sin(\phi)\cos(\theta),0 \rangle,\\
\partial{\Phi}/\partial{\phi} &= \langle \cos(\phi)\cos(\theta), \cos(\phi)\sin(\theta), -\sin(\phi) \rangle.
\end{align*}
$$
Using `SymPy`, we can compute through:
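A sketch of that computation (re-expressing the sphere parameterization above with symbolic angles):
```{julia}
@syms θ::real ϕ::real
Φ = [sin(ϕ)*cos(θ), sin(ϕ)*sin(θ), cos(ϕ)]  # the parameterization above
diff.(Φ, θ), diff.(Φ, ϕ)                    # ∂Φ/∂θ and ∂Φ/∂ϕ, componentwise
```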
@@ -233,6 +248,217 @@ arrow!(Phi(pt...), out₁[:,1], linewidth=3)
arrow!(Phi(pt...), out₁[:,2], linewidth=3)
```
##### Example: A detour into plotting
The presentation of a 3D figure in a 2D format requires the use of linear perspective. The `Plots` package adds lighting effects to nicely render a surface, as seen.
In this example, we show some of the mathematics behind how drawing a surface can be done more primitively, to showcase some facts about vectors. We follow a few techniques learned from @Angenent.
```{julia}
#| echo: false
gr()
nothing
```
For our purposes we wish to mathematically project a figure onto a 2D plane.
The plane here is described by a view point in 3D space, $\vec{v}$. Taking this as one vector in an orthogonal coordinate system, the other two can be easily produced, the first by switching two coordinates, as would be done in 2D; the second through the cross product:
```{julia}
function projection_plane(v)
vx, vy, vz = v
a = [-vy, vx, 0] # v ⋅ a = 0
b = v × a # so v ⋅ b = 0
return (a/norm(a), b/norm(b))
end
```
Using these two unit vectors to describe the plane, the projection of a point onto the plane is simply found by taking dot products:
```{julia}
function project(x, v)
â, b̂ = projection_plane(v)
(x ⋅ â, x ⋅ b̂) # (x ⋅ â) â + (x ⋅ b̂) b̂
end
```
Let's see this in action by plotting a surface of revolution given by
```{julia}
radius(t) = 1 / (1 + exp(t))
t₀, tₙ = 0, 3
surf(t, θ) = [t, radius(t)*cos(θ), radius(t)*sin(θ)]
```
We begin by fixing a view point and plotting the projected axes. We do the latter with a function for re-use.
```{julia}
v = [2, -2, 1]
function plot_axes()
empty_style = (xaxis = ([], false),
yaxis = ([], false),
legend=false)
plt = plot(; empty_style...)
axis_values = [[(0,0,0), (3.5,0,0)], # x axis
[(0,0,0), (0, 2.0 * radius(0), 0)], # yaxis
[(0,0,0), (0, 0, 1.5 * radius(0))]] # z axis
for (ps, ax) ∈ zip(axis_values, ("x", "y", "z"))
p0, p1 = ps
a, b = project(p0, v), project(p1, v)
annotate!([(b...,text(ax, :bottom))])
plot!([a, b]; arrow=true, head=:tip, line=(:gray, 1)) # gr() allows arrows
end
plt
end
plt = plot_axes()
```
We are using the vector of tuples interface (representing points) to specify the curve to draw.
Now we add on some curves for fixed $t$ and then fixed $\theta$, utilizing the fact that `project` returns a tuple of $x$-$y$ values to display.
```{julia}
for t in range(t₀, tₙ, 20)
curve = [project(surf(t, θ), v) for θ in range(0, 2pi, 100)]
plot!(curve; line=(:black, 1))
end
for θ in range(0, 2pi, 60)
curve = [project(surf(t, θ), v) for t in range(t₀, tₙ, 20)]
plot!(curve; line=(:black, 1))
end
plt
```
The graphic is a little busy!
Let's focus on the cells layering the surface. These have equal size in the $t \times \theta$ range, but unequal area on the screen. Were they parallelograms, the area could be found by taking the 2-dimensional cross product of the two partial derivatives, resulting in a formula like: $a_x b_y - a_y b_x$.
When we discuss integrals related to such figures, this amount of area will be characterized by a computation involving the determinant of the upcoming Jacobian function.
We make a function that closes over the viewpoint vector and can be passed to `ForwardDiff`, as it returns a vector and not a tuple.
```{julia}
function psurf(v)
(t,θ) -> begin
v1, v2 = project(surf(t, θ), v)
[v1, v2] # or call collect to make a tuple into a vector
end
end
```
The function returned by `psurf` is from $R^2 \rightarrow R^2$. With such a function, the computation of this approximate area becomes:
```{julia}
function detJ(F, t, θ)
∂θ = ForwardDiff.derivative(θ -> F(t, θ), θ)
∂t = ForwardDiff.derivative(t -> F(t, θ), t)
(ax, ay), (bx, by) = ∂θ, ∂t
ax * by - ay * bx
end
```
For our purposes, we are interested in the sign of the returned value. Plotting, we can see that some "area" is positive, some "negative":
```{julia}
t = 1
G = psurf(v)
plot(θ -> detJ(G, t, θ), 0, 2pi)
```
With this parameterization and viewpoint, the positive area for the surface is when the normal vector points towards the viewing point. In the following, we only plot such values:
```{julia}
plt = plot_axes()
function I(F, t, θ)
x, y = F(t, θ)
detJ(F, t, θ) >= 0 ? (x, y) : (x, NaN) # use NaN for y value
end
for t in range(t₀, tₙ, 20)
curve = [I(G, t, θ) for θ in range(0, 2pi, 100)]
plot!(curve; line=(:gray, 1))
end
for θ in range(0, 2pi, 60)
curve = [I(G, t, θ) for t in range(t₀, tₙ, 20)]
plot!(curve; line=(:gray, 1))
end
plt
```
The values for which `detJ` is zero form the visible boundary of the object. We can plot just those to get an even less busy view. We identify them by finding the value of $\theta$ in $[0,\pi]$ and $[\pi,2\pi]$ that makes the `detJ` function zero:
```{julia}
fold(F, t, θmin, θmax) = find_zero(θ -> detJ(F, t, θ), (θmin, θmax))
ts = range(t₀, tₙ, 100)
back_edge = fold.(G, ts, 0, pi)
front_edge = fold.(G, ts, pi, 2pi)
plt = plot_axes()
plot!(project.(surf.(ts, back_edge), (v,)); line=(:black, 1))
plot!(project.(surf.(ts, front_edge), (v,)); line=(:black, 1))
```
Adding caps makes the graphic stand out. The caps are just discs (fixed values of $t$) which are filled in with gray using a transparency so that the axes aren't masked.
```{julia}
θs = range(0, 2pi, 100)
S = Shape(project.(surf.(t₀, θs), (v,)))
plot!(S; fill=(:gray, 0.33))
S = Shape(project.(surf.(tₙ, θs), (v,)))
plot!(S; fill=(:gray, 0.33))
```
Finally, we introduce some shading using the same technique but assuming the light comes from a different position.
```{julia}
lightpt = [2, -2, 5] # from further above
H = psurf(lightpt)
light_edge = fold.(H, ts, pi, 2pi);
```
Angles between the light edge and the front edge would be in shadow. We indicate this by drawing lines for fixed $t$ values. As denser lines indicate more shadow, we feather how these are drawn:
```{julia}
for (i, (t, top, bottom)) in enumerate(zip(ts, light_edge, front_edge))
λ = iseven(i) ? 1.0 : 0.8
top = bottom + λ*(top - bottom)
curve = [project(surf(t, θ), v) for θ in range(bottom, top, 20)]
plot!(curve, line=(:black, 1))
end
plt
```
We can compare to the graph produced by `surface` for the same function:
```{julia}
ts = range(t₀, tₙ, 50)
θs = range(0, 2pi, 100)
surface(unzip(surf.(ts, θs'))...; legend=false)
```
```{julia}
#| echo: false
plotly()
nothing
```
## The total derivative
@@ -359,7 +585,7 @@ where $\epsilon(h) \rightarrow \vec{0}$ as $h \rightarrow \vec{0}$.
We have, using this for *both* $F$ and $G$:
$$
\begin{align*}
F(G(a + \vec{h})) - F(G(a)) &=
F(G(a) + (dG_a \cdot \vec{h} + \epsilon_G \vec{h})) - F(G(a))\\
@@ -367,18 +593,20 @@ F(G(a) + (dG_a \cdot \vec{h} + \epsilon_G \vec{h})) - F(G(a))\\
&+ \quad\epsilon_F (dG_a \cdot \vec{h} + \epsilon_G \vec{h}) - F(G(a))\\
&= dF_{G(a)} \cdot (dG_a \cdot \vec{h}) + dF_{G(a)} \cdot (\epsilon_G \vec{h}) + \epsilon_F (dG_a \cdot \vec{h}) + (\epsilon_F \cdot \epsilon_G\vec{h})
\end{align*}
$$
The last line uses the linearity of $dF$ to isolate $dF_{G(a)} \cdot (dG_a \cdot \vec{h})$. Factoring out $\vec{h}$ and taking norms gives:
$$
\begin{align*}
\frac{\| F(G(a+\vec{h})) - F(G(a)) - dF_{G(a)}dG_a \cdot \vec{h} \|}{\| \vec{h} \|} &=
\frac{\| dF_{G(a)}\cdot(\epsilon_G\vec{h}) + \epsilon_F (dG_a\cdot \vec{h}) + (\epsilon_F\cdot\epsilon_G\vec{h}) \|}{\| \vec{h} \|} \\
&\leq \| dF_{G(a)}\cdot\epsilon_G + \epsilon_F (dG_a) + \epsilon_F\cdot\epsilon_G \|\frac{\|\vec{h}\|}{\| \vec{h} \|}\\
&\rightarrow 0.
\end{align*}
$$
### Examples
@@ -660,7 +888,7 @@ det(A1), 1/det(A2)
The technique of *implicit differentiation* is a useful one, as it allows derivatives of more complicated expressions to be found. The main idea, expressed here with three variables, is: if an equation may be viewed as $F(x,y,z) = c$, $c$ a constant, then $z=\phi(x,y)$ may be viewed as a function of $x$ and $y$. Hence, we can use the chain rule to find $\partial z / \partial x$ and $\partial z /\partial y$. Let $G(x,y) = \langle x, y, \phi(x,y) \rangle$ and then differentiate $(F \circ G)(x,y) = c$:
$$
\begin{align*}
0 &= dF_{G(x,y)} \circ dG_{\langle x, y\rangle}\\
&= [\frac{\partial F}{\partial x}\quad \frac{\partial F}{\partial y}\quad \frac{\partial F}{\partial z}](G(x,y)) \cdot
@@ -670,6 +898,7 @@ The technique of *implicit differentiation* is a useful one, as it allows deriva
\frac{\partial \phi}{\partial x} & \frac{\partial \phi}{\partial y}
\end{bmatrix}.
\end{align*}
$$
Solving yields
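Carrying out the matrix product and solving gives the familiar formulas:
$$
\frac{\partial \phi}{\partial x} = -\frac{\partial F/\partial x}{\partial F/\partial z}, \quad
\frac{\partial \phi}{\partial y} = -\frac{\partial F/\partial y}{\partial F/\partial z}.
$$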
@@ -685,14 +914,17 @@ Where the right hand side of each is evaluated at $G(x,y)$.
When can it be reasonably assumed that such a function $z= \phi(x,y)$ exists?
The [Implicit Function Theorem](https://en.wikipedia.org/wiki/Implicit_function_theorem) provides a statement (slightly abridged here):
::: {.callout-note icon=false}
## The [Implicit Function Theorem](https://en.wikipedia.org/wiki/Implicit_function_theorem) (slightly abridged)
Let $F:R^{n+m} \rightarrow R^m$ be a continuously differentiable function and let $R^{n+m}$ have (compactly defined) coordinates $\langle \vec{x}, \vec{y} \rangle$. Fix a point $\langle \vec{a}, \vec{b} \rangle$ with $F(\vec{a}, \vec{b}) = \vec{0}$. Let $J_{F, \vec{y}}(\vec{a}, \vec{b})$ be the Jacobian restricted to *just* the $y$ variables. ($J$ is $m \times m$.) If this matrix has non-zero determinant (it is invertible), then there exists an open set $U$ containing $\vec{a}$ and a *unique* continuously differentiable function $G: U \subset R^n \rightarrow R^m$ such that $G(\vec{a}) = \vec{b}$, $F(\vec{x}, G(\vec{x})) = 0$ for $\vec x$ in $U$. Moreover, the partial derivatives of $G$ are given by the matrix product:
$$
\frac{\partial G}{\partial x_j}(\vec{x}) = - [J_{F, \vec{y}}(x, F(\vec{x}))]^{-1} \left[\frac{\partial F}{\partial x_j}(x, G(\vec{x}))\right].
$$
:::
---
@@ -832,8 +1064,337 @@ Taking $\partial/\partial{a_i}$ gives equations $2a_i\sigma_i^2 + \lambda = 0$,
For the special case of a common variance, $\sigma_i=\sigma$, the above simplifies to $a_i = 1/n$ and the estimator is $\sum X_i/n$, the familiar sample mean, $\bar{X}$.
##### Example: The mean value theorem
[Perturbing the Mean Value Theorem: Implicit Functions, the Morse Lemma, and Beyond](https://www.jstor.org/stable/48661587) by Lowry-Duda and Wheeler presents an interesting take on the mean value theorem by asking: if the endpoint $b$ moves continuously, does the value $c$ move continuously?
Fix the left-hand endpoint, $a_0$, and consider:
$$
F(b,c) = \frac{f(b) - f(a_0)}{b-a_0} - f'(c).
$$
Solutions to $F(b,c)=0$ satisfy the mean value theorem for $f$.
Suppose $(b_0,c_0)$ is one such solution.
By using the implicit function theorem, the question of finding a $C(b)$ such that $C$ is continuous near $b_0$ and satisfies $F(b, C(b)) = 0$ for $b$ near $b_0$ can be characterized.
To analyze this question, Lowry-Duda and Wheeler fix a set of points $a_0 = 0$, $b_0=3$ and consider functions $f$ with $f(a_0) = f(b_0) = 0$. Similar to how Rolle's theorem easily proves the mean value theorem, this choice imposes no loss of generality.
Suppose further that $c_0 = 1$, where $c_0$ solves the mean value theorem:
$$
f'(c_0) = \frac{f(b_0) - f(a_0)}{b_0 - a_0}.
$$
Again, this is no loss of generality. By construction $(b_0, c_0)$ is a zero of the just defined $F$.
We are interested in the shape of the level set $F(b,c) = 0$ which reveals other solutions $(b,c)$. For a given $f$, a contour plot, with $b>c$, can reveal this shape.
To find a source of examples for such functions, polynomials are considered, beginning with these constraints:
$$
f(a_0) = 0, f(b_0) = 0, f(c_0) = 1, f'(c_0) = 0
$$
With four conditions, we might guess that a cubic polynomial, having four unknown coefficients, should fit. We use `SymPy` to identify the coefficients.
```{julia}
a₀, b₀, c₀ = 0, 3, 1
@syms x
@syms a[0:3]
p = sum(aᵢ*x^(i-1) for (i,aᵢ) ∈ enumerate(a))
dp = diff(p,x)
p, dp
```
The constraints are specified as follows; `solve` has no issue with this system of equations.
```{julia}
eqs = (p(x=>a₀) ~ 0,
p(x=>b₀) ~ 0,
p(x=>c₀) ~ 1,
dp(x=>c₀) ~ 0)
d = solve(eqs, a)
q = p(d...)
```
We can plot $q$ and emphasize the three points with:
```{julia}
xlims = (-0.5, 3.5)
plot(q; xlims, legend=false)
scatter!([a₀, b₀, c₀], [0,0,1]; marker=(5, 0.25))
```
We now make a plot of the level curve $F(x,y)=0$ using `contour` and the constraint that $b>c$ to graphically identify $C(b)$:
```{julia}
dq = diff(q, x)
λ(b,c) = b > c ? (q(b) - q(a₀)) / (b - a₀) - dq(c) : -Inf
bs = cs = range(0.5,3.5, 100)
plot(; legend=false)
contour!(bs, cs, λ; levels=[0])
plot!(identity; line=(1, 0.25))
scatter!([b₀], [c₀]; marker=(5, 0.25))
```
The curve that passes through the point $(3,1)$ is clearly continuous, and following it, we see continuous changes in $b$ result in continuous changes in $c$.
Following a behind-the-scenes blog post by [Lowry-Duda](https://davidlowryduda.com/choosing-functions-for-mvt-abscissa/), we wrap some of the above into a function to find a polynomial given a set of conditions on values for itself or its derivatives at a point.
```{julia}
function _interpolate(conds; x=x)
np1 = length(conds)
n = np1 - 1
as = [Sym("a$i") for i in 0:n]
p = sum(as[i+1] * x^i for i in 0:n)
# set p⁽ᵏ⁾(xᵢ) = v
eqs = Tuple(diff(p, x, k)(x => xᵢ) ~ v for (xᵢ, k, v) ∈ conds)
soln = solve(eqs, as)
p(soln...)
end
# sets p⁽⁰⁾(a₀) = 0, p⁽⁰⁾(b₀) = 0, p⁽⁰⁾(c₀) = 1, p⁽¹⁾(c₀) = 0
basic_conditions = [(a₀,0,0), (b₀,0,0), (c₀,0,1), (c₀,1,0)]
_interpolate(basic_conditions; x)
```
Before moving on, polynomial interpolation can suffer from the Runge phenomenon, where there can be severe oscillations between the points. To tamp these down, an additional *control* point is added which is adjusted to minimize the size of the derivative through the value $\int \| f'(x) \|^2 dx$ (the $L_2$ norm of the derivative):
```{julia}
function interpolate(conds)
@syms x, D
# set f'(2) = D, then adjust D to minimize L₂ below
new_conds = vcat(conds, [(2, 1, D)])
p = _interpolate(new_conds; x)
# measure size of p with ∫₀⁴f'(x)^2 dx
dp = diff(p, x)
L₂ = integrate(dp^2, (x, 0, 4))
dL₂ = diff(L₂, D)
soln = first(solve(dL₂ ~ 0, D)) # critical point to minimize L₂
p(D => soln)
end
q = interpolate(basic_conditions)
```
We also make a plotting function to show both `q` and the level curve of `F`:
```{julia}
function plot_q_level_curve(q; title="", layout=[1;1])
x = only(free_symbols(q)) # fish out x
dq = diff(q, x)
xlims = ylims = (-0.5, 4.5)
p₁ = plot(; xlims, ylims, title,
legend=false, aspect_ratio=:equal)
plot!(p₁, q; xlims, ylims)
scatter!(p₁, [a₀, b₀, c₀], [0,0,1]; marker=(5, 0.25))
λ(b,c) = b > c ? (q(b) - q(a₀)) / (b - a₀) - dq(c) : -Inf
bs = cs = range(xlims..., 100)
p₂ = plot(; xlims, ylims, legend=false, aspect_ratio=:equal)
contour!(p₂, bs, cs, λ; levels=[0])
plot!(p₂, identity; line=(1, 0.25))
scatter!(p₂, [b₀], [c₀]; marker=(5, 0.25))
plot(p₁, p₂; layout)
end
```
```{julia}
plot_q_level_curve(q; layout=(1,2))
```
As before, this highlights the presence of a continuous function in $b$ yielding $c$.
This is not the only possibility. Another such example from their paper (Figure 3) looks like the following, where some additional constraints are added ($f''(c_0) = 0, f'''(c_0)=3, f'(b_0)=-3$):
```{julia}
new_conds = [(c₀, 2, 0), (c₀, 3, 3), (b₀, 1, -3)]
q = interpolate(vcat(basic_conditions, new_conds))
plot_q_level_curve(q;layout=(1,2))
```
For this shape, if $b$ increases away from $b_0$, the secant line connecting $(a_0,0)$ and $(b, f(b))$ will have a negative slope, but there are no points near $x=c_0$ where the tangent line has negative slope, so the continuous function is only defined to the left of $b_0$. Mathematically, as $f$ is increasing near $c_0$---as $f'''(c_0) = 3 > 0$---and $f$ is decreasing at $b_0$---as $f'(b_0) = -3 < 0$---the signs alone suggest the scenario. The contour plot reveals not one, but two one-sided functions of $b$ giving $c$.
---
Now to characterize all possibilities.
Suppose $F(x,y)$ is differentiable. Then $F(x,y)$ has this approximation (where $F_x$ and $F_y$ are the partial derivatives):
$$
F(x,y) \approx F(x_0,y_0) + F_x(x_0,y_0) (x - x_0) + F_y(x_0,y_0) (y-y_0)
$$
If $(x_0,y_0)$ is a zero of $F$, then the above can be solved for $y$ assuming $F_y$ does not vanish:
$$
y \approx y_0 - \frac{F_x(x_0, y_0)}{F_y(x_0, y_0)} \cdot (x - x_0)
$$
The main tool used in the authors' investigation is the implicit function theorem. The implicit function theorem states there is some function continuously describing $y$, not just approximately, under the above assumption of $F_y$ not vanishing.
Again, with $F(b,c) = (f(b) - f(a_0)) / (b -a_0) - f'(c)$ and assuming $f$ has at least two continuous derivatives, then:
$$
\begin{align*}
F(b_0,c_0) &= 0,\\
F_c(b_0, c_0) &= -f''(c_0).
\end{align*}
$$
Assuming $f''(c_0)$ is *non*-zero, this proves that if $b$ moves continuously, a corresponding solution to the mean value theorem will as well; that is, there is a continuous function $C(b)$ with $F(b,C(b)) = 0$.
Further, they establish if $f'(b_0) \neq f'(c_0)$ then there is a continuous $B(c)$ near $c_0$ such that $F(B(c),c) = 0$; and that there are no other nearby solutions to $F(b,c)=0$ near $(b_0, c_0)$.
This leaves for consideration the possibilities when $f''(c_0) = 0$ and $f'(b_0) = f'(c_0)$.
One such possibility looks like:
```{julia}
new_conds = [(c₀, 2, 0), (c₀, 3, 3), (b₀, 1, 0), (b₀, 2, 3)]
q = interpolate(vcat(basic_conditions, new_conds))
plot_q_level_curve(q;layout=(1,2))
```
This picture shows more than one possible choice for a continuous function, as the contour plot has this looping intersection point at $(b_0,c_0)$.
To characterize possible behaviors, the authors recall the [Morse lemma](https://en.wikipedia.org/wiki/Morse_theory) applied to functions $f:R^2 \rightarrow R$ with vanishing gradient, but non-vanishing Hessian. This states that after some continuous change of coordinates, $f$ looks like $\pm u^2 \pm v^2$. Only this one-dimensional Morse lemma (and a generalization) is required for this analysis:
> if $g(x)$ is three-times continuously differentiable with $g(x_0) = g'(x_0) = 0$ but $g''(x_0) \neq 0$ then *near* $x_0$ $g(x)$ can be transformed through a continuous change of coordinates to look like $\pm u^2$, where the sign is the sign of the second derivative of $g$.
That is, locally the function can be continuously transformed into a parabola opening up or down depending on the sign of the second derivative. Their proof starts with Taylor's remainder theorem to find a candidate for the change of coordinates and shows with the implicit function theorem this is a viable change.
Setting:
$$
\begin{align*}
g_1(b) &= (f(b) - f(a_0))/(b - a_0) - f'(c_0)\\
g_2(c) &= f'(c) - f'(c_0).
\end{align*}
$$
Then $F(b, c) = g_1(b) - g_2(c)$.
By construction, $g_2(c_0) = 0$ and $g_2^{(k)}(c_0) = f^{(k+1)}(c_0)$.
Adjusting $f$ to have a vanishing second---but not third---derivative at $c_0$ means $g_2$ will satisfy the assumptions of the lemma assuming $f$ has at least four continuous derivatives (as all our example polynomials do).
As for $g_1$, we have by construction $g_1(b_0) = 0$. By differentiation, we get a pattern for some constants $c_j = (j+1)(j+2)\cdots k$ with $c_k = 1$:
$$
g_1^{(k)}(b) = k! \cdot \frac{f(a_0) - f(b)}{(a_0-b)^{k+1}} - \sum_{j=1}^k c_j \frac{f^{(j)}(b)}{(a_0 - b)^{k-j+1}}.
$$
Of note: when $f(a_0) = f(b_0) = 0$, if $f^{(k)}(b_0)$ is the first non-vanishing derivative of $f$ at $b_0$, then $g_1^{(k)}(b_0) = f^{(k)}(b_0)/(b_0 - a_0)$ (they have the same sign).
In particular, if $f(a_0) = f(b_0) = 0$ and $f'(b_0)=0$ and $f''(b_0)$ is non-zero, the lemma applies to $g_1$, again assuming $f$ has at least four continuous derivatives.
Let $\sigma_1 = \text{sign}(f''(b_0))$ and $\sigma_2 = \text{sign}(f'''(c_0))$, then we have $F(b,c) = \sigma_1 u^2 - \sigma_2 v^2$ after some change of variables. The authors conclude:
* If $\sigma_1$ and $\sigma_2$ have different signs, then $F(b,c) = 0$ is like $u^2 = -v^2$, which has only one isolated solution, as the left hand side and right hand side will have different signs except when both are $0$.
* If $\sigma_1$ and $\sigma_2$ have the same sign, then $F(b,c) = 0$ is like $u^2 = v^2$ which has two solutions $u = \pm v$.
Applied to the problem at hand:
* if $f''(b_0)$ and $f'''(c_0)$ have different signs, then $c_0$ can not be extended to a continuous function near $b_0$.
* if the two have the same sign, then there are two such functions possible.
```{julia}
conds₁ = [(b₀,1,0), (b₀,2,3), (c₀,2,0), (c₀,3,-3)]
conds₂ = [(b₀,1,0), (b₀,2,3), (c₀,2,0), (c₀,3, 3)]
q₁ = interpolate(vcat(basic_conditions, conds₁))
q₂ = interpolate(vcat(basic_conditions, conds₂))
p₁ = plot_q_level_curve(q₁)
p₂ = plot_q_level_curve(q₂)
plot(p₁, p₂; layout=(1,2))
```
There are more possibilities, as pointed out in the article.
Say a function, $h$, has *a zero of order $k$ at $x_0$* if the first $k-1$ derivatives of $h$ are zero at $x_0$, but $h^{(k)}(x_0) \neq 0$. Now suppose $f$ has a zero of order $k$ at $b_0$ and order $l$ at $c_0$. Then $g_1$ will have order $k$ at $b_0$ and $g_2$ will have order $l-1$ at $c_0$. In the above, we had orders $2$ and $3$, respectively.
A generalization of the Morse lemma to a function $h$ having a zero of order $k$ at $x_0$ says $h$ can be transformed to look like $\pm u^k$, where if $k$ is odd either sign is possible, and if $k$ is even the sign is that of $h^{(k)}(x_0)$.
With this, we get the following possibilities for $f$ with a zero of order $k$ at $b_0$ and $l$ at $c_0$:
* If $l$ is even, then there is one continuous solution near $(b_0,c_0)$.
* If $l$ is odd and $k$ is even and $f^{(k)}(b_0)$ and $f^{(l)}(c_0)$ have the *same* sign, then there are two continuous solutions.
* If $l$ is odd and $k$ is even and $f^{(k)}(b_0)$ and $f^{(l)}(c_0)$ have *opposite* signs, then $(b_0, c_0)$ is an isolated solution.
* If $l$ is odd and $k$ is odd, then there are two continuous solutions, but only defined in a one-sided neighborhood of $b_0$ where $f^{(k)}(b_0) f^{(l)}(c_0) (b - b_0) > 0$.
To visualize these four cases, we take $(l=2,k=1)$, $(l=3, k=2)$ (twice) and $(l=3, k=3)$.
```{julia}
condsₑ = [(c₀,2,3), (b₀,1,-3)]
condsₒₑ₊₊ = [(c₀,2,0), (c₀,3, 10), (b₀,1,0), (b₀,2,10)]
condsₒₑ₊₋ = [(c₀,2,0), (c₀,3,-20), (b₀,1,0), (b₀,2,20)]
condsₒₒ = [(c₀,2,0), (c₀,3,-20), (b₀,1,0), (b₀,2, 0), (b₀,3, 20)]
qₑ = interpolate(vcat(basic_conditions, condsₑ))
qₒₑ₊₊ = interpolate(vcat(basic_conditions, condsₒₑ₊₊))
qₒₑ₊₋ = interpolate(vcat(basic_conditions, condsₒₑ₊₋))
qₒₒ = interpolate(vcat(basic_conditions, condsₒₒ))
p₁ = plot_q_level_curve(qₑ; title = "(e,.)")
p₂ = plot_q_level_curve(qₒₑ₊₊; title = "(o,e,same)")
p₃ = plot_q_level_curve(qₒₑ₊₋; title = "(o,e,different)")
p₄ = plot_q_level_curve(qₒₒ; title = "(o,o)")
plot(p₁, p₂, p₃, p₄; layout=(1,4))
```
This handles most cases, but leaves for consideration the possibility of a function with infinitely many vanishing derivatives. We steer the interested reader to the article for thoughts on that.
## Questions
###### Question
```{julia}
#| echo: false
gr()
p1 = vectorfieldplot((x,y) -> [x,y], xlim=(-4,4), ylim=(-4,4), nx=9, ny=9, title="A");
p2 = vectorfieldplot((x,y) -> [x-y,x], xlim=(-4,4), ylim=(-4,4), nx=9, ny=9,title="B");
p3 = vectorfieldplot((x,y) -> [y,0], xlim=(-4,4), ylim=(-4,4), nx=9, ny=9, title="C");
p4 = vectorfieldplot((x,y) -> [-y,x], xlim=(-4,4), ylim=(-4,4), nx=9, ny=9, title="D");
plot(p1, p2, p3, p4; layout=(2,2))
```
In the above figure, match the function with the vector field plot.
```{julia}
#| echo: false
plotly()
matchq(("`F(x,y)=[-y ,x]`", "`F(x,y)=[y,0]`",
"`F(x,y)=[x-y,x]`", "`F(x,y)=[x,y]`"),
("A", "B", "C", "D"),
(4,3,2,1);
label="For each function mark the correct vector field plot"
)
```
###### Question


@@ -38,9 +38,11 @@ A function $\vec{f}: R \rightarrow R^n$, $n > 1$ is called a vector-valued funct
$$
\vec{f}(t) = \langle \sin(t), 2\cos(t) \rangle, \quad
\vec{g}(t) = \langle \sin(t), \cos(t), t \rangle, \quad
\vec{h}(t) = \langle 2, 3 \rangle + t \cdot \langle 1, 2 \rangle.
\begin{align*}
\vec{f}(t) &= \langle \sin(t), 2\cos(t) \rangle, \\
\vec{g}(t) &= \langle \sin(t), \cos(t), t \rangle, \\
\vec{h}(t) &= \langle 2, 3 \rangle + t \cdot \langle 1, 2 \rangle.\\
\end{align*}
$$
The components themselves are also functions of $t$, in this case univariate functions. Depending on the context, it can be useful to view vector-valued functions as a function that returns a vector, or a vector of the component functions.
@@ -104,24 +106,25 @@ However, we will use a different approach, as the component functions are not n
In `Plots`, the command `plot(xs, ys)`, where, say, `xs=[x1, x2, ..., xn]` and `ys=[y1, y2, ..., yn]`, will make a connect-the-dot plot between corresponding pairs of points. As previously discussed, this can be used as an alternative to plotting a function through `plot(f, a, b)`: first make a set of $x$ values, say `xs=range(a, b, length=100)`; then the corresponding $y$ values, say `ys = f.(xs)`; and then plotting through `plot(xs, ys)`.
Similarly, were a third vector, `zs`, for $z$ components used, `plot(xs, ys, zs)` will make a $3$-dimensional connect the dot plot
Similarly, were a third vector, `zs`, for $z$ components used, `plot(xs, ys, zs)` will make a $3$-dimensional connect the dot plot.
However, our representation of vector-valued functions naturally generates a vector of points: `[[x1,y1], [x2, y2], ..., [xn, yn]]`, as this comes from broadcasting `f` over some time values. That is, for a collection of time values, `ts` the command `f.(ts)` will produce a vector of points. (Technically a vector of vectors, but points if you identify the $2$-$d$ vectors as points.)
However, our representation of vector-valued functions naturally generates a vector of points: `[[x1,y1], [x2, y2], ..., [xn, yn]]`, as this comes from broadcasting `f` over some time values. That is, for a collection of time values, `ts` the command `f.(ts)` will produce a vector of vectors. In `Plots`, a vector of *tuples* will be read as a vector of points and plotted accordingly. On the other hand, a vector of vectors is read in as a number of series, with each element being plotted separately. (That is, `[x1,y1]` maps to `plot!([1,2], [x1,y1])`.) To get the desired graph, *either* our function can return a tuple---which makes it clumsier to work with when manipulating the output---or we can turn a vector of points into two vectors---one with the `x` values, one with the `y` values.
To get the `xs` and `ys` from this is conceptually easy: just iterate over all the points and extract the corresponding component. For example, to get `xs` we would have a command like `[p[1] for p in f.(ts)]`. Similarly, the `ys` would use `p[2]` in place of `p[1]`. The `unzip` function from the `CalculusWithJulia` package does this for us. The name comes from how the `zip` function in base `Julia` takes two vectors and returns a vector of the values paired off. This is the reverse. As previously mentioned, `unzip` uses the `invert` function of the `SplitApplyCombine` package to invert the indexing (the $j$th component of the $i$th point can be referenced by `vs[i][j]` or `invert(vs)[j][i]`).
To get the `xs` and `ys` from this is conceptually easy: just iterate over all the points and extract the corresponding component. For example, to get `xs` we would have a command like `[p[1] for p in f.(ts)]`. Similarly, the `ys` would use `p[2]` in place of `p[1]`.
The `unzip` function from the `CalculusWithJulia` package does this for us. The name comes from how the `zip` function in base `Julia` takes two vectors and returns a vector of the values paired off. This is the reverse. As previously mentioned, `unzip` uses the `invert` function of the `SplitApplyCombine` package to invert the indexing (the $j$th component of the $i$th point can be referenced by `vs[i][j]` or `invert(vs)[j][i]`).
Visually, we have `unzip` performing this reassociation:
Visually, we have `unzip` performing this re-association:
```{verbatim}
[[x1, y1, z1], (⌈x1⌉, ⌈y1⌉, ⌈z1⌉,
[x2, y2, z2], |x2|, |y2|, |z2|,
[x3, y3, z3], --> |x3|, |y3|, |z3|,
[[x, y, z], (⌈x⌉, ⌈y⌉, ⌈z⌉,
[x, y, z], |x|, |y|, |z|,
[x, y, z], --> |x|, |y|, |z|,
⋮ ⋮
[xn, yn, zn]] ⌊xn⌋, ⌊yn⌋, ⌊zn⌋ )
[x, y, z]] ⌊x⌋, ⌊y⌋, ⌊z⌋ )
```
To turn a collection of vectors into separate arguments for a function, splatting (the `...`) is used.
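For instance, a minimal sketch (assuming, as elsewhere, that `CalculusWithJulia` and `Plots` are loaded):
```{julia}
#| hold: true
f(t) = [sin(t), 2cos(t)]
ts = range(0, 2pi, length=100)
xs, ys = unzip(f.(ts))  # a vector of x components and a vector of y components
plot(xs, ys)            # equivalently: plot(unzip(f.(ts))...)
```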
@@ -175,6 +178,18 @@ ts = range(-2, 2, length=200)
plot(unzip(h.(ts))...)
```
### Using points, not vectors
As mentioned, there is an alternate manner to plot a vector-valued function that has some conveniences. This is to use a tuple to store the component values. For example:
```{julia}
g(t) = (cos(t) + 1/5 * cos(5t), sin(t) + 2/3*sin(3t))
ts = range(0, 2pi, 251)
plot(g.(ts); legend=false, aspect_ratio=:equal)
```
Broadcasting `g` creates a vector of tuples, which `Plots` treats as points. The drawback to this approach, as mentioned, is that the output is generally easier to manipulate when the function output is a vector.
### The `plot_parametric` function
@@ -229,8 +244,8 @@ For example:
#| hold: true
a, ecc = 20, 3/4
f(t) = a*(1-ecc^2)/(1 + ecc*cos(t)) * [cos(t), sin(t)]
plot_parametric(0..2pi, f, legend=false)
scatter!([0],[0], markersize=4)
plot_parametric(0..2pi, f; legend=false)
scatter!([(0,0)]; markersize=4)
```
@@ -272,14 +287,14 @@ function spiro(t; r=2, R=5, rho=0.8*r)
cent(t) = (R-r) * [cos(t), sin(t)]
p = plot(legend=false, aspect_ratio=:equal)
circle!([0,0], R, color=:blue)
circle!(cent(t), r, color=:black)
p = plot(; legend=false, aspect_ratio=:equal)
circle!([0,0], R; linecolor=:blue)
circle!(cent(t), r; linecolor=:black)
tp(t) = -R/r * t
tp(t) = -R / r * t
s(t) = cent(t) + rho * [cos(tp(t)), sin(tp(t))]
plot_parametric!(0..t, s, color=:red)
plot_parametric!(0..t, s; linecolor=:red)
p
end
@@ -479,10 +494,15 @@ f(t) = [3cos(t), 2sin(t)]
t, Δt = pi/4, pi/16
df = f(t + Δt) - f(t)
plot(legend=false)
arrow!([0,0], f(t))
arrow!([0,0], f(t + Δt))
arrow!(f(t), df)
plot(; legend=false, aspect_ratio=:equal)
plot_parametric!(pi/5..3pi/8, f; line=(:red, 1))
arrow!([0,0], f(t); line=(:blue,))
arrow!([0,0], f(t + Δt); line=(:blue,))
arrow!(f(t), df; line=(:black, 3,0.5))
annotate!([(f(t)..., text("f(t)", :bottom, :left)),
(f(t+Δt)..., text("f(t + Δt)", :bottom, :left)),
((f(t) + df/2)..., text("df", :top, :right)),
])
```
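Numerically, the scaling of the difference with $\Delta t$ can be seen directly; a small sketch (the values chosen are our own):
```{julia}
#| hold: true
f(t) = [3cos(t), 2sin(t)]
t = pi/4
# the scaled length of the difference stabilizes near ‖f'(t)‖ as Δt shrinks:
[norm(f(t + Δt) - f(t)) / Δt for Δt in (0.1, 0.01, 0.001)]
```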
The length of the difference appears to be related to the length of $\Delta t$, in a similar manner as the univariate derivative. The following limit defines the *derivative* of a vector-valued function:
@@ -514,9 +534,12 @@ We can visualize the tangential property through a graph:
```{julia}
#| hold: true
f(t) = [3cos(t), 2sin(t)]
p = plot_parametric(0..2pi, f, legend=false, aspect_ratio=:equal)
plot(; legend=false, aspect_ratio=:equal)
p = plot_parametric!(0..2pi, f; line=(:black, 1))
for t in [1,2,3]
arrow!(f(t), f'(t)) # add arrow with tail on curve, in direction of derivative
arrow!([0,0], f(t); line=(:gray, 1, 0.5))
annotate!((2f(t)/3)..., text("f($t)", :top, :left))
arrow!(f(t), f'(t); line=(:blue, 2)) # add arrow with tail on curve, in direction of derivative
end
p
```
@@ -528,8 +551,8 @@ Were symbolic expressions used in place of functions, the vector-valued function
```{julia}
@syms 𝒕
𝒗vf = [cos(𝒕), sin(𝒕), 𝒕]
@syms t
vvf = [cos(t), sin(t), t]
```
We will see working with these expressions is not identical to working with a vector-valued function.
@@ -539,7 +562,7 @@ To plot, we can avail ourselves of the parametric plot syntax. The following
```{julia}
plot(𝒗vf..., 0, 2pi)
plot(vvf..., 0, 2pi)
```
The `unzip` function, as was used above, could be employed, but it would be more trouble in this case.
@@ -549,7 +572,7 @@ To evaluate the function at a given value, say $t=2$, we can use `subs` with bro
```{julia}
subs.(𝒗vf, 𝒕=>2)
subs.(vvf, t=>2)
```
Limits are performed component by component, and can also be defined by broadcasting, again with the need to adjust the values:
@@ -557,21 +580,21 @@ Limits are performed component by component, and can also be defined by broadcas
```{julia}
@syms Δ
limit.((subs.(𝒗vf, 𝒕 => 𝒕 + Δ) - 𝒗vf) / Δ, Δ => 0)
limit.((subs.(vvf, t => t + Δ) - vvf) / Δ, Δ => 0)
```
Derivatives, as was just done through a limit, are a bit more straightforward than evaluation or limit taking, as we won't bump into the shape mismatch when broadcasting:
```{julia}
diff.(𝒗vf, 𝒕)
diff.(vvf, t)
```
The second derivative can be found through:
```{julia}
diff.(𝒗vf, 𝒕, 𝒕)
diff.(vvf, t, t) # or diff.(vvf, t, 2)
```
### Applications of the derivative
@@ -586,13 +609,13 @@ Here are some sample applications of the derivative.
The derivative of a vector-valued function is similar to that of a univariate function, in that it indicates a direction tangent to a curve. The point-slope form offers a straightforward parameterization. We have a point given through the vector-valued function and a direction given by its derivative. (After identifying a vector with its tail at the origin with the point that is the head of the vector.)
With this, the equation is simply $\vec{tl}(t) = \vec{f}(t_0) + \vec{f}'(t_0) \cdot (t - t_0)$, where the dot indicates scalar multiplication.
With this, the equation is simply $\vec{tl}(t) = \vec{f}(t_0) + (t - t_0) \cdot \vec{f}'(t_0)$, where the dot indicates scalar multiplication.
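A sketch in `Julia`, using the `f'` notation of these notes for the derivative of a vector-valued function:
```{julia}
#| hold: true
f(t) = [3cos(t), 2sin(t)]
t₀ = pi/4
tl(t) = f(t₀) + (t - t₀) * f'(t₀)  # point-slope form of the tangent line
tl(t₀ + 0.1)
```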
##### Example: parabolic motion
In physics, we learn that the equation $F=ma$ can be used to derive a formula for position, when acceleration, $a$, is a constant. The resulting equation of motion is $x = x_0 + v_0t + (1/2) at^2$. Similarly, if $x(t)$ is a vector-valued position vector, and the *second* derivative, $x''(t) =\vec{a}$, a constant, then we have: $x(t) = \vec{x_0} + \vec{v_0}t + (1/2) \vec{a} t^2$.
In physics, we learn that the equation $F=ma$ can be used to derive a formula for position, when acceleration, $a$, is a constant. The resulting equation of motion is $x(t) = x_0 + v_0t + (1/2) at^2$. Similarly, if $x(t)$ is a vector-valued position vector, and the *second* derivative, $x''(t) =\vec{a}$, a constant, then we have: $x(t) = \vec{x_0} + \vec{v_0}t + (1/2) \vec{a} t^2$.
In two dimensions, the force due to gravity acts downward, only in the $y$ direction. The acceleration is then $\vec{a} = \langle 0, -g \rangle$. If we start at the origin, with initial velocity $\vec{v_0} = \langle 2, 3\rangle$, then we can plot the trajectory until the object returns to ground ($y=0$) as follows:
@@ -606,7 +629,8 @@ xpos(t) = x0 + v0*t + (1/2)*a*t^2
t_0 = find_zero(t -> xpos(t)[2], (1/10, 100)) # find when y=0
plot_parametric(0..t_0, xpos)
plot(; legend=false)
plot_parametric!(0..t_0, xpos)
```
```{julia}
@@ -797,10 +821,10 @@ The dot being scalar multiplication by the derivative of the univariate function
Vector-valued functions do not have multiplication or division defined for them, so there are no ready analogues of the product and quotient rule. However, the dot product and the cross product produce new functions that may have derivative rules available.
For the dot product, the combination $\vec{f}(t) \cdot \vec{g}(t)$ we have a univariate function of $t$, so we know a derivative is well defined. Can it be represented in terms of the vector-valued functions? In terms of the component functions, we have this calculation specific to $n=2$, but that which can be generalized:
For the dot product, the combination $\vec{f}(t) \cdot \vec{g}(t)$ creates a univariate function of $t$, so we know a derivative is well defined. Can it be represented in terms of the vector-valued functions? In terms of the component functions, we have this calculation specific to $n=2$, though one which generalizes:
$$
\begin{align*}
\frac{d}{dt}(\vec{f}(t) \cdot \vec{g}(t)) &=
\frac{d}{dt}(f_1(t) g_1(t) + f_2(t) g_2(t))\\
@@ -808,6 +832,7 @@ For the dot product, the combination $\vec{f}(t) \cdot \vec{g}(t)$ we have a uni
&= f_1'(t) g_1(t) + f_2'(t) g_2(t) + f_1(t) g_1'(t) + f_2(t) g_2'(t)\\
&= \vec{f}'(t)\cdot \vec{g}(t) + \vec{f}(t) \cdot \vec{g}'(t).
\end{align*}
$$
Suggesting that a product-rule-like formula applies to dot products.
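A symbolic spot check of this (the example component functions are our own; the dot product is written as a `sum` of termwise products to sidestep any conjugation concerns):
```{julia}
#| hold: true
@syms tₛ
uₛ = [cos(tₛ), sin(tₛ), tₛ]
vₛ = [tₛ, tₛ^2, tₛ^3]
lhs = diff(sum(uₛ .* vₛ), tₛ)
rhs = sum(diff.(uₛ, tₛ) .* vₛ) + sum(uₛ .* diff.(vₛ, tₛ))
simplify(lhs - rhs)
```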
@@ -839,11 +864,12 @@ diff.(uₛ × vₛ, tₛ) - (diff.(uₛ, tₛ) × vₛ + uₛ × diff.(vₛ, t
In summary, these two derivative formulas hold for vector-valued functions $R \rightarrow R^n$:
$$
\begin{align*}
(\vec{u} \cdot \vec{v})' &= \vec{u}' \cdot \vec{v} + \vec{u} \cdot \vec{v}',\\
(\vec{u} \times \vec{v})' &= \vec{u}' \times \vec{v} + \vec{u} \times \vec{v}'.
(\vec{u} \cdot \vec{v})' &= \vec{u}' \cdot \vec{v} + \vec{u} \cdot \vec{v}', \\
(\vec{u} \times \vec{v})' &= \vec{u}' \times \vec{v} + \vec{u} \times \vec{v}'\quad (n=3).
\end{align*}
$$
##### Application. Circular motion and the tangent vector.
@@ -890,17 +916,21 @@ $$
\vec{F} = m \vec{a} = m \ddot{\vec{x}}.
$$
Combining, Newton states $\vec{a} = -(GM/r^2) \hat{x}$.
(The double dot is notation for two derivatives in a $t$ variable.)
Combining, Newton's law states $\vec{a} = -(GM/r^2) \hat{x}$.
Now to show the first law. Consider $\vec{x} \times \vec{v}$. It is constant, as:
$$
\begin{align*}
(\vec{x} \times \vec{v})' &= \vec{x}' \times \vec{v} + \vec{x} \times \vec{v}'\\
&= \vec{v} \times \vec{v} + \vec{x} \times \vec{a}.
&= \vec{v} \times \vec{v} + \vec{x} \times \vec{a}\\
&= 0.
\end{align*}
$$
Both terms are $\vec{0}$, as $\vec{a}$ is parallel to $\vec{x}$ by the above, and clearly $\vec{v}$ is parallel to itself.
@@ -912,34 +942,37 @@ This says, $\vec{x} \times \vec{v} = \vec{c}$ is a constant vector, meaning, the
Now, by differentiating $\vec{x} = r \hat{x}$ we have:
$$
\begin{align*}
\vec{v} &= \vec{x}'\\
&= (r\hat{x})'\\
&= r' \hat{x} + r \hat{x}',
\end{align*}
$$
and so
$$
\begin{align*}
\vec{c} &= \vec{x} \times \vec{v}\\
&= (r\hat{x}) \times (r'\hat{x} + r \hat{x}')\\
&= r^2 (\hat{x} \times \hat{x}').
\end{align*}
$$
From this, we can compute $\vec{a} \times \vec{c}$:
$$
\begin{align*}
\vec{a} \times \vec{c} &= (-\frac{GM}{r^2})\hat{x} \times r^2(\hat{x} \times \hat{x}')\\
&= -GM \hat{x} \times (\hat{x} \times \hat{x}') \\
&= GM (\hat{x} \times \hat{x}')\times \hat{x}.
\end{align*}
$$
The last line by anti-commutativity.
@@ -948,22 +981,24 @@ The last line by anti-commutativity.
But, the triple cross product can be simplified through the identity $(\vec{u}\times\vec{v})\times\vec{w} = (\vec{u}\cdot\vec{w})\vec{v} - (\vec{v}\cdot\vec{w})\vec{u}$. So, the above becomes:
$$
\begin{align*}
\vec{a} \times \vec{c} &= GM ((\hat{x}\cdot\hat{x})\hat{x}' - (\hat{x} \cdot \hat{x}')\hat{x})\\
&= GM (1 \hat{x}' - 0 \hat{x}).
\end{align*}
$$
Now, since $\vec{c}$ is constant, we have:
$$
\begin{align*}
(\vec{v} \times \vec{c})' &= (\vec{a} \times \vec{c})\\
&= GM \hat{x}'\\
&= (GM\hat{x})'.
\end{align*}
$$
The two sides have the same derivative, hence differ by a constant:
@@ -973,13 +1008,13 @@ $$
\vec{v} \times \vec{c} = GM \hat{x} + \vec{d}.
$$
As $\vec{x}$ and $\vec{v}\times\vec{c}$ lie in the same plane - orthogonal to $\vec{c}$ - so does $\vec{d}$. With a suitable re-orientation, so that $\vec{d}$ is along the $x$ axis, $\vec{c}$ is along the $z$-axis, then we have $\vec{c} = \langle 0,0,c\rangle$ and $\vec{d} = \langle d ,0,0 \rangle$, and $\vec{x} = \langle x, y, 0 \rangle$. Set $\theta$ to be the angle, then $\hat{x} = \langle \cos(\theta), \sin(\theta), 0\rangle$.
As $\vec{x}$ and $\vec{v}\times\vec{c}$ lie in the same plane---orthogonal to $\vec{c}$---so does $\vec{d}$. With a suitable re-orientation, so that $\vec{d}$ is along the $x$ axis, $\vec{c}$ is along the $z$-axis, then we have $\vec{c} = \langle 0,0,c\rangle$ and $\vec{d} = \langle d ,0,0 \rangle$, and $\vec{x} = \langle x, y, 0 \rangle$. Set $\theta$ to be the angle, then $\hat{x} = \langle \cos(\theta), \sin(\theta), 0\rangle$.
Now
$$
\begin{align*}
c^2 &= \|\vec{c}\|^2 \\
&= \vec{c} \cdot \vec{c}\\
@@ -989,9 +1024,10 @@ c^2 &= \|\vec{c}\|^2 \\
&= GMr + r \hat{x} \cdot \vec{d}\\
&= GMr + rd \cos(\theta).
\end{align*}
$$
Solving, this gives the first law. That is, the radial distance is in the form of an ellipse:
Solving for $r$, this gives the first law. That is, the radial distance is in the form of an ellipse:
$$
@@ -1222,23 +1258,25 @@ p
---
::: {.callout-note appearance="minimal"}
## Curvature of a space curve
The *curvature* of a $3$-dimensional space curve is defined by:
> *The curvature*: For a $3-D$ curve the curvature is defined by:
>
> $\kappa = \frac{\| r'(t) \times r''(t) \|}{\| r'(t) \|^3}.$
$$
\kappa = \frac{\| r'(t) \times r''(t) \|}{\| r'(t) \|^3}.
$$
For $2$-dimensional space curves, the same formula applies after embedding a $0$ third component. It can also be expressed directly as
For $2$-dimensional space curves, the same formula applies after embedding a $0$ third component. It simplifies to:
$$
\kappa = (x'y''-x''y')/\|r'\|^3. \quad (r(t) =\langle x(t), y(t) \rangle)
$$
Curvature can also be defined as derivative of the tangent vector, $\hat{T}$, *when* the curve is parameterized by arc length, a topic still to be taken up. The vector $\vec{r}'(t)$ is the direction of motion, whereas $\vec{r}''(t)$ indicates how fast and in what direction this is changing. For curves with little curve in them, the two will be nearly parallel and the cross product small (reflecting the presence of $\cos(\theta)$ in the definition). For "curvy" curves, $\vec{r}''$ will be in a direction opposite of $\vec{r}'$ to the $\cos(\theta)$ term in the cross product will be closer to $1$.
:::
Curvature can also be defined by the derivative of the tangent vector, $\hat{T}$, *when* the curve is parameterized by arc length, a topic still to be taken up. The vector $\vec{r}'(t)$ is the direction of motion, whereas $\vec{r}''(t)$ indicates how fast and in what direction this is changing. For curves with little curve in them, the two will be nearly parallel and the cross product small (reflecting the presence of $\sin(\theta)$ in the definition). For "curvy" curves, $\vec{r}''$ will be in a direction orthogonal to $\vec{r}'$, so the $\sin(\theta)$ term in the cross product will be closer to $1$.
Let $\vec{r}(t) = k \cdot \langle \cos(t), \sin(t), 0 \rangle$. This will have curvature:
@@ -1260,17 +1298,16 @@ If a curve is imagined to have a tangent "circle" (second order Taylor series ap
The [torsion](https://en.wikipedia.org/wiki/Torsion_of_a_curve), $\tau$, of a space curve ($n=3$), is a measure of how sharply the curve is twisting out of the plane of curvature.
::: {.callout-note appearance="minimal}
## Torsion of a space curve
The torsion is defined for smooth curves by
> *The torsion*:
>
> $\tau = \frac{(\vec{r}' \times \vec{r}'') \cdot \vec{r}'''}{\|\vec{r}' \times \vec{r}''\|^2}.$
$$
\tau = \frac{(\vec{r}' \times \vec{r}'') \cdot \vec{r}'''}{\|\vec{r}' \times \vec{r}''\|^2}.
$$
For the torsion to be defined, the cross product $\vec{r}' \times \vec{r}''$ must be nonzero; that is, the two must not be parallel and neither can be $\vec{0}$.
:::
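For instance, the helix $\langle \cos(t), \sin(t), bt\rangle$ has constant torsion $b/(1+b^2)$; a numeric sketch using the derivative notation of these notes:
```{julia}
#| hold: true
b = 1/2
r(t) = [cos(t), sin(t), b*t]
τ(t) = ((r'(t) × r''(t)) ⋅ r'''(t)) / norm(r'(t) × r''(t))^2
τ(1.0), b / (1 + b^2)  # both ≈ 0.4
```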
##### Example: Tubular surface
@@ -1285,7 +1322,7 @@ This last example comes from a collection of several [examples](https://github.c
The task is to illustrate a space curve, $c(t)$, using a tubular surface. At each time point $t$, assume the curve has tangent, $e_1$; normal, $e_2$; and binormal, $e_3$. (This assumes the defining derivatives exist and are non-zero and the cross product in the torsion is non zero.) The tubular surface is a circle of radius $\epsilon$ in the plane determined by the normal and binormal. This curve would be parameterized by $r(t,u) = c(t) + \epsilon (e_2(t) \cdot \cos(u) + e_3(t) \cdot \sin(u))$ for varying $u$.
The Frenet-Serret equations setup a system of differential equations driven by the curvature and torsion. We use the `DifferentialEquations` package to solve this equation for two specific functions and a given initial condition. The equations when expanded into coordinates become $12$ different equations:
The Frenet-Serret equations describe the relationship between the tangent, normal, and binormal vectors through a system of differential equations driven by the curvature and torsion. Their derivation will be discussed later; here we give an example of their use, solving the equations with the `DifferentialEquations` package for a specific curvature function, torsion function, and given initial conditions. The vector equations along with the relationship between the space curve and its tangent vector---when expanded into coordinates---become $12$ different equations:
```{julia}
@@ -1307,7 +1344,7 @@ function Frenet_eq!(du, u, p, s) #system of ODEs
end
```
The last set of equations describe the motion of the spine. It follows from specifying the tangent to the curve is $e_1$, as desired; it is parameterized by arc length, as $\mid c'(t) \mid = 1$.
The last set of equations describe the motion of the spine. It follows from specifying the tangent to the curve is $e_1$, as desired (it is parameterized by arc length, as $\lVert c'(t) \rVert = 1$).
Following the example of `@empet`, we define a curvature function and torsion function, the latter a constant:
@@ -1336,7 +1373,7 @@ prob = ODEProblem(Frenet_eq!, u0, t_span, (κ, τ))
sol = solve(prob, Tsit5());
```
The "spine" is the center axis of the tube and is the $10$th, $11$th, and $12$th coordinates:
The "spine" is the center axis of the tube and is described by the $10$th, $11$th, and $12$th coordinates:
```{julia}
@@ -1363,14 +1400,15 @@ ts_0 = range(a_0, b_0, length=251)
t_0 = (a_0 + b_0) / 2
ϵ = 1/5
plot_parametric(a_0..b_0, spine)
plot(; legend=false)
plot_parametric!(a_0..b_0, spine; line=(:black, 2))
arrow!(spine(t_0), e₁(t_0))
arrow!(spine(t_0), e₂(t_0))
arrow!(spine(t_0), e₃(t_0))
arrow!(spine(t_0), e₁(t_0); line=(:blue,))
arrow!(spine(t_0), e₂(t_0); line=(:red,))
arrow!(spine(t_0), e₃(t_0); line=(:green,))
r_0(t, θ) = spine(t) + ϵ * (e₂(t)*cos(θ) + e₃(t)*sin(θ))
plot_parametric!(0..2pi, θ -> r_0(t_0, θ))
plot_parametric!(0..2pi, θ -> r_0(t_0, θ); line=(:black, 1))
```
The `ϵ` value determines the radius of the tube; we see it above as the radius of the drawn circle. The function `r` for a fixed `t` traces out such a circle centered at a point on the spine. For a fixed `θ`, the function `r` describes a line on the surface of the tube paralleling the spine.
@@ -1397,12 +1435,15 @@ plotly();
In [Arc length](../integrals/arc_length.html) there is a discussion of how to find the arc length of a parameterized curve in $2$ dimensions. The general case is discussed by [Destafano](https://randomproofs.files.wordpress.com/2010/11/arc_length.pdf) who shows:
> *Arc-length*: if a curve $C$ is parameterized by a smooth function $\vec{r}(t)$ over an interval $I$, then the arc length of $C$ is:
>
> $$
> \int_I \| \vec{r}'(t) \| dt.
> $$
::: {.callout-note icon=false}
## Arc-length
If a curve $C$ is parameterized by a smooth function $\vec{r}(t)$ over an interval $I$, then the arc length of $C$ is:
$$
\int_I \| \vec{r}'(t) \| dt.
$$
:::
If we associate $\vec{r}'(t)$ with the velocity, then this is the integral of the speed (the magnitude of the velocity).
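For instance, one turn of the helix $\langle\cos(t), \sin(t), t\rangle$ has constant speed $\sqrt{2}$, so arc length $2\pi\sqrt{2}$; a quick numeric sketch with `QuadGK`:
```{julia}
#| hold: true
r(t) = [cos(t), sin(t), t]
quadgk(t -> norm(r'(t)), 0, 2pi)[1], 2pi * sqrt(2)  # both ≈ 8.8858
```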
@@ -1463,7 +1504,7 @@ speed = simplify(norm(diff.(viviani(t, a), t)))
integrate(speed, (t, 0, 4*PI))
```
We see that the answer depends linearly on $a$, but otherwise is a constant expressed as an integral. We use `QuadGk` to provide a numeric answer for the case $a=1$:
We see that the answer depends linearly on $a$, but otherwise is a constant involving an integral. We use `QuadGk` to provide a numeric answer for the case $a=1$:
```{julia}
@@ -1519,12 +1560,13 @@ $$
As before, but further, we have if $\kappa$ is the curvature and $\tau$ the torsion, these relationships expressing the derivatives with respect to $s$ in terms of the components in the frame:
$$
\begin{align*}
\hat{T}'(s) &= &\kappa \hat{N}(s) &\\
\hat{N}'(s) &= -\kappa \hat{T}(s) & &+ \tau \hat{B}(s)\\
\hat{B}'(s) &= &-\tau \hat{N}(s) &
\end{align*}
$$
These are the [Frenet-Serret](https://en.wikipedia.org/wiki/Frenet%E2%80%93Serret_formulas) formulas.
@@ -1560,7 +1602,7 @@ This should be $\kappa \hat{N}$, so we do:
```{julia}
κₕ = norm(outₕ) |> simplify
Normₕ = outₕ / κₕ
κₕ, Normₕ
κₕ
```
Interpreting, $a$ is the radius of the circle and $b$ how tight the coils are. If $a$ gets much larger than $b$, then the curvature is like $1/a$, just as with a circle. If $b$ gets very big, then the trajectory looks more stretched out and the curvature gets smaller.
@@ -1637,18 +1679,19 @@ end
Levi and Tabachnikov prove in their Proposition 2.4:
$$
\begin{align*}
\kappa(u) &= \frac{d\alpha(u)}{du} + \frac{\sin(\alpha(u))}{a},\\
|\frac{dv}{du}| &= |\cos(\alpha)|, \quad \text{and}\\
k &= \frac{\tan(\alpha)}{a}.
\end{align*}
$$
The first equation relates the steering angle with the curvature. If the steering angle is not changed ($d\alpha/du=0$) then the curvature is constant and the motion is circular. It will be greater for larger angles (up to $\pi/2$). As the curvature is the reciprocal of the radius, this means the radius of the circular trajectory will be smaller. For the same constant steering angle, the curvature will be smaller for longer wheelbases, meaning the circular trajectory will have a larger radius. For cars, which have similar dynamics, this means longer wheelbase cars will take more room to make a U-turn.
The second equation may be interpreted in ratio of arc lengths. The infinitesimal arc length of the rear wheel is proportional to that of the front wheel only scaled down by $\cos(\alpha)$. When $\alpha=0$ - the bike is moving in a straight line - and the two are the same. At the other extreme - when $\alpha=\pi/2$ - the bike must be pivoting on its rear wheel and the rear wheel has no arc length. This cosine, is related to the speed of the back wheel relative to the speed of the front wheel, which was used in the initial differential equation.
The second equation may be interpreted in terms of a ratio of arc lengths. The infinitesimal arc length of the rear wheel is proportional to that of the front wheel, only scaled down by $\cos(\alpha)$. When $\alpha=0$---the bike is moving in a straight line---the two are the same. At the other extreme---when $\alpha=\pi/2$---the bike must be pivoting on its rear wheel and the rear wheel has no arc length. This cosine is related to the speed of the back wheel relative to the speed of the front wheel, which was used in the initial differential equation.
The last equation relates the curvature of the back wheel track to the steering angle of the front wheel. When $\alpha=\pm\pi/2$, the rear-wheel curvature, $k$, is infinite, resulting in a cusp (no circle with non-zero radius will approximate the trajectory). This occurs when the front wheel is steered orthogonal to the direction of motion. As was seen in previous graphs of the trajectories, a cusp can happen for quite regular front wheel trajectories.
@@ -1657,13 +1700,14 @@ The last equation, relates the curvature of the back wheel track to the steering
To derive the first one, we have previously noted that when a curve is parameterized by arc length, the curvature is more directly computed: it is the magnitude of the derivative of the tangent vector. The tangent vector is of unit length, when parametrized by arc length. This implies its derivative will be orthogonal. If $\vec{r}(t)$ is a parameterization by arc length, then the curvature formula simplifies as:
$$
\begin{align*}
\kappa(s) &= \frac{\| \vec{r}'(s) \times \vec{r}''(s) \|}{\|\vec{r}'(s)\|^3} \\
&= \frac{\| \vec{r}'(s) \times \vec{r}''(s) \|}{1} \\
&= \| \vec{r}'(s) \| \| \vec{r}''(s) \| \sin(\theta) \\
&= 1 \| \vec{r}''(s) \| 1 = \| \vec{r}''(s) \|.
\end{align*}
$$
So in the above, the curvature is $\kappa = \| \vec{F}''(u) \|$ and $k = \|\vec{B}''(v)\|$.
@@ -1691,7 +1735,7 @@ $$
It must be that the tangent line of $\vec{B}$ is parallel to $\vec{U} \cos(\alpha) + \vec{V} \sin(\alpha)$. To utilize this, we differentiate $\vec{B}$ using the facts that $\vec{U}' = -\kappa \vec{V}$ and $\vec{V}' = \kappa \vec{U}$. These come from $\vec{U} = \vec{F}'$, and so its derivative in $u$ has magnitude yielding the curvature, $\kappa$, and direction orthogonal to $\vec{U}$.
$$
\begin{align*}
\vec{B}'(u) &= \vec{F}'(u)
-a \vec{U}' \cos(\alpha) -a \vec{U} (-\sin(\alpha)) \alpha'
@@ -1703,16 +1747,17 @@ a (\kappa) \vec{U} \sin(\alpha) - a \vec{V} \cos(\alpha) \alpha' \\
+ a(\alpha' - \kappa) \sin(\alpha) \vec{U}
- a(\alpha' - \kappa) \cos(\alpha)\vec{V}.
\end{align*}
$$
Extend the $2$-dimensional vectors to $3$ dimensions, by adding a zero $z$ component, then:
$$
\begin{align*}
\vec{0} &= (\vec{U}
+ a(\alpha' - \kappa) \sin(\alpha) \vec{U}
+ a(\alpha' - \kappa) \cos(\alpha)\vec{V}) \times
- a(\alpha' - \kappa) \cos(\alpha)\vec{V}) \times
(\vec{U} \cos(\alpha) + \vec{V} \sin(\alpha)) \\
&= (\vec{U} \times \vec{V}) \sin(\alpha) +
a(\alpha' - \kappa) \sin(\alpha) \vec{U} \times \vec{V} \sin(\alpha) -
@@ -1721,6 +1766,7 @@ a(\alpha' - \kappa) \cos(\alpha)\vec{V} \times \vec{U} \cos(\alpha) \\
a(\alpha'-\kappa) \cos^2(\alpha)) \vec{U} \times \vec{V} \\
&= (\sin(\alpha) + a (\alpha' - \kappa)) \vec{U} \times \vec{V}.
\end{align*}
$$
The terms $\vec{U} \times\vec{U}$ and $\vec{V}\times\vec{V}$ are $\vec{0}$, due to properties of the cross product. This says the scalar part must be $0$, or
@@ -1733,7 +1779,7 @@ $$
As for the second equation, from the expression for $\vec{B}'(u)$, after setting $a(\alpha'-\kappa) = -\sin(\alpha)$:
$$
\begin{align*}
\|\vec{B}'(u)\|^2
&= \| (1 -\sin(\alpha)\sin(\alpha)) \vec{U} +\sin(\alpha)\cos(\alpha) \vec{V} \|^2\\
@@ -1742,9 +1788,10 @@ As for the second equation, from the expression for $\vec{B}'(u)$, after setting
&= \cos^2(\alpha)(\cos^2(\alpha) + \sin^2(\alpha))\\
&= \cos^2(\alpha).
\end{align*}
$$
From this $\|\vec{B}(u)\| = |\cos(\alpha)\|$. But $1 = \|d\vec{B}/dv\| = \|d\vec{B}/du \| \cdot |du/dv|$ and $|dv/du|=|\cos(\alpha)|$ follows.
From this $\|\vec{B}'(u)\| = |\cos(\alpha)|$. But $1 = \|d\vec{B}/dv\| = \|d\vec{B}/du \| \cdot |du/dv|$ and $|dv/du|=|\cos(\alpha)|$ follows.
@@ -1760,13 +1807,13 @@ Xₑ(t)= 2 * cos(t)
Yₑ(t) = sin(t)
rₑ(t) = [Xₑ(t), Yₑ(t)]
unit_vec(x) = x / norm(x)
plot(legend=false, aspect_ratio=:equal)
plot(; legend=false, aspect_ratio=:equal)
ts = range(0, 2pi, length=50)
for t in ts
Pₑ, Vₑ = rₑ(t), unit_vec([-Yₑ'(t), Xₑ'(t)])
plot_parametric!(-4..4, x -> Pₑ + x*Vₑ)
plot_parametric!(-4..4, x -> Pₑ + x*Vₑ; line=(:black, 1))
end
plot!(Xₑ, Yₑ, 0, 2pi, linewidth=5)
plot!(Xₑ, Yₑ, 0, 2pi, line=(:red, 5))
```
is that of an ellipse with many *normal* lines drawn to it. The normal lines appear to intersect in a somewhat diamond-shaped curve. This curve is the evolute of the ellipse. We can characterize this using the language of planar curves.
@@ -1778,11 +1825,12 @@ Consider a parameterization of a curve by arc-length, $\vec\gamma(s) = \langle u
Consider two nearby points $t$ and $t+\epsilon$ and the intersection of $l_t$ and $l_{t+\epsilon}$. That is, we need points $a$ and $b$ with: $l_t(a) = l_{t+\epsilon}(b)$. Setting the components equal, this is:
$$
\begin{align*}
u(t) - av'(t) &= u(t+\epsilon) - bv'(t+\epsilon) \\
v(t) + au'(t) &= v(t+\epsilon) + bu'(t+\epsilon).
\end{align*}
$$
This is a linear equation in two unknowns ($a$ and $b$) which can be solved. Here is the value for `a`:
@@ -1801,24 +1849,26 @@ out[a]
Letting $\epsilon \rightarrow 0$ we get an expression for $a$ that will describe the evolute at time $t$ in terms of the function $\gamma$. Looking at the expression above, we can see that dividing the *numerator* by $\epsilon$ and taking a limit will yield $u'(t)^2 + v'(t)^2$. If the *denominator* has a limit after dividing by $\epsilon$, then we can find the description sought. Pursuing this leads to:
$$
\begin{align*}
\frac{u'(t) v'(t+\epsilon) - v'(t) u'(t+\epsilon)}{\epsilon}
&= \frac{u'(t) v'(t+\epsilon) -u'(t)v'(t) + u'(t)v'(t)- v'(t) u'(t+\epsilon)}{\epsilon} \\
&= \frac{u'(t)(v'(t+\epsilon) -v'(t))}{\epsilon} + \frac{(u'(t)- u'(t+\epsilon))v'(t)}{\epsilon},
\end{align*}
$$
which in the limit will give $u'(t)v''(t) - u''(t) v'(t)$. All told, in the limit as $\epsilon \rightarrow 0$ we get
$$
\begin{align*}
a &= \frac{u'(t)^2 + v'(t)^2}{u'(t)v''(t) - v'(t) u''(t)} \\
&= 1/(\|\vec\gamma'\|\kappa) \\
&= 1/(\|\hat{T}\|\kappa) \\
&= 1/\kappa,
\end{align*}
$$
with $\kappa$ being the curvature of the planar curve. That is, the evolute of $\vec\gamma$ is described by:
@@ -1837,24 +1887,28 @@ Tangent(r, t) = unit_vec(r'(t))
Normal(r, t) = unit_vec((𝒕 -> Tangent(r, 𝒕))'(t))
curvature(r, t) = norm(r'(t) × r''(t) ) / norm(r'(t))^3
plot_parametric(0..2pi, t -> rₑ₃(t)[1:2], legend=false, aspect_ratio=:equal)
plot_parametric!(0..2pi, t -> (rₑ₃(t) + Normal(rₑ₃, t)/curvature(rₑ₃, t))[1:2])
plot(; legend=false, aspect_ratio=:equal, xlims=(-6,6), ylims=(-5,5))
plot_parametric!(0..2pi, t -> rₑ₃(t)[1:2]; line=(:red,5))
plot_parametric!(0..2pi, t -> (rₑ₃(t) + Normal(rₑ₃, t)/curvature(rₑ₃, t))[1:2];
line=(:black, 1))
```
We computed the above illustration using $3$ dimensions (hence the use of `[1:2]`) as the curvature formula is easier to express. Recall, the curvature also appears in the [Frenet-Serret](https://en.wikipedia.org/wiki/Frenet%E2%80%93Serret_formulas) formulas: $d\hat{T}/ds = \kappa \hat{N}$ and $d\hat{N}/ds = -\kappa \hat{T}+ \tau \hat{B}$. In a planar curve, as under consideration, the torsion, $\tau$, is $0$. This allows the computation of $\vec\beta(s)'$:
$$
\begin{align*}
\vec{\beta}' &= \frac{d(\vec\gamma + (1/ \kappa) \hat{N})}{ds}\\
&= \hat{T} + (-\frac{\kappa '}{\kappa ^2}\hat{N} + \frac{1}{\kappa} \hat{N}')\\
&= \hat{T} - \frac{\kappa '}{\kappa ^2}\hat{N} + \frac{1}{\kappa} (-\kappa \hat{T})\\
&= - \frac{\kappa '}{\kappa ^2}\hat{N}.
\end{align*}
$$
We see $\vec\beta'$ is zero (the curve is non-regular) when $\kappa'(s) = 0$. The curvature changes from increasing to decreasing, or vice versa at each of the $4$ crossings of the major and minor axes - there are $4$ non-regular points, and we see $4$ cusps in the evolute.
We see $\vec\beta'$ is zero (the curve is non-regular) when $\kappa'(s) = 0$. The curvature changes from increasing to decreasing, or vice versa, at each of the $4$ crossings of the major and minor axes---there are $4$ non-regular points, and we see $4$ cusps in the evolute.
----
The curve parameterized by $\vec{r}(t) = 2(1 - \cos(t)) \langle \cos(t), \sin(t)\rangle$ over $[0,2\pi]$ is a cardioid. It is formed by rolling a circle of radius $r$ around another similarly sized circle. The following graphically shows the evolute is a smaller cardioid (one-third the size). For fun, the evolute of the evolute is drawn:
@@ -1869,10 +1923,10 @@ end
#| hold: true
r(t) = 2*(1 - cos(t)) * [cos(t), sin(t), 0]
plot(legend=false, aspect_ratio=:equal)
plot_parametric!(0..2pi, t -> r(t)[1:2])
plot_parametric!(0..2pi, t -> evolute(r)(t)[1:2])
plot_parametric!(0..2pi, t -> ((evolute∘evolute)(r)(t))[1:2])
plot(; legend=false, aspect_ratio=:equal)
plot_parametric!(0..2pi, t -> r(t)[1:2]; line=(:black, 1))
plot_parametric!(0..2pi, t -> evolute(r)(t)[1:2]; line=(:red, 1))
plot_parametric!(0..2pi, t -> ((evolute∘evolute)(r)(t))[1:2]; line=(:blue,1))
```
---
@@ -1889,9 +1943,10 @@ a = t1
beta(r, t) = r(t) - Tangent(r, t) * quadgk(t -> norm(r'(t)), a, t)[1]
p = plot_parametric(-2..2, r, legend=false)
plot_parametric!(t0..t1, t -> beta(r, t))
for t in range(t0,-0.2, length=4)
p = plot(; legend=false)
plot_parametric!(-2..2, r; line=(:black, 1))
plot_parametric!(t0..t1, t -> beta(r, t); line=(:red,1))
for t in range(t0, -0.2, length=4)
arrow!(r(t), -Tangent(r, t) * quadgk(t -> norm(r'(t)), a, t)[1])
scatter!(unzip([r(t)])...)
end
@@ -1902,24 +1957,26 @@ This lends itself to this mathematical description, if $\vec\gamma(t)$ parameter
$$
\vec\beta(t) = \vec\gamma(t) + \left((a - \int_{t_0}^t \| \vec\gamma'(t)\| dt) \hat{T}(t)\right),
\vec\beta(t) = \vec\gamma(t) + \left(a - \int_{t_0}^t \| \vec\gamma'(t)\| dt\right) \hat{T}(t),
$$
where $\hat{T}(t) = \vec\gamma'(t)/\|\vec\gamma'(t)\|$ is the unit tangent vector. The above uses two parameters ($a$ and $t_0$), but only one is needed, as there is an obvious redundancy (a point can *also* be expressed by $t$ and the shortened length of string). [Wikipedia](https://en.wikipedia.org/wiki/Involute) uses this definition for $a$ and $t$ values in an interval $[t_0, t_1]$:
$$
\vec\beta_a(t) = \vec\gamma(t) - \frac{\vec\gamma'(t)}{\|\vec\gamma'(t)\|}\int_a^t \|\vec\gamma'(t)\| dt.
\vec\beta_a(t) = \vec\gamma(t) -
\frac{\vec\gamma'(t)}{\|\vec\gamma'(t)\|}\int_a^t \|\vec\gamma'(t)\| dt.
$$
If $\vec\gamma(s)$ is parameterized by arc length, then this simplifies quite a bit, as the unit tangent is just $\vec\gamma'(s)$ and the remaining arc length just $(s-a)$:
$$
\begin{align*}
\vec\beta_a(s) &= \vec\gamma(s) - \vec\gamma'(s) (s-a) \\
&=\vec\gamma(s) - \hat{T}_{\vec\gamma}(s)(s-a).\quad (a \text{ is the arc-length parameter})
\end{align*}
$$
With this characterization, we see several properties:
@@ -1940,11 +1997,12 @@ $$
In the following we show that:
$$
\begin{align*}
\kappa_{\vec\beta_a}(s) &= 1/(s-a),\\
\hat{N}_{\vec\beta_a}(s) &= \hat{T}_{\vec\beta_a}'(s)/\|\hat{T}_{\vec\beta_a}'(s)\| = -\hat{T}_{\vec\gamma}(s).
\end{align*}
$$
The first shows in a different way that when $s=a$ the curve is not regular, as the curvature fails to exist. In the above figure, when the involute touches $\vec\gamma$, there will be a cusp.
@@ -1953,7 +2011,7 @@ The first shows in a different way that when $s=a$ the curve is not regular, as
With these two identifications and using $\vec\gamma'(s) = \hat{T}_{\vec\gamma(s)}$, we have the evolute simplifies to
$$
\begin{align*}
\vec\beta_a(s) + \frac{1}{\kappa_{\vec\beta_a}(s)}\hat{N}_{\vec\beta_a}(s)
&=
@@ -1962,20 +2020,22 @@ With these two identifications and using $\vec\gamma'(s) = \hat{T}_{\vec\gamma(s
\vec\gamma(s) + \hat{T}_{\vec\gamma}(s)(s-a) + \frac{1}{1/(s-a)} (-\hat{T}_{\vec\gamma}(s)) \\
&= \vec\gamma(s).
\end{align*}
$$
That is the evolute of an involute of $\vec\gamma(s)$ is $\vec\gamma(s)$.
That is, the evolute of an involute of $\vec\gamma(s)$ is $\vec\gamma(s)$.
We have:
$$
\begin{align*}
\vec\beta_a(s) &= \vec\gamma(s) - \vec\gamma'(s)(s-a)\\
\vec\beta_a'(s) &= -\kappa_{\vec\gamma}(s)(s-a)\hat{N}_{\vec\gamma}(s)\\
\vec\beta_a''(s) &= (-\kappa_{\vec\gamma}(s)(s-a))' \hat{N}_{\vec\gamma}(s) + (-\kappa_{\vec\gamma}(s)(s-a))(-\kappa_{\vec\gamma}\hat{T}_{\vec\gamma}(s)),
\end{align*}
$$
the last line by the Frenet-Serret formulas for *planar* curves, which show $\hat{T}'(s) = \kappa(s) \hat{N}$ and $\hat{N}'(s) = -\kappa(s)\hat{T}(s)$.
@@ -1984,11 +2044,12 @@ the last line by the Frenet-Serret formula for *planar* curves which show $\hat
To compute the curvature of $\vec\beta_a$, we need to compute both:
$$
\begin{align*}
\| \vec\beta' \|^3 &= |\kappa^3 (s-a)^3|\\
\| \vec\beta' \times \vec\beta'' \| &= |\kappa(s)^3 (s-a)^2|,
\end{align*}
$$
the last line using both $\hat{N}\times\hat{N} = \vec{0}$ and $\|\hat{N}\times\hat{T}\| = 1$. The curvature then is $\kappa_{\vec\beta_a}(s) = 1/(s-a)$.
@@ -2014,8 +2075,9 @@ speed = 2sin(t/2)
ex = r(t) - rp/speed * integrate(speed, t)
plot_parametric(0..4pi, r, legend=false)
plot_parametric!(0..4pi, u -> float.(subs.(ex, t .=> u)))
plot(; legend=false)
plot_parametric!(0..4pi, r; line=(:black, 1))
plot_parametric!(0..4pi, u -> float(subs.(ex, t => u)); line=(:blue, 1))
```
The expression `ex` is secretly `[t + sin(t), 3 + cos(t)]`, another cycloid.
@@ -2672,13 +2734,14 @@ radioq(choices, answ)
The evolute comes from the formula $\vec\gamma(t) - (1/\kappa(t)) \hat{N}(t)$. For hand computation, this formula can be explicitly given by two components $\langle X(t), Y(t) \rangle$ through:
$$
\begin{align*}
r(t) &= x'(t)^2 + y'(t)^2\\
k(t) &= x'(t)y''(t) - x''(t) y'(t)\\
X(t) &= x(t) - y'(t) r(t)/k(t)\\
Y(t) &= y(t) + x'(t) r(t)/k(t)
\end{align*}
$$
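These formulas are straightforward to apply symbolically. A sketch for the unit circle, whose evolute collapses to its center (the variable names mirror the display above):
```{julia}
#| hold: true
@syms t
x, y = cos(t), sin(t)
r = diff(x, t)^2 + diff(y, t)^2
k = diff(x, t) * diff(y, t, 2) - diff(x, t, 2) * diff(y, t)
X = x - diff(y, t) * r / k
Y = y + diff(x, t) * r / k
simplify(X), simplify(Y)  # (0, 0)
```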
Let $\vec\gamma(t) = \langle t, t^2 \rangle = \langle x(t), y(t)\rangle$ be a parameterization of a parabola.
@@ -2776,7 +2839,7 @@ What is the resulting curve?
choices = [
"An astroid of the form ``c \\langle \\cos^3(t), \\sin^3(t) \\rangle``",
"An cubic parabola of the form ``\\langle ct^3, dt^2\\rangle``",
"An ellipse of the form ``\\langle a\\cos(t), b\\sin(t)``",
"An ellipse of the form ``\\langle a\\cos(t), b\\sin(t)\\rangle``",
"A cyloid of the form ``c\\langle t + \\sin(t), 1 - \\cos(t)\\rangle``"
]
answ = 1


@@ -59,7 +59,7 @@ in a spirit similar to a section of a book. Just like a book, there
are try-it-yourself questions at the end of each page. All have a
limited number of self-graded answers. These notes borrow ideas from
many sources, for example @Strang, @Knill, @Schey, @Thomas,
@RogawskiAdams, several Wikipedia pages, and other sources.
@RogawskiAdams, @Angenent, several Wikipedia pages, and other sources.
These notes are accompanied by a `Julia` package `CalculusWithJulia`
that provides some simple functions to streamline some common tasks
@@ -77,7 +77,9 @@ These notes may be compiled into a `pdf` file through Quarto. As the result is r
-->
To *contribute* -- say by suggesting additional topics, correcting a
mistake, or fixing a typo -- click the "Edit this page" link and join the list of [contributors](https://github.com/jverzani/CalculusWithJuliaNotes.jl/graphs/contributors). Thanks to all contributors and a *very* special thanks to `@fangliu-tju` for their careful and most-appreciated proofreading.
mistake, or fixing a typo -- click the "Edit this page" link and join the list of [contributors](https://github.com/jverzani/CalculusWithJuliaNotes.jl/graphs/contributors). Thanks to all contributors.
A *very* special thanks goes out to `@fangliu-tju` for their careful and most-appreciated proofreading and error spotting spread over a series of PRs.
## Running Julia


@@ -3,9 +3,15 @@ CalculusWithJulia = "a2e0e22d-7d4c-5312-9169-8b992201a882"
ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
HCubature = "19dc6840-f33b-545b-b366-655c7e3ffd49"
IJulia = "7073ff75-c697-5162-941a-fcdaad2a7d2a"
ImplicitIntegration = "bc256489-3a69-4a66-afc4-127cc87e6182"
LaTeXStrings = "b964fa9f-0449-5b57-a5c2-d3ea65f4040f"
Mustache = "ffc61752-8dc7-55ee-8c37-f3e9cdd09e70"
PlotlyBase = "a03496cd-edff-5a9b-9e67-9cda94a718b5"
PlotlyKaleido = "f2990250-8cf9-495f-b13a-cce12b45703c"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
QuadGK = "1fd47b50-473d-5c70-9696-f719f8f3bcdc"
QuizQuestions = "612c44de-1021-4a21-84fb-7261cf5eb2d4"
Roots = "f2b01f46-fcfa-551c-844a-d8ac1e96c665"
SymPy = "24249f21-da20-56a4-8eb1-6a02cf4ae2e6"
Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
TextWrap = "b718987f-49a8-5099-9789-dcd902bef87d"


@@ -1,4 +1,4 @@
# The Gradient, Divergence, and Curl
# The gradient, divergence, and curl
{{< include ../_common_code.qmd >}}
@@ -105,7 +105,7 @@ annotate!([
(.5, -.1, "Δy"),
(1+.75dx, .1, "Δx"),
(1+dx+.1, .75, "Δz"),
(.5,.15,L"(x,y,z)"),
(.5,.15,"(x,y,z)"),
(.45,.6, "î"),
(1+.8dx, .7, "ĵ"),
(.8, 1+dy+.1, "k̂")
@@ -204,20 +204,20 @@ arrow!([1/2, 1-dx], .01 *[-1,0], linewidth=3, color=:blue)
arrow!([1-dx, 1/2], .01 *[0, 1], linewidth=3, color=:blue)
annotate!([
(0,-1/16,L"(x,y)"),
(1, -1/16, L"(x+\Delta{x},y)"),
(0, 1+1/16, L"(x,y+\Delta{y})"),
(1/2, 4dx, L"\hat{i}"),
(1/2, 1-4dx, L"-\hat{i}"),
(3dx, 1/2, L"-\hat{j}"),
(1-3dx, 1/2, L"\hat{j}")
(0,-1/16,"(x,y)"),
(1, -1/16, "(x+Δx,y)"),
(0, 1+1/16, "(x,y+Δy)"),
(1/2, 4dx, "î"),
(1/2, 1-4dx, "-î"),
(3dx, 1/2, "-ĵ"),
(1-3dx, 1/2, "ĵ")
])
```
Let $F=\langle F_x, F_y\rangle$. For small enough values of $\Delta{x}$ and $\Delta{y}$ the line integral, $\oint_C F\cdot d\vec{r}$ can be *approximated* by $4$ terms:
$$
\begin{align*}
\left(F(x,y) \cdot \hat{i}\right)\Delta{x} &+
\left(F(x+\Delta{x},y) \cdot \hat{j}\right)\Delta{y} +
@@ -230,6 +230,7 @@ F_x(x, y+\Delta{y}) (-\Delta{x}) + F_y(x,y) (-\Delta{y})\\
(F_y(x + \Delta{x}, y) - F_y(x, y))\Delta{y} -
(F_x(x, y+\Delta{y})-F_x(x,y))\Delta{x}.
\end{align*}
$$
The Riemann approximation allows a choice of evaluation point for Riemann integrable functions, and the choice here lends itself to further analysis. Were the above divided by $\Delta{x}\Delta{y}$, the area of the box, and a limit taken, partial derivatives appear to suggest this formula:
@@ -275,7 +276,7 @@ annotate!([
(.5, -.1, "Δy"),
(1+.75dx, .1, "Δx"),
(1+dx+.1, .75, "Δz"),
(.5,.15,L"(x,y,z)"),
(.5,.15,"(x,y,z)"),
(.45,.6, "î"),
(1+.8dx, .667, "ĵ"),
(.8, 1+dy+.067, "k̂"),
@@ -309,10 +310,10 @@ annotate!([
(.9, 1+dx, "C₁"),
(2*dx, 1/2, L"\hat{T}=\hat{i}"),
(1+2*dx,1/2, L"\hat{T}=-\hat{i}"),
(1/2,-3/2*dx, L"\hat{T}=\hat{j}"),
(1/2, 1+(3/2)*dx, L"\hat{T}=-\hat{j}"),
(2*dx, 1/2, "T̂=î"),
(1+2*dx,1/2, "T̂=-î"),
(1/2,-3/2*dx, "T̂=ĵ"),
(1/2, 1+(3/2)*dx, "T̂=-ĵ"),
(3dx,1-2dx, "(x,y,z+Δz)"),
(4dx,2dx, "(x+Δx,y,z+Δz)"),
@@ -326,18 +327,19 @@ p
Now we compute the *line integral*. Consider the top face, $S_1$, connecting $(x,y,z+\Delta z), (x + \Delta x, y, z + \Delta z), (x + \Delta x, y + \Delta y, z + \Delta z), (x, y + \Delta y, z + \Delta z)$. Using the *right hand rule*, parameterize the boundary curve, $C_1$, in a counterclockwise direction so the right hand rule yields the outward pointing normal ($\hat{k}$). Then the integral $\oint_{C_1} F\cdot \hat{T} ds$ is *approximated* by the following Riemann sum of $4$ terms:
$$
\begin{align*}
F(x,y, z+\Delta{z}) \cdot \hat{i}\Delta{x} &+ F(x+\Delta x, y, z+\Delta{z}) \cdot \hat{j} \Delta y \\
&+ F(x, y+\Delta y, z+\Delta{z}) \cdot (-\hat{i}) \Delta{x} \\
&+ F(x, y, z+\Delta{z}) \cdot (-\hat{j}) \Delta{y}.
\end{align*}
$$
(The points $c_i$ are chosen from the endpoints of the line segments.)
$$
\begin{align*}
\oint_{C_1} F\cdot \hat{T} ds
&\approx (F_y(x+\Delta x, y, z+\Delta{z}) \\
@@ -345,17 +347,19 @@ F(x,y, z+\Delta{z}) \cdot \hat{i}\Delta{x} &+ F(x+\Delta x, y, z+\Delta{z}) \cd
&- (F_x(x,y + \Delta{y}, z+\Delta{z}) \\
&- F_x(x, y, z+\Delta{z})) \Delta{x}
\end{align*}
$$
As before, were this divided by the *area* of the surface, we have after rearranging and cancellation:
$$
\begin{align*}
\frac{1}{\Delta{S_1}} \oint_{C_1} F \cdot \hat{T} ds &\approx
\frac{F_y(x+\Delta x, y, z+\Delta{z}) - F_y(x, y, z+\Delta{z})}{\Delta{x}}\\
&- \frac{F_x(x, y+\Delta y, z+\Delta{z}) - F_x(x, y, z+\Delta{z})}{\Delta{y}}.
\end{align*}
$$
In the limit, as $\Delta{S} \rightarrow 0$, this will converge to $\partial{F_y}/\partial{x}-\partial{F_x}/\partial{y}$.
@@ -367,7 +371,7 @@ Had the bottom of the box been used, a similar result would be found, up to a mi
Unlike the two dimensional case, there are other directions to consider and here the other sides will yield different answers. Consider now the face connecting $(x,y,z), (x+\Delta{x}, y, z), (x+\Delta{x}, y, z + \Delta{z})$, and $(x,y,z +\Delta{z})$ with outward pointing normal $-\hat{j}$. Let $S_2$ denote this face and $C_2$ describe its boundary. Orient this curve so that the right hand rule points in the $-\hat{j}$ direction (the outward pointing normal). Then, as before, we can approximate:
$$
\begin{align*}
\oint_{C_2} F \cdot \hat{T} ds
&\approx
@@ -378,6 +382,7 @@ F(x,y,z) \cdot \hat{i} \Delta{x} \\
&= (F_z(x+\Delta{x},y,z) - F_z(x, y, z))\Delta{z} -
(F_x(x,y,z+\Delta{z}) - F_x(x,y,z)) \Delta{x}.
\end{align*}
$$
Dividing by $\Delta{S}=\Delta{x}\Delta{z}$ and taking a limit will give:
@@ -401,16 +406,18 @@ $$
In short, depending on the face chosen, a different answer is given, but all have the same type.
::: {.callout-note icon=false}
## The curl
> Define the *curl* of a $3$-dimensional vector field $F=\langle F_x,F_y,F_z\rangle$ by:
>
> $$
> \text{curl}(F) =
> \langle \frac{\partial{F_z}}{\partial{y}} - \frac{\partial{F_y}}{\partial{z}},
> \frac{\partial{F_x}}{\partial{z}} - \frac{\partial{F_z}}{\partial{x}},
> \frac{\partial{F_y}}{\partial{x}} - \frac{\partial{F_x}}{\partial{y}} \rangle.
> $$
Define the *curl* of a $3$-dimensional vector field $F=\langle F_x,F_y,F_z\rangle$ by:
$$
\text{curl}(F) =
\langle \frac{\partial{F_z}}{\partial{y}} - \frac{\partial{F_y}}{\partial{z}},
\frac{\partial{F_x}}{\partial{z}} - \frac{\partial{F_z}}{\partial{x}},
\frac{\partial{F_y}}{\partial{x}} - \frac{\partial{F_x}}{\partial{y}} \rangle.
$$
:::
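Componentwise, the curl is direct to compute symbolically; a sketch for the rotational field $F = \langle -y, x, 0\rangle$ (our own example):
```{julia}
#| hold: true
@syms x y z
Fx, Fy, Fz = -y, x, 0*x   # 0*x keeps the third component symbolic
[diff(Fz, y) - diff(Fy, z),
 diff(Fx, z) - diff(Fz, x),
 diff(Fy, x) - diff(Fx, y)]  # [0, 0, 2]
```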
If $S$ is some surface with closed boundary $C$ oriented so that the unit normal, $\hat{N}$, of $S$ is given by the right hand rule about $C$, then
@@ -474,7 +481,7 @@ The divergence, gradient, and curl all involve partial derivatives. There is a n
This is a *vector differential operator* that acts on functions and vector fields through the typical notation to yield the three operations:
$$
\begin{align*}
\nabla{f} &= \langle
\frac{\partial{f}}{\partial{x}},
@@ -512,6 +519,7 @@ F_x & F_y & F_z
\end{bmatrix}
,\quad\text{the curl}.
\end{align*}
$$
:::{.callout-note}
@@ -842,12 +850,13 @@ Let $f$ and $g$ denote scalar functions, $R^3 \rightarrow R$ and $F$ and $G$ be
As with the sum rule of univariate derivatives, these operations satisfy:
$$
\begin{align*}
\nabla(f + g) &= \nabla{f} + \nabla{g}\\
\nabla\cdot(F+G) &= \nabla\cdot{F} + \nabla\cdot{G}\\
\nabla\times(F+G) &= \nabla\times{F} + \nabla\times{G}.
\end{align*}
$$
### Product rule
@@ -856,12 +865,13 @@ As with the sum rule of univariate derivatives, these operations satisfy:
The product rule $(uv)' = u'v + uv'$ has related formulas:
$$
\begin{align*}
\nabla{(fg)} &= (\nabla{f}) g + f\nabla{g} = g\nabla{f} + f\nabla{g}\\
\nabla\cdot{fF} &= (\nabla{f})\cdot{F} + f(\nabla\cdot{F})\\
\nabla\times{fF} &= (\nabla{f})\times{F} + f(\nabla\times{F}).
\end{align*}
$$
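A symbolic spot check of the divergence product rule, with example functions of our own choosing:
```{julia}
#| hold: true
@syms x y z
f = x * y * z
F = [x^2, y^2, z^2]
vars = [x, y, z]
lhs = sum(diff(f * Fᵢ, v) for (Fᵢ, v) in zip(F, vars))
rhs = sum(diff(f, v) * Fᵢ for (Fᵢ, v) in zip(F, vars)) +
      f * sum(diff(Fᵢ, v) for (Fᵢ, v) in zip(F, vars))
simplify(lhs - rhs)
```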
### Rules over cross products
@@ -870,12 +880,13 @@ The product rule $(uv)' = u'v + uv'$ has related formulas:
The cross product of two vector fields is a vector field for which the divergence and curl may be taken. There are formulas to relate to the individual terms:
$$
\begin{align*}
\nabla\cdot(F \times G) &= (\nabla\times{F})\cdot G - F \cdot (\nabla\times{G})\\
\nabla\times(F \times G) &= F(\nabla\cdot{G}) - G(\nabla\cdot{F}) + (G\cdot\nabla)F-(F\cdot\nabla)G\\
&= \nabla\cdot(BA^t - AB^t).
\end{align*}
$$
The curl formula is more involved.
@@ -921,7 +932,7 @@ Second,
This is not as clear, but can be seen algebraically as terms cancel. First:
$$
\begin{align*}
\nabla\cdot(\nabla\times{F}) &=
\langle
@@ -938,6 +949,7 @@ This is not as clear, but can be seen algebraically as terms cancel. First:
\left(\frac{\partial^2{F_x}}{\partial{z}\partial{y}} - \frac{\partial^2{F_z}}{\partial{x}\partial{y}}\right) +
\left(\frac{\partial^2{F_y}}{\partial{x}\partial{z}} - \frac{\partial^2{F_x}}{\partial{y}\partial{z}}\right)
\end{align*}
$$
Focusing on one component function, $F_z$ say, we see this contribution:
@@ -974,10 +986,10 @@ apoly!(ps, linewidth=3, color=:red)
ps = [[1,0],[1+dx, dy],[1+dx, 1+dy],[1,1]]
apoly!(ps, linewidth=3, color=:green)
annotate!(dx+.02, dy-0.05, L"P_1")
annotate!(0+0.05, 0 - 0.02, L"P_2")
annotate!(1+0.05, 0 - 0.02, L"P_3")
annotate!(1+dx+.02, dy-0.05, L"P_4")
annotate!(dx+.02, dy-0.05, "P₁")
annotate!(0+0.05, 0 - 0.02, "P₂")
annotate!(1+0.05, 0 - 0.02, "P₃")
annotate!(1+dx+.02, dy-0.05, "P₄")
p
```
@@ -1014,7 +1026,7 @@ This is because of how the line integrals are oriented so that the right-hand ru
The [invariance of charge](https://en.wikipedia.org/wiki/Maxwell%27s_equations#Charge_conservation) can be derived as a corollary of Maxwell's equations. The divergence of the curl of the magnetic field is $0$, leading to:
$$
\begin{align*}
0 &= \nabla\cdot(\nabla\times{B}) \\
&=
@@ -1024,6 +1036,7 @@ The [invariance of charge](https://en.wikipedia.org/wiki/Maxwell%27s_equations#C
&=
\mu_0(\nabla\cdot{J} + \frac{\partial{\rho}}{\partial{t}}).
\end{align*}
$$
That is, $\nabla\cdot{J} = -\partial{\rho}/\partial{t}$. This says any change in the charge density in time ($\partial{\rho}/\partial{t}$) is balanced off by a divergence in the electric current density ($\nabla\cdot{J}$). That is, charge can't be created or destroyed in an isolated system.
@@ -1048,7 +1061,7 @@ $$
Without explaining why, these values can be computed using volume and surface integrals:
$$
\begin{align*}
\phi(\vec{r}') &=
\frac{1}{4\pi} \int_V \frac{\nabla \cdot F(\vec{r})}{\|\vec{r}'-\vec{r} \|} dV -
@@ -1056,16 +1069,18 @@ Without explaining why, these values can be computed using volume and surface in
A(\vec{r}') &= \frac{1}{4\pi} \int_V \frac{\nabla \times F(\vec{r})}{\|\vec{r}'-\vec{r} \|} dV +
\frac{1}{4\pi} \oint_S \frac{F(\vec{r})}{\|\vec{r}'-\vec{r} \|} \times \hat{N} dS.
\end{align*}
$$
If $V = R^3$, an unbounded domain, *but* $F$ *vanishes* faster than $1/r$, then the theorem still holds with just the volume integrals:
$$
\begin{align*}
\phi(\vec{r}') &=\frac{1}{4\pi} \int_V \frac{\nabla \cdot F(\vec{r})}{\|\vec{r}'-\vec{r} \|} dV\\
A(\vec{r}') &= \frac{1}{4\pi} \int_V \frac{\nabla \times F(\vec{r})}{\|\vec{r}'-\vec{r}\|} dV.
\end{align*}
$$
## Change of variable
@@ -1080,7 +1095,7 @@ Some details are [here](https://en.wikipedia.org/wiki/Curvilinear_coordinates),
We restrict to $n=3$ and use $(x,y,z)$ for Cartesian coordinates and $(u,v,w)$ for an *orthogonal* curvilinear coordinate system, such as spherical or cylindrical. If $\vec{r} = \langle x,y,z\rangle$, then
$$
\begin{align*}
d\vec{r} &= \langle dx,dy,dz \rangle = J \langle du,dv,dw\rangle\\
&=
@@ -1091,6 +1106,7 @@ d\vec{r} &= \langle dx,dy,dz \rangle = J \langle du,dv,dw\rangle\\
\frac{\partial{\vec{r}}}{\partial{v}} dv +
\frac{\partial{\vec{r}}}{\partial{w}} dw.
\end{align*}
$$
The term ${\partial{\vec{r}}}/{\partial{u}}$ is tangent to the curve formed by *assuming* $v$ and $w$ are constant and letting $u$ vary. Similarly for the other partial derivatives. Orthogonality assumes that at every point, these tangent vectors are orthogonal.
@@ -1138,7 +1154,7 @@ This uses orthogonality, so $\hat{e}_v \times \hat{e}_w$ is parallel to $\hat{e}
The volume element is found by *projecting* $d\vec{r}$ onto the $\hat{e}_u$, $\hat{e}_v$, $\hat{e}_w$ coordinate system through $(d\vec{r} \cdot\hat{e}_u) \hat{e}_u$, $(d\vec{r} \cdot\hat{e}_v) \hat{e}_v$, and $(d\vec{r} \cdot\hat{e}_w) \hat{e}_w$. Then forming the triple scalar product to compute the volume of the parallelepiped:
$$
\begin{align*}
\left[(d\vec{r} \cdot\hat{e}_u) \hat{e}_u\right] \cdot
\left(
@@ -1149,6 +1165,7 @@ The volume element is found by *projecting* $d\vec{r}$ onto the $\hat{e}_u$, $\h
&=
h_u h_v h_w du dv dw,
\end{align*}
$$
as the unit vectors are orthonormal, their triple scalar product is $1$ and $d\vec{r}\cdot\hat{e}_u = h_u du$, etc.
@@ -1214,7 +1231,7 @@ p
The tangent vectors found from the partial derivatives of $\vec{r}$:
$$
\begin{align*}
\frac{\partial{\vec{r}}}{\partial{r}} &=
\langle \cos(\theta) \cdot \sin(\phi), \sin(\theta) \cdot \sin(\phi), \cos(\phi)\rangle,\\
@@ -1223,12 +1240,13 @@ The tangent vectors found from the partial derivatives of $\vec{r}$:
\frac{\partial{\vec{r}}}{\partial{\phi}} &=
\langle r\cdot\cos(\theta)\cdot\cos(\phi), r\cdot\sin(\theta)\cdot\cos(\phi), -r\cdot\sin(\phi) \rangle.
\end{align*}
$$
With this, we have $h_r=1$, $h_\theta=r\sin(\phi)$, and $h_\phi = r$. So that
$$
\begin{align*}
dl &= \sqrt{dr^2 + (r\sin(\phi)d\theta)^2 + (rd\phi)^2},\\
dS_r &= r^2\sin(\phi)d\theta d\phi,\\
@@ -1236,6 +1254,7 @@ dS_\theta &= rdr d\phi,\\
dS_\phi &= r\sin(\phi)dr d\theta, \quad\text{and}\\
dV &= r^2\sin(\phi) drd\theta d\phi.
\end{align*}
$$
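These scale factors can be checked symbolically. A minimal sketch, assuming `SymPy` is available as elsewhere in these notes, recovers the *squared* factors $h_u^2 = (\partial{\vec{r}}/\partial{u})\cdot(\partial{\vec{r}}/\partial{u})$ (squaring avoids absolute values in the simplification):
```{julia}
#| hold: true
@syms r::positive θ::real ϕ::real
R = [r*cos(θ)*sin(ϕ), r*sin(θ)*sin(ϕ), r*cos(ϕ)]
h²(u) = simplify(sum(diff.(R, u).^2))   # squared scale factor for coordinate u
h²(r), h²(θ), h²(ϕ)                     # 1, r²⋅sin(ϕ)², r²
```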
The following visualizes the volume and the surface elements.
@@ -1292,7 +1311,7 @@ p
If $f$ is a scalar function then $df = \nabla{f} \cdot d\vec{r}$ by the chain rule. Using the curvilinear coordinates:
$$
\begin{align*}
df &=
\frac{\partial{f}}{\partial{u}} du +
@@ -1303,6 +1322,7 @@ df &=
\frac{1}{h_v}\frac{\partial{f}}{\partial{v}} h_vdv +
\frac{1}{h_w}\frac{\partial{f}}{\partial{w}} h_wdw.
\end{align*}
$$
But, as was used above, $d\vec{r} \cdot \hat{e}_u = h_u du$, etc. so $df$ can be re-expressed as:

View File

@@ -13,6 +13,7 @@ plotly()
using QuadGK
using SymPy
using HCubature
import ImplicitIntegration
```
---
@@ -391,17 +392,19 @@ By "iterated" we mean performing two different definite integrals. For example,
The question then: under what conditions will the three integrals be equal?
::: {.callout-note icon=false}
## [Fubini](https://math.okstate.edu/people/lebl/osu4153-s16/chapter10-ver1.pdf)
> [Fubini](https://math.okstate.edu/people/lebl/osu4153-s16/chapter10-ver1.pdf). Let $R \times S$ be a closed rectangular region in $R^n \times R^m$. Suppose $f$ is bounded. Define $f_x(y) = f(x,y)$ and $f^y(x) = f(x,y)$ where $x$ is in $R^n$ and $y$ in $R^m$. *If* $f_x$ and $f^y$ are integrable then
>
> $$
> \iint_{R\times S}fdV = \iint_R \left(\iint_S f_x(y) dy\right) dx
> = \iint_S \left(\iint_R f^y(x) dx\right) dy.
> $$
Let $R \times S$ be a closed rectangular region in $R^n \times R^m$. Suppose $f$ is bounded. Define $f_x(y) = f(x,y)$ and $f^y(x) = f(x,y)$ where $x$ is in $R^n$ and $y$ in $R^m$. *If* $f_x$ and $f^y$ are integrable then
$$
\iint_{R\times S}fdV = \iint_R \left(\iint_S f_x(y) dy\right) dx
= \iint_S \left(\iint_R f^y(x) dx\right) dy.
$$
Similarly, if $f^y$ is integrable for all $y$, then $\iint_{R\times S}fdV =\iint_S \iint_R f(x,y) dx dy$.
:::
An immediate corollary is that the above holds for continuous functions when $R$ and $S$ are bounded, the case described here.
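A quick numeric illustration: for a continuous integrand over a rectangle, `hcubature` and the two iterated `quadgk` integrals should agree. This is a sketch; the integrand is an arbitrary continuous choice:
```{julia}
#| hold: true
f(x, y) = exp(-x^2 - y^2) * (1 + x*y)
V  = hcubature(v -> f(v...), (0, 0), (1, 2))[1]           # over [0,1] × [0,2]
Vx = quadgk(x -> quadgk(y -> f(x, y), 0, 2)[1], 0, 1)[1]  # dy then dx
Vy = quadgk(y -> quadgk(x -> f(x, y), 0, 1)[1], 0, 2)[1]  # dx then dy
V ≈ Vx ≈ Vy
```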
@@ -784,6 +787,120 @@ Compare to
sin(1)/2
```
### Integrating over implicitly defined regions
To use `HCubature` to find an integral over some region, that region is transformed into a rectangular region and the Jacobian is used to modify the integrand. The `ImplicitIntegration` package allows the region to be implicitly defined (it need not be rectangular) and uses an algorithm to integrate over the region as given. It can integrate over regions of the form $\phi(x) \leq 0$; that is, it computes:
$$
\iint_{(x,y): \phi(x,y) \leq 0} f(x,y) dx dy.
$$
It can also integrate over boundaries of the form $\phi(x) = 0$. The latter can be visualized through `implicit_plot`.
The main function from `ImplicitIntegration` is `integrate`. The package is imported below to avoid naming conflicts with `SymPy`'s `integrate` function:
```{julia}
import ImplicitIntegration
```
The unit circle (with radius $r=1$) can be described implicitly with:
```{julia}
r = 1.0
phi(x) = sqrt(sum(xi^2 for xi in x)) - r
x0s, x1s = (-1.0, -1.0), (1.0, 1.0)
```
When a point is inside the disk centered at the origin of radius $r$, $\phi$ will be negative. The `phi` function takes a container describing a point.
We can visualize this region with `implicit_plot`, though we need to make a function that takes two arguments to specify $x$ and $y$ and
rework the specification of the viewing window, as `ImplicitIntegration.integrate` expects limits to be specified in a manner that readily accommodates higher dimensions, whereas the plotting function uses a more mathematical specification.
```{julia}
𝑖(xs...) = xs # turn (x,y) arguments into a container
xlims, ylims = collect(zip(x0s, x1s))
implicit_plot(phi∘𝑖; xlims, ylims, legend=false)
scatter!([pt for pt ∈ tuple.(range(xlims..., 40), range(ylims...,40)')
if phi(pt) < 0],
marker=(1,:blue))
```
The area of the unit circle is identified by integrating a function that is constantly $1$ over the region. This is computed by:
```{julia}
res = ImplicitIntegration.integrate(x -> 1.0, phi, x0s, x1s)
```
The result, `res`, is a structure containing details of the algorithm. The `val` property contains the result.
This compares, approximately, the computed value to the known value ($\pi \cdot r^2$):
```{julia}
res.val ≈ pi * r^2
```
To find the *perimeter*, we use the fact that it is the boundary ("surface") of the disk:
```{julia}
res = ImplicitIntegration.integrate(x -> 1.0, phi, x0s, x1s; surface=true)
res.val ≈ 2pi * r
```
For two-dimensional regions, this provides an alternate means to calculate arc-lengths.
Extending the above, we can find the surface area of the upper hemisphere of the unit sphere. To do this, we integrate a different function, one related to the sphere, over the boundary $\phi = 0$:
```{julia}
f(x) = sqrt(sum(xi^2 for xi in x)) # or LinearAlgebra.norm
res = ImplicitIntegration.integrate(f, phi, x0s, x1s; surface=true)
res.val ≈ (1/2) * 4pi * r^2
```
The volume is computed similarly, only without the `surface` keyword argument:
```{julia}
f(x) = sqrt(sum(xi^2 for xi in x))
res = ImplicitIntegration.integrate(f, phi, x0s, x1s;)
res.val ≈ (1/2) * 4/3 * pi * r^3
```
Of course more complicated functions could be used.
Now consider a more complicated region over $[-2\pi, 2\pi] \times [-2\pi, 2\pi]$:
```{julia}
function phi(x)
x1,x2 = x
x1*cos(x2)*cos(x1*x2) + x2*cos(x1)*cos(x1*x2) + x1*x2*cos(x1)*cos(x2)
end
x0s, x1s = (-2pi, -2pi), (2pi, 2pi)
xlims, ylims = collect(zip(x0s, x1s))
implicit_plot(phi∘𝑖; xlims, ylims, legend=false)
scatter!([pt for pt ∈ tuple.(range(xlims..., 40), range(ylims...,40)')
if phi(pt) < 0],
marker=(1,:blue))
```
In a slightly tricky way, a grid of points is created to indicate where `phi` is negative.
The area of the negative part of this function can be found by integrating the constant function with value $1$:
```{julia}
res = ImplicitIntegration.integrate(x -> 1.0, phi, x0s, x1s)
(val=res.val, proportion=res.val / (4pi)^2)
```
## Triple integrals
@@ -939,11 +1056,12 @@ In [Katz](http://www.jstor.org/stable/2689856) a review of the history of "chang
We view $R$ in two coordinate systems $(x,y)$ and $(u,v)$. We have that
$$
\begin{align*}
dx &= A du + B dv\\
dy &= C du + D dv,
\end{align*}
$$
where $A = \partial{x}/\partial{u}$, $B = \partial{x}/\partial{v}$, $C= \partial{y}/\partial{u}$, and $D = \partial{y}/\partial{v}$. Lagrange, following Euler, first sets $x$ to be constant (as is done in iterated integration). Hence, $dx = 0$ and so $du = -(B/A) dv$ and, after substitution, $dy = (D-C(B/A))dv$. Then Lagrange set $y$ to be a constant, so $dy = 0$ and hence $dv=0$ so $dx = Adu$. The area "element" $dx dy = A du \cdot (D - C(B/A)) dv = (AD - BC) du dv$. Since areas and volumes are non-negative, the absolute value is used. With this, we have "$dxdy = |AD-BC|du dv$" as the analog of $dx = g'(u) du$.
@@ -952,11 +1070,12 @@ where $A = \partial{x}/\partial{u}$, $B = \partial{x}/\partial{v}$, $C= \partial
The expression $AD - BC$ was also derived by Euler, by related means. Lagrange extended the analysis to 3 dimensions. Before doing so, it is helpful to understand the problem from a geometric perspective. Euler was attempting to understand the effects of the following change of variable:
$$
\begin{align*}
x &= a + mt + \sqrt{1-m^2} v\\
y & = b + \sqrt{1-m^2}t -mv
\end{align*}
$$
Euler knew this to be a clockwise *rotation* by an angle $\theta$ with $\cos(\theta) = m$, a *reflection* through the $x$ axis, and a translation by $\langle a, b\rangle$. All these *should* preserve the area represented by $dx dy$, so he was *expecting* $dx dy = dt dv$.
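Euler's expectation can be verified symbolically; a minimal sketch with `SymPy` (the determinant is $-1$, the sign reflecting the reflection, so $|AD - BC| = 1$ and indeed $dx dy = dt dv$):
```{julia}
#| hold: true
@syms a::real b::real m::real t::real v::real
x = a + m*t + sqrt(1 - m^2)*v
y = b + sqrt(1 - m^2)*t - m*v
J = [diff(x, t) diff(x, v); diff(y, t) diff(y, v)]
simplify(det(J))    # -1, so |det(J)| = 1
```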
@@ -1090,13 +1209,15 @@ Using the fact that the two vectors involved are columns in the Jacobian of the
The absolute value of the determinant of the Jacobian is the multiplying factor that is seen in the change of variable formula for all dimensions:
::: {.callout-note icon=false}
## [Change of variable](https://en.wikipedia.org/wiki/Integration_by_substitution#Substitution_for_multiple_variables)
> [Change of variable](https://en.wikipedia.org/wiki/Integration_by_substitution#Substitution_for_multiple_variables) Let $U$ be an open set in $R^n$, $G:U \rightarrow R^n$ be an *injective* differentiable function with *continuous* partial derivatives. If $f$ is continuous and compactly supported, then
>
> $$
> \iint_{G(S)} f(\vec{x}) dV = \iint_S (f \circ G)(\vec{u}) |\det(J_G)(\vec{u})| dU.
> $$
Let $U$ be an open set in $R^n$, $G:U \rightarrow R^n$ be an *injective* differentiable function with *continuous* partial derivatives. If $f$ is continuous and compactly supported, then
$$
\iint_{G(S)} f(\vec{x}) dV = \iint_S (f \circ G)(\vec{u}) |\det(J_G)(\vec{u})| dU.
$$
:::
For the one-dimensional case there is no absolute value; there, the interval may be reversed, producing "negative" area. This is not the case here, where $S$ is parameterized to give positive volume.
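As a numeric sketch of the formula, consider the (assumed) map $G(r, \theta) = \langle 2r\cos(\theta), 3r\sin(\theta)\rangle$, which takes the rectangle $S = [0,1] \times [0, 2\pi]$ onto the ellipse $(x/2)^2 + (y/3)^2 \leq 1$. Its Jacobian determinant is $6r$, so integrating $f = 1$ should recover the ellipse's area, $\pi \cdot 2 \cdot 3$:
```{julia}
#| hold: true
G(v) = [2*v[1]*cos(v[2]), 3*v[1]*sin(v[2])]   # (r, θ) ↦ (x, y); shown for reference
detJG(v) = 6 * v[1]                            # |det(J_G)|, computed by hand
val = hcubature(detJG, (0, 0), (1, 2pi))[1]
val ≈ pi * 2 * 3
```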
@@ -1308,12 +1429,13 @@ What about other triangles, say the triangle bounded by $x=0$, $y=0$ and $y-x=1$
This can be seen as a reflection through the line $x=1/2$ of the triangle above. If $G_1$ represents the mapping from $U = [0,1]\times[0,1]$ into the triangle of the last problem, and $G_2$ represents the reflection through the line $x=1/2$, then the transformation $G_2 \circ G_1$ will map the box $U$ into the desired region. By the chain rule, we have:
$$
\begin{align*}
\int_{(G_2\circ G_1)(U)} f dx &= \int_U (f\circ G_2 \circ G_1) |\det(J_{G_2 \circ G_1})| du \\
&=
\int_U (f\circ G_2 \circ G_1) |\det(J_{G_2}(G_1(u)))||\det(J_{G_1}(u))| du.
\end{align*}
$$
(In [Katz](http://www.jstor.org/stable/2689856) it is mentioned that Jacobi showed this in 1841.)
@@ -1671,9 +1793,9 @@ The Jacobian can be computed to be $\rho^2\sin(\phi)$.
```{julia}
#| hold: true
@syms ρ theta phi
G(ρ, theta, phi) = ρ * [sin(phi)*cos(theta), sin(phi)*sin(theta), cos(phi)]
det(G(ρ, theta, phi).jacobian([ρ, theta, phi])) |> simplify |> abs
@syms ρ θ ϕ
G(ρ, θ, ϕ) = ρ * [sin(ϕ)*cos(θ), sin(ϕ)*sin(θ), cos(ϕ)]
det(G(ρ, θ, ϕ).jacobian([ρ, θ, ϕ])) |> simplify |> abs
```
##### Example

View File

@@ -1,4 +1,4 @@
# Line and Surface Integrals
# Line and surface integrals
{{< include ../_common_code.qmd >}}
@@ -166,13 +166,14 @@ However, it proves more interesting to define an integral incorporating how prop
The canonical example is [work](https://en.wikipedia.org/wiki/Work_(physics)), which is a measure of a force times a distance. For an object following a path, the work done is still a force times a distance, but only that force in the direction of the motion is considered. (The *constraint force* keeping the object on the path does no work.) Mathematically, $\hat{T}$ describes the direction of motion along a path, so the work done in moving an object over a small segment of the path is $(F\cdot\hat{T}) \Delta{s}$. Adding up incremental amounts of work leads to a Riemann sum for a line integral involving a vector field.
::: {.callout-note icon=false}
## Work
> The *work* done in moving an object along a path $C$ by a force field, $F$, is given by the integral
>
> $$
> \int_C (F \cdot \hat{T}) ds = \int_C F\cdot d\vec{r} = \int_a^b ((F\circ\vec{r}) \cdot \frac{d\vec{r}}{dt})(t) dt.
> $$
The *work* done in moving an object along a path $C$ by a force field, $F$, is given by the integral
$$
\int_C (F \cdot \hat{T}) ds = \int_C F\cdot d\vec{r} = \int_a^b ((F\circ\vec{r}) \cdot \frac{d\vec{r}}{dt})(t) dt.
$$
:::
---
@@ -180,13 +181,15 @@ The canonical example is [work](https://en.wikipedia.org/wiki/Work_(physics)), w
In the $n=2$ case, there is another useful interpretation of the line integral. In this dimension the normal vector, $\hat{N}$, is well defined in terms of the tangent vector, $\hat{T}$, through a rotation: $\langle a,b\rangle^t = \langle b,-a\rangle$. (The negative, $\langle -b,a\rangle$ is also a candidate, the difference in this choice would lead to a sign difference in the answer.) This allows the definition of a different line integral, called a flow integral, as detailed later:
::: {.callout-note icon=false}
## Flow
> The *flow* across a curve $C$ is given by
>
> $$
> \int_C (F\cdot\hat{N}) ds = \int_a^b (F \circ \vec{r})(t) \cdot (\vec{r}'(t))^t dt.
> $$
The *flow* across a curve $C$ is given by
$$
\int_C (F\cdot\hat{N}) ds = \int_a^b (F \circ \vec{r})(t) \cdot (\vec{r}'(t))^t dt.
$$
:::
### Examples
@@ -296,9 +299,11 @@ using the Fundamental Theorem of Calculus.
The main point above is that *if* the vector field is the gradient of a scalar field, then the work done depends *only* on the endpoints of the path and not the path itself.
::: {.callout-note icon=false}
## Conservative vector field
> **Conservative vector field**: If $F$ is a vector field defined in an *open* region $R$; $A$ and $B$ are points in $R$ and *if* for *any* curve $C$ in $R$ connecting $A$ to $B$, the line integral of $F \cdot \vec{T}$ over $C$ depends *only* on the endpoints $A$ and $B$ and not the path, then the line integral is called *path independent* and the field is called a *conservative field*.
If $F$ is a vector field defined in an *open* region $R$; $A$ and $B$ are points in $R$ and *if* for *any* curve $C$ in $R$ connecting $A$ to $B$, the line integral of $F \cdot \vec{T}$ over $C$ depends *only* on the endpoints $A$ and $B$ and not the path, then the line integral is called *path independent* and the field is called a *conservative field*.
:::
The force of gravity is the gradient of a scalar field. As such, the two integrals above which yield $0$ could have been computed more directly. The particular scalar field is $f = -GMm/\|\vec{r}\|$, which goes by the name the gravitational *potential* function. As seen, $f$ depends only on magnitude, and as the endpoints of the path in the example have the same distance to the origin, the work integral, $(f\circ\vec{r})(b) - (f\circ\vec{r})(a)$ will be $0$.
@@ -335,7 +340,7 @@ W = integrate(F(r(t)) ⋅ T(r(t)), (t, 0, 2PI))
There are technical assumptions about curves and regions that are necessary for some statements to be made:
* Let $C$ be a [Jordan](https://en.wikipedia.org/wiki/Jordan_curve_theorem) curve - a non-self-intersecting continuous loop in the plane. Such a curve divides the plane into two regions, one bounded and one unbounded. The normal to a Jordan curve is assumed to be in the direction of the unbounded part.
* Let $C$ be a [Jordan](https://en.wikipedia.org/wiki/Jordan_curve_theorem) curve---a non-self-intersecting continuous loop in the plane. Such a curve divides the plane into two regions, one bounded and one unbounded. The normal to a Jordan curve is assumed to be in the direction of the unbounded part.
* Further, we will assume that our curves are *piecewise smooth*. That is, each is composed of finitely many smooth pieces, continuously connected.
* The region enclosed by a closed curve has an *interior*, $D$, which we assume is an *open* set (one for which every point in $D$ has some "ball" about it entirely within $D$ as well.)
* The region $D$ is *connected* meaning between any two points there is a continuous path in $D$ between the two points.
@@ -345,17 +350,19 @@ There are technical assumptions about curves and regions that are necessary for
### The fundamental theorem of line integrals
The fact that work in a potential field is path independent is a consequence of the Fundamental Theorem of Line [Integrals](https://en.wikipedia.org/wiki/Gradient_theorem):
The fact that work in a potential field is path independent is a consequence of
::: {.callout-note icon=false}
## The Fundamental Theorem of Line [Integrals](https://en.wikipedia.org/wiki/Gradient_theorem):
> Let $U$ be an open subset of $R^n$, $f: U \rightarrow R$ a *differentiable* function and $\vec{r}: R \rightarrow R^n$ a differentiable function such that the path $C = \vec{r}(t)$, $a\leq t\leq b$ is contained in $U$. Then
>
> $$
> \int_C \nabla{f} \cdot d\vec{r} =
> \int_a^b \nabla{f}(\vec{r}(t)) \cdot \vec{r}'(t) dt =
> f(\vec{r}(b)) - f(\vec{r}(a)).
> $$
Let $U$ be an open subset of $R^n$, $f: U \rightarrow R$ a *differentiable* function and $\vec{r}: R \rightarrow R^n$ a differentiable function such that the path $C = \vec{r}(t)$, $a\leq t\leq b$ is contained in $U$. Then
$$
\int_C \nabla{f} \cdot d\vec{r} =
\int_a^b \nabla{f}(\vec{r}(t)) \cdot \vec{r}'(t) dt =
f(\vec{r}(b)) - f(\vec{r}(a)).
$$
:::
That is, a line integral through a gradient field can be evaluated by evaluating the original scalar field at the endpoints of the curve. In other words, line integrals through gradient fields are conservative.
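A symbolic sketch of the theorem, for an (assumed) choice $f(x,y) = x^2y$ and the unit-circle path from $t=0$ to $t=\pi/4$; the gradient is computed by hand:
```{julia}
#| hold: true
@syms t::real
f(x, y)  = x^2 * y
∇f(x, y) = [2x*y, x^2]          # gradient of f, by hand
r(t) = [cos(t), sin(t)]
W = integrate(∇f(r(t)...) ⋅ diff.(r(t), t), (t, 0, PI/4))
simplify(W - (f(r(PI/4)...) - f(r(0)...)))   # 0: the two sides agree
```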
@@ -464,7 +471,7 @@ The flow integral is typically computed for a closed (Jordan) curve, measuring t
:::{.callout-note}
## Note
For a Jordan curve, the positive orientation of the curve is such that the normal direction (proportional to $\hat{T}'$) points away from the bounded interior. For a non-closed path, the choice of parameterization will determine the normal and the integral for flow across a curve is dependent - up to its sign - on this choice.
For a Jordan curve, the positive orientation of the curve is such that the normal direction (proportional to $\hat{T}'$) points away from the bounded interior. For a non-closed path, the choice of parameterization will determine the normal and the integral for flow across a curve is dependent---up to its sign---on this choice.
:::

View File

@@ -1,4 +1,4 @@
# Quick Review of Vector Calculus
# Quick review of vector calculus
{{< include ../_common_code.qmd >}}
@@ -99,12 +99,13 @@ In dimension $m=3$, the **binormal** vector, $\hat{B}$, is the unit vector $\hat
The [Frenet-Serret](https://en.wikipedia.org/wiki/Frenet%E2%80%93Serret_formulas) formulas define the **curvature**, $\kappa$, and the **torsion**, $\tau$, by
$$
\begin{align*}
\frac{d\hat{T}}{ds} &= & \kappa \hat{N} &\\
\frac{d\hat{N}}{ds} &= -\kappa\hat{T} & & + \tau\hat{B}\\
\frac{d\hat{B}}{ds} &= & -\tau\hat{N}&
\end{align*}
$$
These formulas apply in dimension $m=2$ with $\hat{B}=\vec{0}$.
@@ -122,16 +123,17 @@ The chain rule says $(\vec{r}(g(t))' = \vec{r}'(g(t)) g'(t)$.
A scalar function, $f:R^n\rightarrow R$, $n > 1$ has a **partial derivative** defined. For $n=2$, these are:
$$
\begin{align*}
\frac{\partial{f}}{\partial{x}}(x,y) &=
\lim_{h\rightarrow 0} \frac{f(x+h,y)-f(x,y)}{h}\\
\frac{\partial{f}}{\partial{y}}(x,y) &=
\lim_{h\rightarrow 0} \frac{f(x,y+h)-f(x,y)}{h}.
\end{align*}
$$
The generalization to $n>2$ is clear - the partial derivative in $x_i$ is the derivative of $f$ when the *other* $x_j$ are held constant.
The generalization to $n>2$ is clear---the partial derivative in $x_i$ is the derivative of $f$ when the *other* $x_j$ are held constant.
This may be viewed as the derivative of the univariate function $(f\circ\vec{r})(t)$ where $\vec{r}(t) = p + t \hat{e}_i$, $\hat{e}_i$ being the unit vector of all $0$s except a $1$ in the $i$th component.
@@ -356,7 +358,7 @@ $$
In two dimensions, we have the following interpretations:
$$
\begin{align*}
\iint_R dA &= \text{area of } R\\
\iint_R \rho dA &= \text{mass with constant density }\rho\\
@@ -364,12 +366,13 @@ In two dimensions, we have the following interpretations:
\frac{1}{\text{area}}\iint_R x \rho(x,y)dA &= \text{centroid of region in } x \text{ direction}\\
\frac{1}{\text{area}}\iint_R y \rho(x,y)dA &= \text{centroid of region in } y \text{ direction}
\end{align*}
$$
In three dimensions, we have the following interpretations:
$$
\begin{align*}
\iint_VdV &= \text{volume of } V\\
\iint_V \rho dV &= \text{mass with constant density }\rho\\
@@ -378,6 +381,7 @@ In three dimensions, we have the following interpretations:
\frac{1}{\text{volume}}\iint_V y \rho(x,y)dV &= \text{centroid of volume in } y \text{ direction}\\
\frac{1}{\text{volume}}\iint_V z \rho(x,y)dV &= \text{centroid of volume in } z \text{ direction}
\end{align*}
$$
To compute integrals over non-box-like regions, Fubini's theorem may be utilized. Alternatively, a **transformation** of variables

View File

@@ -1,4 +1,4 @@
# Green's Theorem, Stokes' Theorem, and the Divergence Theorem
# Green's theorem, Stokes' theorem, and the divergence theorem
{{< include ../_common_code.qmd >}}
@@ -109,7 +109,7 @@ p = plot(legend=false, xticks=nothing, yticks=nothing, border=:none, ylim=(-1/2,
for m in ms
drawf!(p, f, m, 0.9*dx/2)
end
annotate!([(ms[6]-dx/2,-0.3, L"x_{i-1}"), (ms[6]+dx/2,-0.3, L"x_{i}")])
annotate!([(ms[6]-dx/2,-0.3, "xᵢ₋₁}"), (ms[6]+dx/2,-0.3, "x")])
p
```
@@ -214,18 +214,20 @@ However, the microscopic boundary integrals have cancellations that lead to a ma
This all suggests that the flow integral around the surface of the larger region (the blue square) is equivalent to the integral of the curl component over the region. This is [Green](https://en.wikipedia.org/wiki/Green%27s_theorem)'s theorem, as stated by Wikipedia:
::: {.callout-note icon=false}
## Green's theorem
> **Green's theorem**: Let $C$ be a positively oriented, piecewise smooth, simple closed curve in the plane, and let $D$ be the region bounded by $C$. If $F=\langle F_x, F_y\rangle$, is a vector field on an open region containing $D$ having continuous partial derivatives then:
>
> $$
> \oint_C F\cdot\hat{T}ds =
> \iint_D \left(
> \frac{\partial{F_y}}{\partial{x}} - \frac{\partial{F_x}}{\partial{y}}
> \right) dA=
> \iint_D \text{curl}(F)dA.
> $$
Let $C$ be a positively oriented, piecewise smooth, simple closed curve in the plane, and let $D$ be the region bounded by $C$. If $F=\langle F_x, F_y\rangle$, is a vector field on an open region containing $D$ having continuous partial derivatives then:
$$
\oint_C F\cdot\hat{T}ds =
\iint_D \left(
\frac{\partial{F_y}}{\partial{x}} - \frac{\partial{F_x}}{\partial{y}}
\right) dA=
\iint_D \text{curl}(F)dA.
$$
:::
The statement of the theorem applies only to regions whose boundaries are simple closed curves. Not all regions have such boundaries; an annulus, for example, does not. This is a restriction that will be generalized.
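A quick numeric sketch of the theorem for the (assumed) field $F = \langle -y, x\rangle$ over the unit disk, where $\text{curl}(F) = 2$ everywhere, so the area integral is simply twice the disk's area (this assumes `QuadGK` is loaded, as elsewhere in these notes):
```{julia}
#| hold: true
F(x, y) = [-y, x]              # curl(F) = ∂Fy/∂x - ∂Fx/∂y = 2
r(t)  = [cos(t), sin(t)]       # positively oriented unit circle
dr(t) = [-sin(t), cos(t)]
lhs = quadgk(t -> F(r(t)...) ⋅ dr(t), 0, 2pi)[1]
rhs = 2 * (pi * 1^2)           # ∬_D curl(F) dA = 2 × area of unit disk
lhs ≈ rhs
```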
@@ -271,11 +273,12 @@ r(t) = [a*cos(t),b*sin(t)]
To compute the area of the triangle with vertices $(0,0)$, $(a,0)$ and $(0,b)$ we can orient the boundary counter clockwise. Let $A$ be the line segment from $(0,b)$ to $(0,0)$, $B$ be the line segment from $(0,0)$ to $(a,0)$, and $C$ be the segment from $(a,0)$ back to $(0,b)$. Then
$$
\begin{align*}
\frac{1}{2} \int_A F\cdot\hat{T} ds &=\frac{1}{2} \int_A -ydx = 0\\
\frac{1}{2} \int_B F\cdot\hat{T} ds &=\frac{1}{2} \int_B xdy = 0,
\end{align*}
$$
as on $A$, $x=0$ and $dx=0$, while on $B$, $y=0$ and $dy=0$.
@@ -311,7 +314,7 @@ For the two dimensional case the curl is a scalar. *If* $F = \langle F_x, F_y\ra
Now assume $\partial{F_y}/\partial{x} - \partial{F_x}/\partial{y} = 0$. Let $P$ and $Q$ be two points in the plane. Take any path, $C_1$, from $P$ to $Q$ and any return path, $C_2$, from $Q$ to $P$ that do not cross and such that $C$, the concatenation of the two paths, satisfies Green's theorem. Then, as $F$ is continuous on an open region containing $D$, we have:
$$
\begin{align*}
0 &= \iint_D 0 dA \\
&=
@@ -321,6 +324,7 @@ Now assume $\partial{F_y}/\partial{x} - \partial{F_x}/\partial{y} = 0$. Let $P$
&=
\int_{C_1} F \cdot \hat{T} ds + \int_{C_2}F \cdot \hat{T} ds.
\end{align*}
$$
Reversing $C_2$ to go from $P$ to $Q$, we see the two work integrals are identical; that is, the field is conservative.
@@ -339,13 +343,14 @@ For example, let $F(x,y) = \langle \sin(xy), \cos(xy) \rangle$. Is this a conser
We can check by taking partial derivatives. Those of interest are:
$$
\begin{align*}
\frac{\partial{F_y}}{\partial{x}} &= \frac{\partial{(\cos(xy))}}{\partial{x}} =
-\sin(xy) y,\\
\frac{\partial{F_x}}{\partial{y}} &= \frac{\partial{(\sin(xy))}}{\partial{y}} =
\cos(xy)x.
\end{align*}
$$
It is not the case that $\partial{F_y}/\partial{x} - \partial{F_x}/\partial{y}=0$, so this vector field is *not* conservative.
@@ -417,24 +422,26 @@ p
Let $A$ label the red line, $B$ the green curve, $C$ the blue line, and $D$ the black line. Then the area is given from Green's theorem by considering half of the line integral of $F(x,y) = \langle -y, x\rangle$ or $\oint_C (xdy - ydx)$. To that end, we have:
$$
\begin{align*}
\int_A (xdy - ydx) &= a(-f(a))\\
\int_C (xdy - ydx) &= b f(b)\\
\int_D (xdy - ydx) &= 0\\
\end{align*}
$$
Finally the integral over $B$, using integration by parts:
$$
\begin{align*}
\int_B F(\vec{r}(t))\cdot \frac{d\vec{r}(t)}{dt} dt &=
\int_b^a \langle -f(t),t \rangle\cdot\langle 1, f'(t)\rangle dt\\
&= \int_a^b f(t)dt - \int_a^b tf'(t)dt\\
&= \int_a^b f(t)dt - \left(tf(t)\mid_a^b - \int_a^b f(t) dt\right).
\end{align*}
$$
Combining, we have after cancellation $\oint (xdy - ydx) = 2\int_a^b f(t) dt$; dividing by $2$ gives the signed area under the curve.
@@ -470,7 +477,7 @@ The cut leads to a counter-clockwise orientation on the outer ring and a clockw
To see that the area integral of $F(x,y) = (1/2)\langle -y, x\rangle$ produces the area for this orientation we have, using $C_1$ as the outer ring, and $C_2$ as the inner ring:
$$
\begin{align*}
\oint_{C_1} F \cdot \hat{T} ds &=
\int_0^{2\pi} (1/2)(2)\langle -\sin(t), \cos(t)\rangle \cdot (2)\langle-\sin(t), \cos(t)\rangle dt \\
@@ -479,6 +486,7 @@ To see that the area integral of $F(x,y) = (1/2)\langle -y, x\rangle$ produces t
\int_{0}^{2\pi} (1/2) \langle \sin(t), \cos(t)\rangle \cdot \langle-\sin(t), -\cos(t)\rangle dt\\
&= -(1/2)(2\pi) = -\pi.
\end{align*}
$$
(Using $\vec{r}(t) = 2\langle \cos(t), \sin(t)\rangle$ for the outer ring and $\vec{r}(t) = 1\langle \cos(t), -\sin(t)\rangle$ for the inner ring.)
@@ -713,13 +721,13 @@ The fluid would flow along the blue (stream) lines. The red lines have equal pot
# https://en.wikipedia.org/wiki/Jiffy_Pop#/media/File:JiffyPop.jpg
imgfile ="figures/jiffy-pop.png"
caption ="""
The Jiffy Pop popcorn design has a top surface that is designed to expand to accommodate the popped popcorn. Viewed as a surface, the surface area grows, but the boundary - where the surface meets the pan - stays the same. This is an example that many different surfaces can have the same bounding curve. Stokes' theorem will relate a surface integral over the surface to a line integral about the bounding curve.
The Jiffy Pop popcorn design has a top surface that is designed to expand to accommodate the popped popcorn. Viewed as a surface, the surface area grows, but the boundary---where the surface meets the pan---stays the same. This is an example that many different surfaces can have the same bounding curve. Stokes' theorem will relate a surface integral over the surface to a line integral about the bounding curve.
"""
# ImageFile(:integral_vector_calculus, imgfile, caption)
nothing
```
![The Jiffy Pop popcorn design has a top surface that is designed to expand to accommodate the popped popcorn. Viewed as a surface, the surface area grows, but the boundary - where the surface meets the pan - stays the same. This is an example that many different surfaces can have the same bounding curve. Stokes' theorem will relate a surface integral over the surface to a line integral about the bounding curve.
![The Jiffy Pop popcorn design has a top surface that is designed to expand to accommodate the popped popcorn. Viewed as a surface, the surface area grows, but the boundary---where the surface meets the pan---stays the same. This is an example that many different surfaces can have the same bounding curve. Stokes' theorem will relate a surface integral over the surface to a line integral about the bounding curve.
](./figures/jiffy-pop.png)
Were the figure of Jiffy Pop popcorn animated, the surface of foil would slowly expand due to the pressure of popping popcorn until the popcorn was ready. However, the boundary would remain the same. Many different surfaces can have the same boundary. Take for instance the upper-half unit sphere in $R^3$, which has the curve $x^2 + y^2 = 1$ as its boundary. This is the same boundary curve as that of the cone $z = 1 - (x^2 + y^2)$ lying above the $x$-$y$ plane. It would also be the boundary of the surface formed by a Mickey Mouse glove if the collar were scaled and positioned onto the unit circle.
@@ -739,7 +747,7 @@ $$
This gives the series of approximations:
$$
\begin{align*}
\oint_C F\cdot\hat{T} ds &=
\sum \oint_{C_i} F\cdot\hat{T} ds \\
@@ -750,18 +758,21 @@ This gives the series of approximations:
&\approx
\iint_S \nabla\times{F}\cdot\hat{N} dS.
\end{align*}
$$
In terms of our expanding popcorn, the boundary integral - after accounting for cancellations, as in Green's theorem - can be seen as a microscopic sum of boundary integrals each of which is approximated by a term $\nabla\times{F}\cdot\hat{N} \Delta{S}$ which is viewed as a Riemann sum approximation for the integral of the curl over the surface. The cancellation depends on a proper choice of orientation, but with that we have:
In terms of our expanding popcorn, the boundary integral---after accounting for cancellations, as in Green's theorem---can be seen as a microscopic sum of boundary integrals each of which is approximated by a term $\nabla\times{F}\cdot\hat{N} \Delta{S}$ which is viewed as a Riemann sum approximation for the integral of the curl over the surface. The cancellation depends on a proper choice of orientation, but with that we have:
::: {.callout-note icon=false}
## Stokes' theorem
> **Stokes' theorem**: Let $S$ be an orientable smooth surface in $R^3$ with boundary $C$, $C$ oriented so that the chosen normal for $S$ agrees with the right-hand rule for $C$'s orientation. Then *if* $F$ has continuous partial derivatives
>
> $$
> \oint_C F \cdot\hat{T} ds = \iint_S (\nabla\times{F})\cdot\hat{N} dA.
> $$
Let $S$ be an orientable smooth surface in $R^3$ with boundary $C$, $C$ oriented so that the chosen normal for $S$ agrees with the right-hand rule for $C$'s orientation. Then *if* $F$ has continuous partial derivatives
$$
\oint_C F \cdot\hat{T} ds = \iint_S (\nabla\times{F})\cdot\hat{N} dA.
$$
:::
Green's theorem is an immediate consequence upon viewing the region in $R^2$ as a surface in $R^3$ with normal $\hat{k}$.
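A numeric sketch: take the (assumed) field $F = \langle -y, x, 0\rangle$, so $\nabla\times{F} = \langle 0,0,2\rangle$, with the upper unit hemisphere as the surface and the unit circle as its boundary. For the unit sphere, $\hat{N} dS = \langle \sin(\phi)\cos(\theta), \sin(\phi)\sin(\theta), \cos(\phi)\rangle \sin(\phi) d\phi d\theta$, so the surface integrand reduces to $2\cos(\phi)\sin(\phi)$ (this assumes `QuadGK` and `HCubature` are available, as in other chapters):
```{julia}
#| hold: true
# boundary side: F(r(t)) ⋅ r'(t) with r(t) = ⟨cos(t), sin(t), 0⟩
line = quadgk(t -> [-sin(t), cos(t), 0] ⋅ [-sin(t), cos(t), 0], 0, 2pi)[1]
# surface side: (∇×F)⋅N dS = 2cos(ϕ)sin(ϕ) dϕ dθ over [0,π/2] × [0,2π]
surface = hcubature(v -> 2 * cos(v[1]) * sin(v[1]), (0, 0), (pi/2, 2pi))[1]
line ≈ surface ≈ 2pi   # both sides equal 2π
```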
@@ -997,17 +1008,17 @@ $$
the last approximation through a Riemann sum approximation. This heuristic leads to:
::: {.callout-note icon=false}
## The divergence theorem
> **The divergence theorem**: Suppose $V$ is a $3$-dimensional volume which is bounded (compact) and has a boundary, $S$, that is piecewise smooth. If $F$ is a continuously differentiable vector field defined on an open set containing $V$, then:
>
> $$
> \iiint_V (\nabla\cdot{F}) dV = \oint_S (F\cdot\hat{N})dS.
> $$
Suppose $V$ is a $3$-dimensional volume which is bounded (compact) and has a boundary, $S$, that is piecewise smooth. If $F$ is a continuously differentiable vector field defined on an open set containing $V$, then:
$$
\iiint_V (\nabla\cdot{F}) dV = \oint_S (F\cdot\hat{N})dS.
$$
That is, the volume integral of the divergence can be computed from the flux integral over the boundary of $V$.
:::
### Examples of the divergence theorem
@@ -1130,12 +1141,13 @@ The divergence theorem provides two means to compute a value, the point here is
Following Schey, we now consider a continuous analog to the crowd counting problem through a flow with a non-uniform density that may vary in time. Let $\rho(x,y,z;t)$ be the time-varying density and $v(x,y,z;t)$ be a vector field indicating the direction of flow. Consider some three-dimensional volume, $V$, with boundary $S$ (though two-dimensional would also be applicable). Then these integrals have interpretations:
$$
\begin{align*}
\iiint_V \rho dV &&\quad\text{Amount contained within }V\\
\frac{\partial}{\partial{t}} \iiint_V \rho dV &=
\iiint_V \frac{\partial{\rho}}{\partial{t}} dV &\quad\text{Change in time of amount contained within }V
\end{align*}
$$
Moving the derivative inside the integral requires an assumption of continuity. Assume the material is *conserved*, meaning that if the amount in the volume $V$ changes it must flow in and out through the boundary. The flow out through $S$, the boundary of $V$, is

View File

@@ -1,3 +1,7 @@
---
engine: julia
---
# Integrals
Identifying the area under a curve between two values is an age-old problem. In this chapter we see that for many cases the Fundamental Theorem of Calculus can be used to identify the area. When not applicable, we will see how such areas may be accurately estimated.

View File

@@ -1,13 +1,19 @@
[deps]
CalculusWithJulia = "a2e0e22d-7d4c-5312-9169-8b992201a882"
ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
IJulia = "7073ff75-c697-5162-941a-fcdaad2a7d2a"
LaTeXStrings = "b964fa9f-0449-5b57-a5c2-d3ea65f4040f"
Mustache = "ffc61752-8dc7-55ee-8c37-f3e9cdd09e70"
PlotlyBase = "a03496cd-edff-5a9b-9e67-9cda94a718b5"
PlotlyKaleido = "f2990250-8cf9-495f-b13a-cce12b45703c"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
QuadGK = "1fd47b50-473d-5c70-9696-f719f8f3bcdc"
QuizQuestions = "612c44de-1021-4a21-84fb-7261cf5eb2d4"
Roots = "f2b01f46-fcfa-551c-844a-d8ac1e96c665"
SpecialFunctions = "276daf66-3868-5448-9aa4-cd146d93841b"
SplitApplyCombine = "03a91e81-4c3e-53e1-a0a4-9c0c8f19dd66"
SymPy = "24249f21-da20-56a4-8eb1-6a02cf4ae2e6"
Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
TextWrap = "b718987f-49a8-5099-9789-dcd902bef87d"
Unitful = "1986cc42-f94f-5a68-af5c-568840ba703d"
UnitfulUS = "7dc9378f-8956-57ef-a780-aa31cc70ff3d"

View File

@@ -0,0 +1,41 @@
# Appendix
```{julia}
#| hold: true
#| echo: false
gr()
## For **some reason** having this in the natural place messes up the plots.
## {{{approximate_surface_area}}}
xs,ys = range(-1, stop=1, length=50), range(-1, stop=1, length=50)
f(x,y)= 2 - (x^2 + y^2)
dr = [1/2, 3/4]
df = [f(dr[1],0), f(dr[2],0)]
function sa_approx_graph(i)
p = plot(xs, ys, f, st=[:surface], legend=false)
for theta in range(0, stop=i/10*2pi, length=10*i )
path3d!(p,sin(theta)*dr, cos(theta)*dr, df)
end
p
end
n = 10
anim = @animate for i=1:n
sa_approx_graph(i)
end
imgfile = tempname() * ".gif"
gif(anim, imgfile, fps = 1)
caption = L"""
Surface of revolution of $f(x) = 2 - x^2$ about the $y$ axis. The line segments are the images of rotating the secant line connecting $(1/2, f(1/2))$ and $(3/4, f(3/4))$. These trace out the frustum of a cone which approximates the corresponding surface area of the surface of revolution. In the limit, this approximation becomes exact and a formula for the surface area of surfaces of revolution can be used to compute the value.
"""
plotly()
ImageFile(imgfile, caption)
```

View File

@@ -48,14 +48,18 @@ Recall the distance formula gives the distance between two points: $\sqrt{(x_1 -
Consider now two functions $g(t)$ and $f(t)$ and the parameterized graph between $a$ and $b$ given by the points $(g(t), f(t))$ for $a \leq t \leq b$. Assume that both $g$ and $f$ are differentiable on $(a,b)$ and continuous on $[a,b]$ and furthermore that $\sqrt{g'(t)^2 + f'(t)^2}$ is Riemann integrable.
::: {.callout-note icon=false}
## The arc length of a curve
> **The arc length of a curve**. For $f$ and $g$ as described, the arc length of the parameterized curve is given by
>
> $L = \int_a^b \sqrt{g'(t)^2 + f'(t)^2} dt.$
>
> For the special case of the graph of a function $f(x)$ between $a$ and $b$ the formula becomes $L = \int_a^b \sqrt{ 1 + f'(x)^2} dx$ (taking $g(t) = t$).
For $f$ and $g$ as described, the arc length of the parameterized curve is given by
$$
L = \int_a^b \sqrt{g'(t)^2 + f'(t)^2} dt.
$$
For the special case of the graph of a function $f(x)$ between $a$ and $b$ the formula becomes $L = \int_a^b \sqrt{ 1 + f'(x)^2} dx$ (taking $g(t) = t$).
:::
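As a quick sketch of the formula, the circumference of a circle of radius $2$, parameterized by $g(t) = 2\cos(t)$ and $f(t) = 2\sin(t)$ with derivatives computed by hand, is recovered numerically with `quadgk`, used as elsewhere in this section:
```{julia}
#| hold: true
g(t) = 2cos(t);  gp(t) = -2sin(t)   # g and g', by hand
f(t) = 2sin(t);  fp(t) =  2cos(t)   # f and f', by hand
L = quadgk(t -> sqrt(gp(t)^2 + fp(t)^2), 0, 2pi)[1]
L ≈ 2pi * 2    # 2πR with R = 2
```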
:::{.callout-note}
## Note
@@ -72,21 +76,34 @@ To see why, any partition of the interval $[a,b]$ by $a = t_0 < t_1 < \cdots < t
## {{{arclength_graph}}}
gr()
function make_arclength_graph(n)
x(t) = cos(t)/t
y(t) = sin(t)/t
a, b = 1, 4pi
ns = [10,15,20, 30, 50]
empty_style = (xaxis=([], false),
yaxis=([], false),
framestyle=:origin,
legend=false)
ns = [10,15,20, 30, 50]
plot(; empty_style..., aspect_ratio=:equal, size=fig_size)
title!("Approximate arc length with $(ns[n]) points")
g(t) = cos(t)/t
f(t) = sin(t)/t
ts = range(a, b, 250)
plot!(x.(ts), y.(ts); line=(:black,2))
pttn = range(a, b, ns[n])
plot!(x.(pttn), y.(pttn); line=(:red, 2))
ts = range(1, stop=4pi, length=200)
tis = range(1, stop=4pi, length=ns[n])
ts = range(0, 2pi, 100)
p = plot(g, f, 1, 4pi, legend=false, size=fig_size,
title="Approximate arc length with $(ns[n]) points")
plot!(p, map(g, tis), map(f, tis), color=:orange)
λ = 0.01
C = Plots.scale(Shape(:circle), λ)
p
for (u,v) ∈ zip(x.(pttn), y.(pttn))
S = Plots.translate(C, u,v)
plot!(S; fill=(:white,), line=(:black,2))
end
current()
end
n = 5
@@ -126,7 +143,7 @@ $$
But looking at each term, we can push the denominator into the square root as:
$$
\begin{align*}
d_i &= d_i \cdot \frac{t_i - t_{i-1}}{t_i - t_{i-1}}
\\
@@ -134,6 +151,7 @@ d_i &= d_i \cdot \frac{t_i - t_{i-1}}{t_i - t_{i-1}}
\left(\frac{f(t_i)-f(t_{i-1})}{t_i-t_{i-1}}\right)^2} \cdot (t_i - t_{i-1}) \\
&= \sqrt{ g'(\xi_i)^2 + f'(\psi_i)^2} \cdot (t_i - t_{i-1}).
\end{align*}
$$
The values $\xi_i$ and $\psi_i$ are guaranteed by the mean value theorem and must be in $[t_{i-1}, t_i]$.
@@ -272,7 +290,7 @@ nothing
The museum notes have
> For his Catenary series (1997–2003), of which Near the Lagoon is the largest and last work, Johns formed catenaries—a term used to describe the curve assumed by a cord suspended freely from two points—by tacking ordinary household string to the canvas or its supports.
> For his Catenary series (1997–2003), of which Near the Lagoon is the largest and last work, Johns formed catenaries—a term used to describe the curve assumed by a cord suspended freely from two points—by tacking ordinary household string to the canvas or its supports.
@@ -377,11 +395,12 @@ nothing
The [nephroid](http://www-history.mcs.st-and.ac.uk/Curves/Nephroid.html) is a curve that can be described parametrically by
$$
\begin{align*}
g(t) &= a(3\cos(t) - \cos(3t)), \\
f(t) &= a(3\sin(t) - \sin(3t)).
\end{align*}
$$
Taking $a=1$ we have this graph:
@@ -407,7 +426,7 @@ quadgk(t -> sqrt(𝒈'(t)^2 + 𝒇'(t)^2), 0, 2pi)[1]
The answer seems like a floating point approximation of $24$, which suggests that this integral is tractable. Pursuing this, the integrand simplifies:
$$
\begin{align*}
\sqrt{g'(t)^2 + f'(t)^2}
&= \sqrt{(-3\sin(t) + 3\sin(3t))^2 + (3\cos(t) - 3\cos(3t))^2} \\
@@ -417,6 +436,7 @@ The answer seems like a floating point approximation of $24$, which suggests th
&= 3\sqrt{2}\sqrt{1 - \cos(2t)}\\
&= 3\sqrt{2}\sqrt{2\sin(t)^2}.
\end{align*}
$$
The second to last line comes from a double angle formula expansion of $\cos(3t - t)$ and the last line from the half angle formula for $\cos$.
@@ -452,13 +472,14 @@ A teacher of small children assigns his students the task of computing the lengt
Mathematically, suppose a curve is described parametrically by $(g(t), f(t))$ for $a \leq t \leq b$. A new parameterization is provided by $\gamma(t)$. Suppose $\gamma$ is strictly increasing, so that an inverse function exists. (This assumption is implicitly made by the teacher, as it implies the student won't start counting in the wrong direction.) Then the same curve is described by composition through $(g(\gamma(u)), f(\gamma(u)))$, $\gamma^{-1}(a) \leq u \leq \gamma^{-1}(b)$. That the arc length is the same follows from substitution:
$$
\begin{align*}
\int_{\gamma^{-1}(a)}^{\gamma^{-1}(b)} \sqrt{([g(\gamma(t))]')^2 + ([f(\gamma(t))]')^2} dt
&=\int_{\gamma^{-1}(a)}^{\gamma^{-1}(b)} \sqrt{(g'(\gamma(t) )\gamma'(t))^2 + (f'(\gamma(t) )\gamma'(t))^2 } dt \\
&=\int_{\gamma^{-1}(a)}^{\gamma^{-1}(b)} \sqrt{g'(\gamma(t))^2 + f'(\gamma(t))^2} \gamma'(t) dt\\
&=\int_a^b \sqrt{g'(u)^2 + f'(u)^2} du = L
\end{align*}
$$
(Using $u=\gamma(t)$ for the substitution.)
@@ -483,15 +504,16 @@ For a simple example, we have $g(t) = R\cos(t)$ and $f(t)=R\sin(t)$ parameterizi
What looks at first glance to be just a slightly more complicated equation is that of an ellipse, with $g(t) = a\cos(t)$ and $f(t) = b\sin(t)$. Taking $a=1$ and $b = a + c$, for $c > 0$, the arc length as a function of $u$ is just
$$
\begin{align*}
s(u) &= \int_0^u \sqrt{(-\sin(t))^2 + b\cos(t)^2} dt\\
&= \int_0^u \sqrt{\sin(t)^2 + \cos(t)^2 + c\cos(t)^2} dt \\
&=\int_0^u \sqrt{1 + c\cos(t)^2} dt.
s(u) &= \int_0^u \sqrt{(-\sin(t))^2 + (b\cos(t))^2} dt\\
&= \int_0^u \sqrt{\sin(t)^2 + \cos(t)^2 + C\cos(t)^2} dt \\
&=\int_0^u \sqrt{1 + C\cos(t)^2} dt.
\end{align*}
$$
But, despite it not looking too daunting, this integral is not tractable through our techniques and has an answer involving elliptic integrals. We can work numerically though. Letting $a=1$ and $b=2$, we have the arc length is given by:
Here $C = 2c + c^2$ is a constant. But, despite not looking too daunting, this integral is not tractable through our techniques and has an answer involving elliptic integrals. We can work numerically though. Letting $a=1$ and $b=2$, the arc length is given by:
```{julia}
@@ -533,7 +555,9 @@ plot(t -> g(𝒔(t)), t -> f(𝒔(t)), 0, sinv(2*pi))
Following (faithfully) [Kantorwitz and Neumann](https://www.researchgate.net/publication/341676916_The_English_Galileo_and_His_Vision_of_Projectile_Motion_under_Air_Resistance), we consider a function $f(x)$ with the property that **both** $f$ and $f'$ are strictly concave down on $[a,b]$ and suppose $f(a) = f(b)$. Further, assume $f'$ is continuous. We will see this implies facts about arc-length and other integrals related to $f$.
The following figure is clearly of a concave down function. The asymmetry about the critical point will be seen to be a result of the derivative also being concave down. This asymmetry will be characterized in several different ways in the following including showing that the arc length from $(a,0)$ to $(c,f(c))$ is longer than from $(c,f(c))$ to $(b,0)$.
@fig-kantorwitz-neumann is clearly of a concave down function. The asymmetry about the critical point will be seen to be a result of the derivative also being concave down. This asymmetry will be characterized in several different ways in the following including showing that the arc length from $(a,0)$ to $(c,f(c))$ is longer than from $(c,f(c))$ to $(b,0)$.
::: {#fig-kantorwitz-neumann}
```{julia}
@@ -568,7 +592,12 @@ plot!(zero)
annotate!([(0, 𝒚, "a"), (152, 𝒚, "b"), (u, 𝒚, "u"), (v, 𝒚, "v"), (c, 𝒚, "c")])
```
Take $a < u < c < v < b$ with $f(u) = f(v)$ and $c$ a critical point, as in the picture. There must be a critical point by Rolle's theorem, and it must be unique, as the derivative, which exists by the assumptions, must be strictly decreasing due to concavity of $f$ and hence there can be at most $1$ critical point.
Graph of function $f(x)$ with both $f$ and $f'$ strictly concave down.
:::
By Rolle's theorem there exists a critical point $c$ in $(a,b)$, as in the picture. It must be unique: the derivative, which exists by the assumptions, is strictly decreasing due to the concavity of $f$, so there can be at most one critical point.
Take $a < u < c < v < b$ with $f(u) = f(v)$.
Some facts about this picture can be proven from the definition of concavity:
@@ -588,11 +617,12 @@ $$
So
$$
\begin{align*}
\int_0^1 (tf'(u) + (1-t)f'(v)) dt &< \int_0^1 f'(tu + (1-t)v) dt, \text{or}\\
\frac{f'(u) + f'(v)}{2} &< \frac{1}{v-u}\int_u^v f'(w) dw,
\end{align*}
$$
by the substitution $w = tu + (1-t)v$. Using the fundamental theorem of calculus to compute the mean value of the integral of $f'$ over $[u,v]$ gives the following as a consequence of strict concavity of $f'$:
@@ -630,7 +660,7 @@ By the fundamental theorem of calculus:
$$
(f_1^{-1}(y) + f_2^{-1}(y))\big|_\alpha^\beta > 0
(f_1^{-1}(y) + f_2^{-1}(y))\Big|_\alpha^\beta > 0
$$
On rearranging:
@@ -684,24 +714,26 @@ which holds by the strict concavity of $f'$, as found previously.
Using the substitution $x = f_i^{-1}(u)$ as needed to see:
$$
\begin{align*}
\int_a^u f(x) dx &= \int_0^{f(u)} u [f_1^{-1}]'(u) du \\
&> -\int_0^h u [f_2^{-1}]'(u) du \\
&= \int_h^0 u [f_2^{-1}]'(u) du \\
&= \int_v^b f(x) dx.
\end{align*}
$$
For the latter claim, integrating in the $y$ variable gives
$$
\begin{align*}
\int_u^c (f(x)-h) dx &= \int_h^m (c - f_1^{-1}(y)) dy\\
&> \int_h^m (c - f_2^{-1}(y)) dy\\
&> \int_h^m (f_2^{-1}(y) - c) dy\\
&= \int_c^v (f(x)-h) dx
\end{align*}
$$
Now, the area under $h$ over $[u,c]$ is greater than that over $[c,v]$ as $(u+v)/2 < c$ or $v-c < c-u$. That means the area under $f$ over $[u,c]$ is greater than that over $[c,v]$.
@@ -724,7 +756,7 @@ or $\phi'(z) < 0$. Moreover, we have by the first assertion that $f'(z) < -f'(\p
Using the substitution $x = \phi(z)$ gives:
$$
\begin{align*}
\int_v^b \sqrt{1 + f'(x)^2} dx &=
\int_u^a \sqrt{1 + f'(\phi(z))^2} \phi'(z) dz\\
@@ -733,6 +765,7 @@ Using the substitution $x = \phi(z)$ gives:
&= \int_a^u \sqrt{\phi'(z)^2 + f'(z)^2} dz\\
&< \int_a^u \sqrt{1 + f'(z)^2} dz
\end{align*}
$$
Letting $h=f(u \rightarrow c)$ we get the *inequality*
@@ -782,11 +815,12 @@ $$
with the case above corresponding to $W = -m(k/m)$. The set of equations then satisfy:
$$
\begin{align*}
x''(t) &= - W(t,x(t), x'(t), y(t), y'(t)) \cdot x'(t)\\
y''(t) &= -g - W(t,x(t), x'(t), y(t), y'(t)) \cdot y'(t)\\
\end{align*}
$$
with initial conditions: $x(0) = y(0) = 0$ and $x'(0) = v_0 \cos(\theta), y'(0) = v_0 \sin(\theta)$.
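These equations are easy to approximate numerically. The following is a minimal sketch using Euler's method with a *constant* drag coefficient, $W = k/m$; the helper `euler_trajectory` and the parameter values are hypothetical illustrations, not from the referenced paper:
```{julia}
function euler_trajectory(W; g=9.8, v0=50.0, θ=pi/4, h=0.001)
    x, y   = 0.0, 0.0
    vx, vy = v0*cos(θ), v0*sin(θ)      # x'(0), y'(0)
    xs, ys = [x], [y]
    while y >= 0                        # step until the projectile lands
        ax, ay = -W*vx, -g - W*vy       # x'' = -W⋅x', y'' = -g - W⋅y'
        x,  y  = x + h*vx, y + h*vy
        vx, vy = vx + h*ax, vy + h*ay
        push!(xs, x); push!(ys, y)
    end
    xs, ys
end
xs, ys = euler_trajectory(0.1)
last(xs)   # approximate horizontal range under this hypothetical drag
```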
@@ -795,28 +829,30 @@ with initial conditions: $x(0) = y(0) = 0$ and $x'(0) = v_0 \cos(\theta), y'(0)
Only for certain drag forces can this set of equations be solved exactly, though it can be approximated numerically for admissible $W$. If $W$ is strictly positive, then it can be shown that $x(t)$ is increasing on $[0, x_\infty)$ and so invertible, and that $f(u) = y(x^{-1}(u))$ is three times differentiable with both $f$ and $f'$ strictly concave, as it can be shown that (say $x(v) = u$ so $dv/du = 1/x'(v) > 0$):
$$
\begin{align*}
f''(u) &= -\frac{g}{x'(v)^2} < 0\\
f'''(u) &= \frac{2gx''(v)}{x'(v)^3} \\
&= -\frac{2gW}{x'(v)^2} \cdot \frac{dv}{du} < 0
\end{align*}
$$
The latter by differentiating, the former a consequence of the following formulas for derivatives of inverse functions
$$
\begin{align*}
[x^{-1}]'(u) &= 1 / x'(v) \\
[x^{-1}]''(u) &= -x''(v)/(x'(v))^3
\end{align*}
$$
For then
$$
\begin{align*}
f(u) &= y(x^{-1}(u)) \\
f'(u) &= y'(x^{-1}(u)) \cdot {x^{-1}}'(u) \\
@@ -825,6 +861,7 @@ f''(u) &= y''(x^{-1}(u))\cdot[x^{-1}]'(u)^2 + y'(x^{-1}(u)) \cdot [x^{-1}]''(u)
&= -g/(x'(v))^2 - W y'/(x'(v))^2 - y'(v) \cdot (- W \cdot x'(v)) / x'(v)^3\\
&= -g/x'(v)^2.
\end{align*}
$$
## Questions

View File

@@ -16,8 +16,10 @@ using Roots
---
![A jigsaw puzzle needs a certain amount of area to complete. For a traditional rectangular puzzle, this area is comprised of the sum of the areas for each piece. Decomposing a total area into the sum of smaller, known, ones---even if only approximate---is the basis of definite integration.](figures/jigsaw.png)
The question of area has long fascinated human culture. As children, we learn early on the formulas for the areas of some geometric figures: a square is $b^2$, a rectangle $b\cdot h$ a triangle $1/2 \cdot b \cdot h$ and for a circle, $\pi r^2$. The area of a rectangle is often the intuitive basis for illustrating multiplication. The area of a triangle has been known for ages. Even complicated expressions, such as [Heron's](http://tinyurl.com/mqm9z) formula which relates the area of a triangle with measurements from its perimeter have been around for 2000 years. The formula for the area of a circle is also quite old. Wikipedia dates it as far back as the [Rhind](http://en.wikipedia.org/wiki/Rhind_Mathematical_Papyrus) papyrus for 1700 BC, with the approximation of $256/81$ for $\pi$.
The question of area has long fascinated human culture. As children, we learn early on the formulas for the areas of some geometric figures: a square is $b^2$, a rectangle $b\cdot h$, a triangle $1/2 \cdot b \cdot h$ and for a circle, $\pi r^2$. The area of a rectangle is often the intuitive basis for illustrating multiplication. The area of a triangle has been known for ages. Even complicated expressions, such as [Heron's](http://tinyurl.com/mqm9z) formula which relates the area of a triangle with measurements from its perimeter have been around for 2000 years. The formula for the area of a circle is also quite old. Wikipedia dates it as far back as the [Rhind](http://en.wikipedia.org/wiki/Rhind_Mathematical_Papyrus) papyrus for 1700 BC, with the approximation of $256/81$ for $\pi$.
The modern approach to area begins with a non-negative function $f(x)$ over an interval $[a,b]$. The goal is to compute the area under the graph. That is, the area between $f(x)$ and the $x$-axis between $a \leq x \leq b$.
@@ -81,39 +83,46 @@ gr()
f(x) = x^2
colors = [:black, :blue, :orange, :red, :green, :orange, :purple]
## Area of parabola
## Area of parabola
function make_triangle_graph(n)
title = "Area of parabolic cup ..."
n==1 && (title = "Area = 1/2")
n==2 && (title = "Area = previous + 1/8")
n==3 && (title = "Area = previous + 2*(1/8)^2")
n==4 && (title = "Area = previous + 4*(1/8)^3")
n==5 && (title = "Area = previous + 8*(1/8)^4")
n==6 && (title = "Area = previous + 16*(1/8)^5")
n==7 && (title = "Area = previous + 32*(1/8)^6")
n==1 && (title = L"Area $= 1/2$")
n==2 && (title = L"Area $=$ previous $+\; \frac{1}{8}$")
n==3 && (title = L"Area $=$ previous $+\; 2\cdot\frac{1}{8^2}$")
n==4 && (title = L"Area $=$ previous $+\; 4\cdot\frac{1}{8^3}$")
n==5 && (title = L"Area $=$ previous $+\; 8\cdot\frac{1}{8^4}$")
n==6 && (title = L"Area $=$ previous $+\; 16\cdot\frac{1}{8^5}$")
n==7 && (title = L"Area $=$ previous $+\; 32\cdot\frac{1}{8^6}$")
plt = plot(f, 0, 1, legend=false, size = fig_size, linewidth=2)
annotate!(plt, [(0.05, 0.9, text(title,:left))]) # if in title, it grows funny with gr
n >= 1 && plot!(plt, [1,0,0,1, 0], [1,1,0,1,1], color=colors[1], linetype=:polygon, fill=colors[1], alpha=.2)
n == 1 && plot!(plt, [1,0,0,1, 0], [1,1,0,1,1], color=colors[1], linewidth=2)
plt = plot(f, 0, 1;
legend=false,
size = fig_size,
linewidth=2)
annotate!(plt, [
(0.05, 0.9, text(title,:left))
]) # if in title, it grows funny with gr
n >= 1 && plot!(plt, [1,0,0,1, 0], [1,1,0,1,1];
color=colors[1], linetype=:polygon,
fill=colors[1], alpha=.2)
n == 1 && plot!(plt, [1,0,0,1, 0], [1,1,0,1,1];
color=colors[1], linewidth=2)
for k in 2:n
xs = range(0, stop=1, length=1+2^(k-1))
ys = map(f, xs)
k < n && plot!(plt, xs, ys, linetype=:polygon, fill=:black, alpha=.2)
ys = f.(xs)
k < n && plot!(plt, xs, ys;
linetype=:polygon, fill=:black, alpha=.2)
if k == n
plot!(plt, xs, ys, color=colors[k], linetype=:polygon, fill=:black, alpha=.2)
plot!(plt, xs, ys, color=:black, linewidth=2)
plot!(plt, xs, ys;
color=colors[k], linetype=:polygon, fill=:black, alpha=.2)
plot!(plt, xs, ys;
color=:black, linewidth=2)
end
end
plt
end
n = 7
anim = @animate for i=1:n
make_triangle_graph(i)
@@ -183,13 +192,47 @@ $$
S_n = f(c_1) \cdot (x_1 - x_0) + f(c_2) \cdot (x_2 - x_1) + \cdots + f(c_n) \cdot (x_n - x_{n-1}).
$$
Clearly for a given partition and choice of $c_i$, the above can be computed. Each term $f(c_i)\cdot(x_i-x_{i-1})$ can be visualized as the area of a rectangle with base spanning from $x_{i-1}$ to $x_i$ and height given by the function value at $c_i$. The following visualizes left Riemann sums for different values of $n$ in a way that makes Beekman's intuition plausible that as the number of rectangles gets larger, the approximate sum will get closer to the actual area.
Clearly for a given partition and choice of $c_i$, the above can be computed. Each term $f(c_i)\cdot(x_i-x_{i-1}) = f(c_i)\Delta_i$ can be visualized as the area of a rectangle with base spanning from $x_{i-1}$ to $x_i$ and height given by the function value at $c_i$. The following visualizes left Riemann sums for different values of $n$ in a way that makes Beekman's intuition plausible that as the number of rectangles gets larger, the approximate sum will get closer to the actual area.
```{julia}
#| hold: true
#| echo: false
gr()
function left_riemann(n)
empty_style = (xaxis=([], false),
yaxis=([], false),
framestyle=:origin,
legend=false)
axis_style = (arrow=true, side=:head, line=(:gray, 1))
rectangle = (x, y, w, h) -> Shape(x .+ [0,w,w,0], y .+ [0,0,h,h])
f = x -> -(x+1/2)*(x-1)*(x-3) + 1
a, b= 1, 3
plot(; empty_style...)
plot!(f, a, b; line=(:black, 3))
plot!([a-.25, b+.25], [0,0]; axis_style...)
plot!([a-.1, a-.1], [-.25, .5 + f(a/2 +b/2)]; axis_style...)
Δ = (b-a)/n
for i ∈ 0:n-1
xᵢ = a + i*Δ
plot!(rectangle(xᵢ, 0, Δ, f(xᵢ)), opacity=0.5, color=:red)
end
area = round(sum(f(a + i*Δ)*Δ for i ∈ 0:n-1), digits=3)
annotate!([
(a, 0, text(L"a", :top)),
(b, 0, text(L"b", :top)),
(a, f(a/2+b/2), text("\$L_{$n} = $area\$", :left))
])
current()
end
#=
rectangle(x, y, w, h) = Shape(x .+ [0,w,w,0], y .+ [0,0,h,h])
function ₙ(j)
a = ("₋","","","₀","₁","₂","₃","₄","₅","₆","₇","₈","₉")
@@ -210,6 +253,7 @@ function left_riemann(n)
title!("L$(ₙ(n)) = $a")
p
end
=#
anim = @animate for i ∈ (2,4,8,16,32,64)
left_riemann(i)
@@ -230,7 +274,7 @@ To successfully compute a good approximation for the area, we would need to choo
For Archimedes' problem - finding the area under $f(x)=x^2$ between $0$ and $1$ - if we take as a partition $x_i = i/n$ and $c_i = x_i$, then the above sum becomes:
$$
\begin{align*}
S_n &= f(c_1) \cdot (x_1 - x_0) + f(c_2) \cdot (x_2 - x_1) + \cdots + f(c_n) \cdot (x_n - x_{n-1})\\
&= (x_1)^2 \cdot \frac{1}{n} + (x_2)^2 \cdot \frac{1}{n} + \cdots + (x_n)^2 \cdot \frac{1}{n}\\
@@ -238,6 +282,7 @@ S_n &= f(c_1) \cdot (x_1 - x_0) + f(c_2) \cdot (x_2 - x_1) + \cdots + f(c_n) \cd
&= \frac{1}{n^3} \cdot (1^2 + 2^2 + \cdots + n^2) \\
&= \frac{1}{n^3} \cdot \frac{n\cdot(n+1)\cdot(2n+1)}{6}.
\end{align*}
$$
The latter uses a well-known formula for the sum of squares of the first $n$ natural numbers.
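The closed form is easy to check against the sum itself; a small numeric sketch:
```{julia}
#| hold: true
n = 1000
Sn = sum((i/n)^2 * (1/n) for i in 1:n)    # right Riemann sum for x² over [0,1]
Sn, n*(n+1)*(2n+1)/(6n^3), 1/3            # the sum, the closed form, the limit
```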
@@ -301,19 +346,24 @@ The general statement allows for any partition such that the largest gap goes to
Riemann sums weren't named after Riemann because he was the first to approximate areas using rectangles. Indeed, others had been using even more efficient ways to compute areas for centuries prior to Riemann's work. Rather, Riemann put the definition of the area under the curve on a firm theoretical footing with the following theorem which gives a concrete notion of what functions are integrable:
> **Riemann Integral**: A function $f$ is Riemann integrable over the interval $[a,b]$ and its integral will have value $V$ provided for every $\epsilon > 0$ there exists a $\delta > 0$ such that for any partition $a =x_0 < x_1 < \cdots < x_n=b$ with $\lvert x_i - x_{i-1} \rvert < \delta$ and for any choice of points $x_{i-1} \leq c_i \leq x_{i}$ this is satisfied:
>
> $$
> \lvert \sum_{i=1}^n f(c_i)(x_{i} - x_{i-1}) - V \rvert < \epsilon.
> $$
>
> When the integral exists, it is written $V = \int_a^b f(x) dx$.
::: {.callout-note icon=false}
## Riemann Integral
A function $f$ is Riemann integrable over the interval $[a,b]$ and its integral will have value $V$ provided for every $\epsilon > 0$ there exists a $\delta > 0$ such that for any partition $a =x_0 < x_1 < \cdots < x_n=b$ with $\lvert x_i - x_{i-1} \rvert < \delta$ and for any choice of points $x_{i-1} \leq c_i \leq x_{i}$ this is satisfied:
$$
\lvert \sum_{i=1}^n f(c_i)(x_{i} - x_{i-1}) - V \rvert < \epsilon.
$$
When the integral exists, it is written $V = \int_a^b f(x) dx$.
:::
:::{.callout-note}
## History note
The expression $V = \int_a^b f(x) dx$ is known as the *definite integral* of $f$ over $[a,b]$. Much earlier than Riemann, Cauchy had defined the definite integral in terms of a sum of rectangular products beginning with $S=(x_1 - x_0) f(x_0) + (x_2 - x_1) f(x_1) + \cdots + (x_n - x_{n-1}) f(x_{n-1})$ (the left Riemann sum). He showed the limit was well defined for any continuous function. Riemann's formulation relaxes the choice of partition and the choice of the $c_i$ so that integrability can be better understood.
The expression $V = \int_a^b f(x) dx$ is known as the *definite integral* of $f$ over $[a,b]$. Much earlier than Riemann, Cauchy had defined the definite integral in terms of a sum of rectangular products beginning with $S=f(x_0) \cdot (x_1 - x_0) + f(x_1) \cdot (x_2 - x_1) + \cdots + f(x_{n-1}) \cdot (x_n - x_{n-1})$ (the left Riemann sum). He showed the limit was well defined for any continuous function. Riemann's formulation relaxes the choice of partition and the choice of the $c_i$ so that integrability can be better understood.
:::
@@ -323,18 +373,6 @@ The expression $V = \int_a^b f(x) dx$ is known as the *definite integral* of $f$
The following formulas are consequences when $f(x)$ is integrable. These mostly follow from a judicious rearranging of the approximating sums.
The area under a constant function is found from the area of a rectangle, a special case being $c=0$ yielding $0$ area:
@@ -347,16 +385,65 @@ The area under a constant function is found from the area of rectangle, a specia
For any partition of $a < b$, we have $S_n = c(x_1 - x_0) + c(x_2 -x_1) + \cdots + c(x_n - x_{n-1})$. By factoring out the $c$, we have a *telescoping sum* which means the sum simplifies to $S_n = c(x_n-x_0) = c(b-a)$. Hence any limit must be this constant value.
::: {#fig-consequence-rectangle-area}
```{julia}
#| echo: false
gr()
let
c = 1
a,b = 0.5, 1.5
f(x) = c
Δ = 0.1
plt = plot(;
xaxis=([], false),
yaxis=([], false),
legend=false,
)
plot!(f, a, b; line=(:black, 2))
plot!([a-Δ, b + Δ], [0,0]; line=(:gray, 1), arrow=true, side=:head)
plot!([a-Δ/2, a-Δ/2], [-Δ, c + Δ]; line=(:gray, 1), arrow=true, side=:head)
plot!([a,a],[0,f(a)]; line=(:black, 1, :dash))
plot!([b,b],[0,f(b)]; line=(:black, 1, :dash))
annotate!([
(a, 0, text(L"a", :top, :right)),
(b, 0, text(L"b", :top, :left)),
(a-Δ/2-0.01, c, text(L"c", :right))
])
current()
end
```
```{julia}
#| echo: false
plotly()
nothing
```
Illustration that the area under a constant function is that of a rectangle.
:::
The area is $0$ when there is no width to the interval to integrate over:
> $$
> \int_a^a f(x) dx = 0.
> $$

Even our definition of a partition doesn't really apply, as we assume $a < b$, but clearly if $a=x_0=x_n=b$ then our only "approximating" sum could be $f(a)(b-a) = 0$.
#### Shifts
A jigsaw puzzle piece will have the same area if it is moved around on the table or flipped over. Similarly, some shifts preserve the area under a function.
The area is invariant under shifts left or right.
@@ -370,18 +457,72 @@ The area is invariant under shifts left or right.
Any partition $a =x_0 < x_1 < \cdots < x_n=b$ is related to a partition of $[a-c, b-c]$ through $a-c = x_0-c < x_1-c < \cdots < x_n - c = b-c$. Let $d_i=c_i-c$ denote the corresponding choice of points; then we have:
$$
\begin{align*}
f(c_1 -c) \cdot (x_1 - x_0) &+ f(c_2 -c) \cdot (x_2 - x_1) + \cdots\\
&\quad + f(c_n -c) \cdot (x_n - x_{n-1})\\
&= f(d_1) \cdot(x_1-c - (x_0-c)) + f(d_2) \cdot(x_2-c - (x_1-c)) + \cdots\\
&\quad + f(d_n) \cdot(x_n-c - (x_{n-1}-c)).
\end{align*}
$$
The left side will have a limit of $\int_a^b f(x-c) dx$; the right would have a "limit" of $\int_{a-c}^{b-c}f(x)dx$.
::: {#fig-consequence-shift-area}
```{julia}
#| echo: false
gr()
let
f(x) = 2 + cospi(x^2/10)*sinpi(x)
plt = plot(;
xaxis=([], false),
yaxis=([], false),
legend=false,
)
a, b = 0,4
c = 5
plot!(f, a, b; line=(:black, 2))
plot!(x -> f(x-c), a+c, b+c; line=(:red, 2))
plot!([-1, b+c + 1], [0,0]; line=(:gray, 2), arrow=true, side=:head)
for x ∈ (a,b)
plot!([x,x],[0,f(x)]; line=(:black,1, :dash))
end
for x ∈ (a+c,b+c)
plot!([x,x],[0,f(x)]; line=(:red,1, :dash))
end
annotate!([
(a+c,0, text(L"a", :top)),
(b+c,0, text(L"b", :top)),
(a,0, text(L"a-c", :top)),
(b,0, text(L"b-c", :top)),
(1.0, 3, text(L"f(x)",:left)),
(1.0+c, 3, text(L"f(x-c)",:left)),
])
current()
end
```
```{julia}
#| echo: false
plotly()
nothing
```
Illustration that the area under a shifted function remains the same.
:::
Similarly, reflections don't affect the area under the curve; they just require a new parameterization:
@@ -391,7 +532,79 @@ Similarly, reflections don't effect the area under the curve, they just require
::: {#fig-consequence-reflect-area}
```{julia}
#| echo: false
gr()
let
f(x) = 2 + cospi(x^2/10)*sinpi(x)
g(x) = f(-x)
plt = plot(;
xaxis=([], false),
yaxis=([], false),
legend=false,
)
a, b = 1,4
plot!(f, a, b; line=(:black, 2))
plot!(g, -b, -a; line=(:red, 2))
plot!([-5, 5], [0,0]; line=(:gray,1), arrow=true, side=:head)
plot!([0,0], [-0.1,3.15]; line=(:gray,1), arrow=true, side=:head)
for x in (a,b)
plot!([x,x], [0,f(x)]; line=(:black,1,:dash))
plot!([-x,-x], [0,g(-x)]; line=(:red,1,:dash))
end
annotate!([
(a,0, text(L"a", :top)),
(b,0, text(L"b", :top)),
(-a,0, text(L"-a", :top)),
(-b,0, text(L"-b", :top)),
])
current()
end
```
```{julia}
#| echo: false
plotly()
nothing
```
Illustration that the area remains constant under reflection through the $y$ axis.
:::
The "reversed" area is the same, only accounted for with a minus sign.
> $$
> \int_a^b f(x) dx = -\int_b^a f(x) dx.
> $$
#### Scaling
Scaling the $y$ axis by a constant can be done before or after computing the area:
> $$
> \int_a^b cf(x) dx = c \int_a^b f(x) dx.
> $$
Let $a=x_0 < x_1 < \cdots < x_n=b$ be any partition. Then we have $S_n= cf(c_1)(x_1-x_0) + \cdots + cf(c_n)(x_n-x_{n-1})$ $=$ $c\cdot\left[ f(c_1)(x_1 - x_0) + \cdots + f(c_n)(x_n - x_{n-1})\right]$. The "limit" of the left side is $\int_a^b c f(x) dx$; the "limit" of the right side is $c \cdot \int_a^b f(x) dx$. We call this a "sketch," as a formal proof would show that for any $\epsilon$ we could choose a $\delta$ so that any partition with norm less than $\delta$ will yield a sum within $\epsilon$ of the limit. Here our "any" partition would be one for which the $\delta$ on the left-hand side applies; the computation shows that the same $\delta$ would apply for the right-hand side when $\epsilon$ is the same.
The scaling operation on the $x$ axis, $g(x) = f(cx)$, has the following property:
> $$
@@ -406,8 +619,9 @@ The scaling operation shifts $a$ to $ca$ and $b$ to $cb$ so the limits of integr
Combining two operations above, the operation $g(x) = \frac{1}{h}f(\frac{x-c}{h})$ will leave the area between $a$ and $b$ under $g$ the same as the area under $f$ between $(a-c)/h$ and $(b-c)/h$.
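A numeric illustration of this combined identity, a sketch using `quadgk` (available once `QuadGK` or `CalculusWithJulia` is loaded) with arbitrary choices for $f$, $a$, $b$, $c$, and $h$:

```{julia}
# check: ∫ₐᵇ (1/h)⋅f((x-c)/h) dx equals the integral of f over [(a-c)/h, (b-c)/h]
let
    f(x) = sin(x)^2 + 1
    a, b, c, h = 0, 2, 1/2, 3
    lhs, _ = quadgk(x -> f((x - c)/h) / h, a, b)
    rhs, _ = quadgk(f, (a - c)/h, (b - c)/h)
    lhs ≈ rhs
end
```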
---
#### Area is additive
When two jigsaw pieces interlock their combined area is that of each added. This also applies to areas under functions.
The area between $a$ and $b$ can be broken up into the sum of the area between $a$ and $c$ and that between $c$ and $b$.
@@ -421,35 +635,185 @@ The area between $a$ and $b$ can be broken up into the sum of the area between $
For this, suppose we have a partition for both the integrals on the right hand side for a given $\epsilon/2$ and $\delta$. Combining these into a partition of $[a,b]$ will mean $\delta$ is still the norm. The approximating sum will combine to be no more than $\epsilon/2 + \epsilon/2$, so for a given $\epsilon$, this $\delta$ applies.
::: {#fig-consequence-additive-area}
```{julia}
#| echo: false
gr()
let
f(x) = 2 + cospi(x^2/7)*sinpi(x)
a,b,c = 0.1, 8, 3
xs = range(a,c,100)
A1 = Shape(vcat(xs,c,a), vcat(f.(xs), 0, 0))
xs = range(c,b,100)
A2 = Shape(vcat(xs,b,c), vcat(f.(xs), 0, 0))
The "reversed" area is the same, only accounted for with a minus sign.
plt = plot(;
xaxis=([], false),
yaxis=([], false),
legend=false,
)
plot!([0,0] .- 0.1,[-.1,3]; line=(:gray, 1), arrow=true, side=:head)
plot!([0-.2, b+0.5],[0,0]; line=(:gray, 1), arrow=true, side=:head)
plot!(A1; fill=(:gray60,1), line=(nothing,))
plot!(A2; fill=(:gray90,1), line=(nothing,))
plot!(f, a, b; line=(:black, 2))
for x in (a,b,c)
plot!([x,x], [0, f(x)]; line=(:black, 1, :dash))
end
annotate!([(x,0,text(latexstring("$y"),:top)) for (x,y) in zip((a,b,c),("a","b","c"))])
end
```
```{julia}
#| echo: false
plotly()
nothing
```
Illustration that the area between $a$ and $b$ can be computed as area between $a$ and $c$ and then $c$ and $b$.
:::
A consequence of the last few statements is:
> If $f(x)$ is an even function, then $\int_{-a}^a f(x) dx = 2 \int_0^a f(x) dx$.
> If $f(x)$ is an odd function, then $\int_{-a}^a f(x) dx = 0$.
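Both statements are easy to check numerically; a sketch with `quadgk`, the even function $x^2$, and the odd function $x^3$:

```{julia}
let
    a = 2
    v₁, _ = quadgk(x -> x^2, -a, a)   # even integrand
    v₂, _ = quadgk(x -> x^2,  0, a)
    v₃, _ = quadgk(x -> x^3, -a, a)   # odd integrand
    (v₁ ≈ 2v₂, abs(v₃) <= 1e-8)
end
```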
Additivity works in the $y$ direction as well.
If $f(x)$ and $g(x)$ are two functions then
> $$
> \int_a^b (f(x) + g(x)) dx = \int_a^b f(x) dx + \int_a^b g(x) dx
> $$
For any partitioning with $x_i, x_{i-1}$ and $c_i$ this holds:
$$
(f(c_i) + g(c_i)) \cdot (x_i - x_{i-1}) =
f(c_i) \cdot (x_i - x_{i-1}) + g(c_i) \cdot (x_i - x_{i-1})
$$
This leads to the same statement for the areas under the curves.
The *linearity* of the integration operation refers to this combination of the above:
> $$
> \int_a^b (cf(x) + dg(x)) dx = c\int_a^b f(x) dx + d \int_a^b g(x)dx
> $$
The integral of a shifted and scaled function satisfies:
> $$
> \int_a^b \left(D + C\cdot f(\frac{x - B}{A})\right) dx = D\cdot(b-a) + C \cdot A \int_{\frac{a-B}{A}}^{\frac{b-B}{A}} f(x) dx
> $$
This follows from a few of the statements above:
$$
\begin{align*}
\int_a^b \left(D + C\cdot f(\frac{x - B}{A})\right) dx &=
\int_a^b D dx + C \int_a^b f(\frac{x-B}{A}) dx \\
&= D\cdot(b-a) + C\cdot A \int_{\frac{a-B}{A}}^{\frac{b-B}{A}} f(x) dx
\end{align*}
$$
#### Inequalities
Area under a non-negative function is non-negative:
> $$
> \int_a^b f(x) dx \geq 0,\quad\text{when } a < b, \text{ and } f(x) \geq 0
> $$
Under this assumption, for any partitioning with $x_i, x_{i-1}$ and $c_i$ it holds that $f(c_i)\cdot(x_i - x_{i-1}) \geq 0$. So any sum of non-negative values can only be non-negative, even in the limit.
If $g$ bounds $f$ then the area under $g$ will bound the area under $f$.
> $$
> \int_a^b f(x) dx \leq \int_a^b g(x) dx \quad\text{when } a < b\text{ and } 0 \leq f(x) \leq g(x).
> $$
For any partition of $[a,b]$ and choice of $c_i$, we have the term-by-term bound $f(c_i)(x_i-x_{i-1}) \leq g(c_i)(x_i-x_{i-1})$. So any sequence of partitions that converges to the limits will have this inequality maintained for the sum.
::: {#fig-consequence-0-area}
```{julia}
#| echo: false
gr()
let
f(x) = 1/6+x^3*(2-x)/2
g(x) = 1/6+exp(x/3)+(1-x/1.7)^6-0.6
a, b = 0, 2
plot(; empty_style...)
plot!([a-.5, b+.25], [0,0]; line=(:gray, 1), arrow=true, side=:head)
plot!([0,0] .- 0.25, [-0.25, 1.8]; line=(:gray, 1), arrow=true, side=:head)
xs = range(a,b,250)
S = Shape(vcat(xs, reverse(xs)), vcat(f.(xs), g.(reverse(xs))))
plot!(S; fill=(:gray70, 0.3), line=(nothing,))
S = Shape(vcat(xs, reverse(xs)), vcat(zero.(xs), f.(reverse(xs))))
plot!(S; fill=(:gray90, 0.3), line=(nothing,))
plot!(f, a, b; line=(:black, 4))
plot!(g, a, b; line=(:black, 2))
for x in (a,b)
plot!([x,x], [0, g(x)]; line=(:black,1,:dash))
end
annotate!([(x,0,text(t, :top)) for (x,t) in zip((a,b),(L"a", L"b"))])
current()
end
```
```{julia}
#| echo: false
plotly()
nothing
```
Illustration that if $f(x) \le g(x)$ on $[a,b]$ then the corresponding integrals satisfy the same inequality. The excess area is clearly positive.
:::
(This also follows by considering $h(x) = g(x) - f(x) \geq 0$ by assumption, so $\int_a^b h(x) dx \geq 0$.)
For non-negative functions, integrals over larger domains are bigger:
> $$
> \int_a^c f(x) dx \le \int_a^b f(x) dx,\quad\text{when } a \le c \le b \text{ and } f(x) \ge 0
> $$
This follows as $\int_c^b f(x) dx$ is non-negative under these assumptions.
### Some known integrals
@@ -471,15 +835,16 @@ Using the definition, we can compute a few definite integrals:
This is just the area of a trapezoid with parallel sides of lengths $a$ and $b$ and width $b-a$, or $1/2 \cdot (b + a) \cdot (b - a)$. The right sum would be:
$$
\begin{align*}
S &= x_1 \cdot (x_1 - x_0) + x_2 \cdot (x_2 - x_1) + \cdots + x_n \cdot (x_n - x_{n-1}) \\
&= (a + 1\frac{b-a}{n}) \cdot \frac{b-a}{n} + (a + 2\frac{b-a}{n}) \cdot \frac{b-a}{n} + \cdots + (a + n\frac{b-a}{n}) \cdot \frac{b-a}{n}\\
&= n \cdot a \cdot (\frac{b-a}{n}) + (1 + 2 + \cdots + n) \cdot (\frac{b-a}{n})^2 \\
&= n \cdot a \cdot (\frac{b-a}{n}) + \frac{n(n+1)}{2} \cdot (\frac{b-a}{n})^2 \\
& \rightarrow a \cdot(b-a) + \frac{(b-a)^2}{2} \\
&= \frac{b^2}{2} - \frac{a^2}{2}.
\end{align*}
$$
> $$
@@ -502,7 +867,7 @@ This is similar to the Archimedes case with $a=0$ and $b=1$ shown above.
Cauchy showed this using a *geometric series* for the partition, not the arithmetic series $x_i = a + i (b-a)/n$. The series is defined by $1 + \alpha = (b/a)^{1/n}$ and $x_i = a \cdot (1 + \alpha)^i$. Here the bases $x_{i+1} - x_i$ simplify to $x_i \cdot \alpha$ and $f(x_i) = (a\cdot(1+\alpha)^i)^k = a^k (1+\alpha)^{ik}$, or $f(x_i)(x_{i+1}-x_i) = a^{k+1}\alpha[(1+\alpha)^{k+1}]^i$, so, using $u=(1+\alpha)^{k+1}=(b/a)^{(k+1)/n}$, $f(x_i) \cdot(x_{i+1} - x_i) = a^{k+1}\alpha u^i$. This gives
$$
\begin{align*}
S &= a^{k+1}\alpha u^0 + a^{k+1}\alpha u^1 + \cdots + a^{k+1}\alpha u^{n-1}\\
&= a^{k+1} \cdot \alpha \cdot (u^0 + u^1 + \cdots + u^{n-1}) \\
&= a^{k+1} \cdot \alpha \cdot \frac{u^n - 1}{u - 1} \\
&= (b^{k+1} - a^{k+1}) \cdot \frac{\alpha}{(1+\alpha)^{k+1} - 1} \\
&\rightarrow \frac{b^{k+1} - a^{k+1}}{k+1}.
\end{align*}
$$
> $$
@@ -541,9 +907,12 @@ Certainly other integrals could be computed with various tricks, but we won't pu
### Some other consequences
* The definition is stated in terms of any partition with its norm bounded by $\delta$. If you know a function $f$ is Riemann integrable, then it is enough to consider just a regular partition $x_i = a + i \cdot (b-a)/n$ when forming the sums, as was done above. It is just that showing a limit for this particular type of partition would not be sufficient to prove Riemann integrability.
* The choice of $c_i$ is arbitrary to allow for maximum flexibility. The Darboux integrals use the maximum and minimum over the subinterval. It is sufficient to prove integrability to show that the limit exists with just these choices.
* Most importantly,
> A continuous function on $[a,b]$ is Riemann integrable on $[a,b]$.
@@ -553,13 +922,13 @@ Certainly other integrals could be computed with various tricks, but we won't pu
The main idea behind this is that the difference between the maximum and minimum values of $f$ over a partition gets small. That is, if $[x_{i-1}, x_i]$ is about $1/n$ in length, then the difference between the maximum of $f$ over this interval, $M$, and the minimum, $m$, will go to zero as $n$ gets big. That $m$ and $M$ exist is due to the extreme value theorem; that this difference goes to $0$ is a consequence of continuity. What is needed, that this difference goes to $0$ at the same rate no matter which interval is being discussed, is a consequence of uniform continuity, a concept discussed in advanced calculus which holds for continuous functions on closed intervals. Armed with this, the Riemann sum for a general partition can be bounded by this difference times $b-a$, which will go to zero. So the upper and lower Riemann sums will converge to the same value.
* A "jump", or discontinuity of the first kind, is a value $c$ in $[a,b]$ where $\lim_{x \rightarrow c+} f(x)$ and $\lim_{x \rightarrow c-}f(x)$ both exist, but are not equal. It is true that a function that is not continuous on $I=[a,b]$, but only has discontinuities of the first kind on $I$ will be Riemann integrable on $I$.
* A "jump", or discontinuity of the first kind, is a value $c$ in $[a,b]$ where $\lim_{x \rightarrow c+} f(x)$ and $\lim_{x \rightarrow c-}f(x)$ both exist, but are not equal. It is true that a function that is not continuous on $I=[a,b]$, but only has discontinuities of the first kind on $I$ will be Riemann integrable on $I$.
For example, the function $f(x) = 1$ for $x$ in $[0,1]$ and $0$ otherwise will be integrable, as it is continuous at all but two points, $0$ and $1$, where it jumps.
* Some functions can have infinitely many points of discontinuity and still be integrable. The example of $f(x) = 1/q$ when $x=p/q$ is rational, and $0$ otherwise is often used to illustrate this.
## Numeric integration
@@ -589,11 +958,11 @@ deltas = diff(xs) # forms x2-x1, x3-x2, ..., xn-xn-1
cs = xs[1:end-1] # finds left-hand end points. xs[2:end] would be right-hand ones.
```
We want to sum the products $f(c_i)\Delta_i$. Here is one way to do so using `zip` to iterate over the paired off values in `cs` and `deltas`.
```{julia}
sum(f(ci)*Δi for (ci, Δi) in zip(cs, deltas))
```
Our answer is not so close to the value of $1/3$, but what did we expect? We only used $n=5$ intervals. Trying again with $50,000$ gives us:
@@ -605,7 +974,7 @@ n = 50_000
xs = a:(b-a)/n:b
deltas = diff(xs)
cs = xs[1:end-1]
sum(f(ci)*Δi for (ci, Δi) in zip(cs, deltas))
```
This value is about $10^{-5}$ off from the actual answer of $1/3$.
@@ -619,19 +988,19 @@ Before continuing, we define a function to compute Riemann sums for us with an
```{julia}
#| eval: false
riemann(f, a, b, n; method="right") = riemann(f, range(a,b,n+1); method=method)
function riemann(f, xs; method="right")
Ms = (left = (f,a,b) -> f(a),
right= (f,a,b) -> f(b),
Ms = (left = (f,a,b) -> f(a),
right = (f,a,b) -> f(b),
trapezoid = (f,a,b) -> (f(a) + f(b))/2,
simpsons = (f,a,b) -> (c = a/2 + b/2; (1/6) * (f(a) + 4*f(c) + f(b))),
simpsons = (f,a,b) -> (c = a/2 + b/2; (1/6) * (f(a) + 4*f(c) + f(b)))
)
_riemann(Ms[Symbol(method)], f, xs)
end
function _riemann(M, f, xs)
M = Ms[Symbol(method)}
xs = zip(xs[1:end-1], xs[2:end])
sum(M(f, a, b) * (b-a) for (a,b) ∈ xs)
end
riemann(f, a, b, n; method="right") =
riemann(f, range(a,b,n+1); method)
```
(This function is defined in `CalculusWithJulia` and need not be copied over if that package is loaded.)
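For instance, using $\int_0^\pi \sin(x) dx = 2$, the methods can be compared (a sketch; this assumes the definition above, or the `CalculusWithJulia` version, is in scope):

```{julia}
# each value should be close to 2; "simpsons" is the most accurate
[riemann(sin, 0, pi, 1000; method=m) for m in ("left", "right", "trapezoid", "simpsons")]
```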
@@ -687,7 +1056,7 @@ Consider a function $g(x)$ defined through its piecewise linear graph:
```{julia}
#| echo: false
g(x) = abs(x) > 2 ? 1.0 : abs(x) - 1.0
plot(g, -3,3; legend=false)
plot!(zero)
```
@@ -698,6 +1067,25 @@ plot!(zero)
We could add the signed area over $[0,1]$ to the above, but instead see a square of area $1$, a triangle with area $1/2$ and a triangle with signed area $-1$. The total is then $1/2$.
This figure---using equal sized axes---may make the above decomposition more clear:
```{julia}
#| echo: false
let
g(x) = abs(x) > 2 ? 1.0 : abs(x) - 1.0
xs = [ -3, -2, -1, 1, 2, 3]
plot(; legend=false, aspect_ratio=:equal)
plot!(Shape([-3,-2,-2,-3], [0,0,1,1]); fill=(:gray,))
plot!(Shape([-2,-1,-1,-2], [0,0,0,1]); fill=(:gray10,))
plot!(Shape([-1,0,1], [0,-1,0]); fill=(:gray90,))
plot!(Shape([1,2,2,1], [0,1,0,0]); fill=(:gray10,))
plot!(Shape([2,3,3,2], [0,0,1,1]); fill=(:gray,))
plot!([0,0], [0, g(0)]; line=(:black,1,:dash))
end
```
* Compute $\int_{-3}^{3} g(x) dx$:
@@ -717,7 +1105,7 @@ An immediate consequence would be $\int_{-\pi}^\pi \sin(x) = 0$, as would $\int_
##### Example
Numerically estimate the definite integral $\int_0^2 x\log(x) dx$. (We redefine the function to be $0$ at $0$, so it is continuous.)
We have to be a bit careful with the Riemann sum, as the left Riemann sum has an issue at $x_0=0$: `0*log(0)` returns `NaN`, which will poison any subsequent arithmetic operations, so the value returned would be `NaN` and not an approximate answer. We could define our function with a check:
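A minimal sketch of such a check (the name `xlogx` is ours, not from the original):

```{julia}
# guard against 0 * log(0) returning NaN at the left endpoint x₀ = 0
xlogx(x) = x > 0 ? x * log(x) : 0.0
riemann(xlogx, 0, 2, 50_000; method="left")  # ≈ 2log(2) - 1 ≈ 0.3863
```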
@@ -779,7 +1167,9 @@ We have the well-known triangle [inequality](http://en.wikipedia.org/wiki/Triang
This suggests that the following inequality holds for integrals:
> $$
> \lvert \int_a^b f(x) dx \rvert \leq \int_a^b \lvert f(x) \rvert dx.
> $$
@@ -799,7 +1189,7 @@ While such bounds are disappointing, often, when looking for specific values, th
The Riemann sum above is actually extremely inefficient. To see how much, we can derive an estimate for the error in approximating the value using an arithmetic progression as the partition. Let's assume that our function $f(x)$ is increasing, so that the right sum gives an upper estimate and the left sum a lower estimate, so the error in the estimate will be between these two values:
$$
\begin{align*}
\text{error} &\leq
\left[
f(x_1) \cdot (x_{1} - x_0) + f(x_2) \cdot (x_{2} - x_1) + \cdots + f(x_n) \cdot (x_n - x_{n-1})\right] \\
&\quad - \left[
f(x_0) \cdot (x_{1} - x_0) + f(x_1) \cdot (x_{2} - x_1) + \cdots + f(x_{n-1}) \cdot (x_n - x_{n-1})\right] \\
&= \frac{b-a}{n} \cdot \left(\left[f(x_1) + f(x_2) + \cdots + f(x_n)\right] - \left[f(x_0) + \cdots + f(x_{n-1})\right]\right) \\
&= \frac{b-a}{n} \cdot (f(b) - f(a)).
\end{align*}
$$
We see the error goes to $0$ at a rate of $1/n$ with the constant depending on $b-a$ and the function $f$. In general, a similar bound holds when $f$ is not monotonic.
@@ -847,17 +1238,19 @@ $$
\frac{b-a}{6}(f(x_1) + 4f(x_2) + f(x_3)).
$$
This formula will actually be exact for any 2nd degree polynomial. In fact an entire family of similar approximations using $n$ points can be made exact for any polynomial of degree $n-1$ or lower. But with non-evenly spaced points, even better results can be found.
The formulas for an approximation to the integral $\int_{-1}^1 f(x) dx$ discussed so far can be written as:
$$
\begin{align*}
S &= f(x_1) \Delta_1 + f(x_2) \Delta_2 + \cdots + f(x_n) \Delta_n\\
&= w_1 f(x_1) + w_2 f(x_2) + \cdots + w_n f(x_n)\\
&= \sum_{i=1}^n w_i f(x_i).
\end{align*}
$$
The $w$s are "weights" and the $x$s are nodes. A [Gaussian](http://en.wikipedia.org/wiki/Gaussian_quadrature) *quadrature rule* is a set of weights and nodes for $i=1, \dots n$ for which the sum is *exact* for any $f$ which is a polynomial of degree $2n-1$ or less. Such choices then also approximate well the integrals of functions which are not polynomials of degree $2n-1$, provided $f$ can be well approximated by a polynomial over $[-1,1]$. (Which is the case for the "nice" functions we encounter.) Some examples are given in the questions.
@@ -890,7 +1283,7 @@ f(x) = x^5 - x + 1
quadgk(f, -2, 2)
```
The error term is $0$, the answer is $4$ up to the last unit of precision (1 ulp), so any error is only in floating point approximations.
For the numeric computation of definite integrals, the `quadgk` function should be used over the Riemann sums or even Simpson's rule.
@@ -1409,6 +1802,70 @@ val, _ = quadgk(f, a, b)
numericq(val)
```
###### Question
Let $A=1.98$ and $B=1.135$ and
$$
f(x) = \frac{1 - e^{-Ax}}{B\sqrt{\pi}x} e^{-x^2}.
$$
Find $\int_0^1 f(x) dx$
```{julia}
#| echo: false
let
A,B = 1.98, 1.135
f(x) = (1 - exp(-A*x))*exp(-x^2)/(B*sqrt(pi)*x)
val,_ = quadgk(f, 0, 1)
numericq(val)
end
```
###### Question
A bound for the complementary error function (a positive function) is
$$
\text{erfc}(x) \leq \frac{1}{2}e^{-2x^2} + \frac{1}{2}e^{-x^2} \leq e^{-x^2},
\quad x \geq 0.
$$
Let $f(x)$ be the first bound, $g(x)$ the second.
Assuming this is true, confirm numerically using `quadgk` that
$$
\int_0^3 f(x) dx \leq \int_0^3 g(x) dx
$$
The value of $\int_0^3 f(x) dx$ is
```{julia}
#| echo: false
let
f(x) = 1/2 * exp(-2x^2) + 1/2 * exp(-x^2)
val,_ = quadgk(f, 0, 3)
numericq(val)
end
```
The value of $\int_0^3 g(x) dx$ is
```{julia}
#| echo: false
let
g(x) = exp(-x^2)
val,_ = quadgk(g, 0, 3)
numericq(val)
end
```
###### Question
@@ -26,12 +26,14 @@ $$
\int_a^b (f(x) - g(x)) dx
$$
can be interpreted as the "signed" area between $f(x)$ and $g(x)$ over $[a,b]$. If on this interval $[a,b]$ it is true that $f(x) \geq g(x)$, then this would just be the area, as seen in this figure. The rectangle in the figure has area: $(f(a)-g(a)) \cdot (b-a)$ which could be a term in a left Riemann sum of the integral of $f(x) - g(x)$:
can be interpreted as the "signed" area between $f(x)$ and $g(x)$ over $[a,b]$. If on this interval $[a,b]$ it is true that $f(x) \geq g(x)$, then this would just be the area, as seen in this figure. The rectangle in the figure has area: $(f(x_i)-g(x_i)) \cdot (x_{i+1}-x_i)$ for some $x_i, x_{i+1}$ suggestive of a term in a left Riemann sum of the integral of $f(x) - g(x)$:
```{julia}
#| hold: true
#| echo: false
#| label: fig-area-between-f-g-shade
#| fig-cap: "Area between two functions"
f1(x) = x^2
g1(x) = sqrt(x)
a,b = 1/4, 3/4
@@ -42,9 +44,9 @@ ts = vcat(f1.(xs), g1.(reverse(xs)))
plot(f1, 0, 1, legend=false)
plot!(g1)
plot!(ss, ts, fill=(0, :forestgreen, 0.25))
plot!(xs, f1.(xs), linewidth=5, color=:royalblue)
plot!(xs, g1.(xs), linewidth=5, color=:royalblue)
plot!(xs, f1.(xs), legend=false, linewidth=5, color=:blue)
@@ -53,7 +55,7 @@ u,v = .4, .5
plot!([u,v,v,u,u], [f1(u), f1(u), g1(u), g1(u), f1(u)], color=:black, linewidth=3)
```
In @fig-area-between-f-g we have $f(x) = \sqrt{x}$, $g(x)= x^2$ and $[a,b] = [1/4, 3/4]$. The shaded area is then found by:
$$
@@ -62,7 +64,88 @@ $$
#### Examples
Find the area between
$$
\begin{align*}
f(x) &= \frac{x^3 \cdot (2-x)}{2} \text{ and } \\
g(x) &= e^{x/3} + (1-\frac{x}{1.7})^6 - 0.6
\end{align*}
$$
over the interval $[0.2, 1.7]$. The area is illustrated in the figure below.
```{julia}
f(x) = x^3*(2-x)/2
g(x) = exp(x/3) + (1 - (x/1.7))^6 - 0.6
a, b = 0.2, 1.7
h(x) = g(x) - f(x)
answer, _ = quadgk(h, a, b)
answer
```
::: {#fig-area-between-f-g}
```{julia}
#| echo: false
p = let
gr()
# area between graphs
# https://github.com/SigurdAngenent/WisconsinCalculus/blob/master/figures/221/09areabetweengraphs.pdf
f(x) = 1/6+x^3*(2-x)/2
g(x) = 1/6+exp(x/3)+(1-x/1.7)^6-0.6
a,b =0.2, 1.7
A, B = 0, 2
A, B = A + .1, B - .1
n = 20
plot(; empty_style..., aspect_ratio=:equal, xlims=(A,B))
plot!(f, A, B; fn_style...)
plot!(g, A, B; fn_style...)
xp = range(a, b, n)
marked = n ÷ 2
for i in 1:n-1
x0, x1 = xp[i], xp[i+1]
mpt = (x0 + x1)/2
R = Shape([x0,x1,x1,x0], [f(mpt),f(mpt),g(mpt),g(mpt)])
color = i == marked ? :gray : :white
plot!(R; fill=(color, 0.5), line=(:black, 1))
end
# axis
plot!([(A,0),(B,0)]; axis_style...)
# highlight
x0, x1 = xp[marked], xp[marked+1]
_style = (;line=(:gray, 1, :dash))
plot!([(a,0), (a, f(a))]; _style...)
plot!([(b,0), (b,f(b))]; _style...)
plot!([(x0,0), (x0, f(x0))]; _style...)
plot!([(x1,0), (x1, f(x1))]; _style...)
annotate!([
(B, f(B), text(L"f(x)", 10, :left,:top)),
(B, g(B), text(L"g(x)", 10, :left, :bottom)),
(a, 0, text(L"a=x_0", 10, :top, :left)),
(b, 0, text(L"b=x_n", 10, :top, :left)),
(x0, 0, text(L"x_i", 10, :top)),
(x1, 0, text(L"x_{i+1}", 10, :top,:left))
])
current()
end
plotly()
p
```
Illustration of a Riemann sum approximation to estimate the area between $f(x)$ and $g(x)$ over an interval $[a,b]$. (Figure follows one by @Angenent.)
:::
##### Example
Find the area bounded by the line $y=2x$ and the curve $y=2 - x^2$.
@@ -156,12 +239,14 @@ summation(1/(n+1)/(n+2), (n, 1, oo))
##### Example
Verify [Archimedes'](http://en.wikipedia.org/wiki/The_Quadrature_of_the_Parabola) finding that the area of the parabolic segment is $4/3$rds that of the triangle joining $a$, $(a+b)/2$ and $b$. @fig-area-between-f-g clearly shows the bigger parabolic segment area.
```{julia}
#| hold: true
#| echo: false
#| label: fig-archimedes-triangle
#| fig-cap: "Area of parabolic segment and triangle"
f(x) = 2 - x^2
a,b = -1, 1/2
c = (a + b)/2
@@ -169,10 +254,15 @@ xs = range(-sqrt(2), stop=sqrt(2), length=50)
rxs = range(a, stop=b, length=50)
rys = map(f, rxs)
plot(f, a, b, legend=false,
line=(3, :royalblue),
axis=([], false)
)
plot!([a,b], [f(a),f(b)], line=(3, :royalblue))
xs = [a,c,b,a]
triangle = Shape(xs, f.(xs))
plot!(triangle, fill=(:forestgreen, 3, 0.25))
```
For concreteness, let $f(x) = 2-x^2$ and $[a,b] = [-1, 1/2]$, as in the figure. Then the area of the triangle can be computed through:
@@ -358,7 +448,7 @@ When doing problems by hand this latter style can often reduce the complications
Consider two overlapping circles, one with smaller radius. How much area is in the larger circle that is not in the smaller? The question came up on the `Julia` [discourse](https://discourse.julialang.org/t/is-there-package-or-method-to-calculate-certain-area-in-julia-symbolically-with-sympy/99751) discussion board. A solution, modified from an answer of `@rocco_sprmnt21`, follows.
Without losing too much generality, we can consider the smaller circle to have radius $a$ and the larger circle to have radius $b$ and be centered at $(0,c)$.
We assume some overlap---$a \ge c-b$, but not too much---$c-b \ge 0$ or $0 \le c-b \le a$.
```{julia}
@syms x::real y::real a::positive b::positive c::positive
@@ -485,7 +575,7 @@ box(f⁻¹(x₀-1Δ), x₀-2Δ, 1 - f⁻¹(x₀-1Δ), Δ, colᵣ)
box(f⁻¹(x₀-2Δ), x₀-3Δ, 1 - f⁻¹(x₀-2Δ), Δ, colᵣ)
```
The figure above suggests that the area under $f(x)$ over $[a,b]$ could be represented as the area between the curves $f^{-1}(y)$ and $x=b$ from $[f(a), f(b)]$.
---
@@ -515,6 +605,80 @@ a, b = 0, 1
quadgk(y -> f(y) - g(y), a, b)[1]
```
## The area enclosed in a simple polygon
A simple polygon is composed of non-intersecting line segments, save that the last segment ends where the first begins. These have an orientation, which we take to be counterclockwise. Polygons, as was seen when computing areas related to Archimedes' efforts, can be partitioned into simple geometric shapes for which known areas apply.
### The trapezoid formula
In this example, we see how trapezoids can be used to find the interior area enclosed by a simple polygon, avoiding integration.
The trapezoid formula to compute the area of a simple polygon is
$$
A = - \sum_{i=1}^n \frac{y_{i+1} + y_i}{2} \cdot (x_{i+1} - x_i).
$$
where the polygon is described by points $(x_1,y_1), (x_2,y_2), \cdots, (x_n, y_n), (x_{n+1}, y_{n+1})$ *with* $(x_1,y_1) = (x_{n+1}, y_{n+1})$.
Each term describes the area of a trapezoid, possibly signed.
This figure illustrates for a simple case:
```{julia}
xs = [1, 3, 4, 2, 1] # n = 4 to give 5=n+1 values
ys = [1, 1, 2, 3, 1]
p = plot(xs, ys; line=(3, :black), ylims=(0,4), legend=false)
scatter!(p, xs, ys; marker=(7, :circle))
```
Going further, we draw the four trapezoids using different colors depending on the sign of the `xs[i+1] - xs[i]` terms:
```{julia}
for i in 1:4
col = xs[i+1] - xs[i] > 0 ? :yellow : :blue
S = Shape([(xs[i],0), (xs[i+1],0), (xs[i+1],ys[i+1]), (xs[i], ys[i])])
plot!(p, S, fill=(col, 0.25))
end
p
```
The yellow trapezoids appear to be colored grey, as they completely overlap with parts of the blue trapezoids, and blue and yellow light mix to grey. As the signs of the differences of the $x$ values are different, these areas add to $0$ in the sum, leaving just the area of the interior when the sum is computed.
For this particular figure, the enclosed area is
```{julia}
- sum((ys[i+1] + ys[i]) / 2 * (xs[i+1] - xs[i]) for i in 1:length(xs)-1)
```
### The triangle formula
Similarly, we can create triangles to partition the polygon. The *signed* area of a triangle with vertices $(0,0), (x_i, y_i), (x_{i+1}, y_{i+1})$ can be computed by $\frac{1}{2} \cdot (x_i \cdot y_{i+1} - x_{i+1}\cdot y_i)$. (This formula can be derived from a related one for the area of a parallelogram.)
Visualizing, as before, we have the shape and the triangles after centering around the origin:
```{julia}
S = Shape(xs, ys)
c = Plots.center(S) # find centroid of the polygon
xs, ys = xs .- c[1], ys .- c[2]
p = plot(xs, ys; line=(3, :black), legend=false)
scatter!(p, xs, ys; marker=(7, :circle))
for i in 1:4
col = xs[i]*ys[i+1] - xs[i+1]*ys[i] > 0 ? :yellow : :blue
S = Shape([(0,0), (xs[i],ys[i]), (xs[i+1],ys[i+1])])
plot!(p, S, fill=(col, 0.25))
end
p
```
Here the triangles are all yellow, as each has a positive area to contribute to the following sum:
```{julia}
(1/2) * sum(xs[i]*ys[i+1] - xs[i+1]*ys[i] for i in 1:4)
```
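Both formulas package naturally as functions; a sketch (the names `trapezoid_area` and `shoelace_area` are ours), where each expects the first vertex repeated at the end:

```{julia}
trapezoid_area(xs, ys) =
    -sum((ys[i+1] + ys[i])/2 * (xs[i+1] - xs[i]) for i in 1:length(xs)-1)

shoelace_area(xs, ys) =
    (1/2) * sum(xs[i]*ys[i+1] - xs[i+1]*ys[i] for i in 1:length(xs)-1)

let
    xs = [1, 3, 4, 2, 1]   # the polygon from above
    ys = [1, 1, 2, 3, 1]
    trapezoid_area(xs, ys), shoelace_area(xs, ys)   # both give 3.5
end
```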
## Questions
@@ -833,3 +997,56 @@ nothing
![Roberval, avoiding a trigonometric integral, instead used symmetry to show that the area under the companion curve was half the area of the rectangle, which in this figure is $2\pi$.
](./figures/companion-curve-bisects-rectangle.png)
###### Question
```{julia}
#| echo: false
#| label: fig-cavalieri-example
#| fig-cap: "Cavalieri example"
let
squareplus(x, b=2) = x/2 + sqrt(x^2 + b)/2
TeLU(x) = x * tanh(exp(x))
Δ(x) = squareplus(x) - TeLU(x)
a,b = -3, 3
xs = range(a, b, 10)
c = 3
Shift(f,c) = x -> c + f(x)
p = plot(Shift(squareplus, c), a, b;
legend=false,
line=(3, :royalblue),
axis=([], false))
plot!(Shift(TeLU, c),
line=(3, :royalblue)
)
plot!(Δ, line=(3, :forestgreen))
plot!(zero, line=(3, :forestgreen))
n = 20
xs = range(a, b, n+1)
for i in 1:n
S = Shape([xs[i],xs[i]],
[0, Δ(xs[i])])
plot!(Plots.translate(S, 0, TeLU(xs[i]) + c), fill=(:royalblue, 0.25))
plot!(S, fill=(:forestgreen, 0.25))
end
p
end
```
@fig-cavalieri-example shows on the same scale the graphs of $f(x)$ and $g(x)$ and the graphs of $f(x) - g(x)$ and $0$ (the lower figure). Twenty lines were drawn with height $f(x) - g(x)$ on the lower figure and these were translated to the upper figure by an amount $g(x)$, all to illustrate that any line parallel to the $y$ direction intersects the two figures in segments of the same length.
What does this imply?
```{julia}
#| hold: true
#| echo: false
choices = ["The two enclosed areas should be equal",
"The two enclosed areas are clearly different, as they do not overap"]
radioq(choices, 1)
```
@@ -1,4 +1,4 @@
# Center of mass
{{< include ../_common_code.qmd >}}
@@ -158,23 +158,29 @@ The figure shows the approximating rectangles and circles representing their mas
Generalizing from this figure shows the center of mass for such an approximation will be:
$$
\begin{align*}
&\frac{\rho f(c_1) (x_1 - x_0) \cdot x_1 + \rho f(c_2) (x_2 - x_1) \cdot x_2 + \cdots + \rho f(c_n) (x_n- x_{n-1}) \cdot x_n}{\rho f(c_1) (x_1 - x_0) + \rho f(c_2) (x_2 - x_1) + \cdots + \rho f(c_n) (x_n- x_{n-1})} \\
&=\\
&\quad\frac{f(c_1) (x_1 - x_0) \cdot x_1 + f(c_2) (x_2 - x_1) \cdot x_2 + \cdots + f(c_n) (x_n- x_{n-1}) \cdot x_n}{f(c_1) (x_1 - x_0) + f(c_2) (x_2 - x_1) + \cdots + f(c_n) (x_n- x_{n-1})}.
\end{align*}
$$
But the top part is an approximation to the integral $\int_a^b x f(x) dx$ and the bottom part the integral $\int_a^b f(x) dx$. The ratio of these defines the center of mass.
::: {.callout-note icon=false}
## Center of Mass
The center of mass (in the $x$ direction) of a region in the $x-y$ plane described by the area under a (positive) function $f(x)$ between $a$ and $b$ is given by
$$
\text{Center of mass} =
\text{cm}_x = \frac{\int_a^b xf(x) dx}{\int_a^b f(x) dx}.
$$
For regions described by a more complicated set of equations, the center of mass is found from the same formula where $f(x)$ is the total height in the $x$ direction for a given $x$.
:::
For the triangular shape, we have by the fact that $f(x) = 1 - \lvert x \rvert$ is an even function that $xf(x)$ will be odd, so the integral around $-1,1$ will be $0$. So the center of mass formula applied to this problem agrees with our expectation.
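A numeric sketch of this with `quadgk`:

```{julia}
let
    f(x) = 1 - abs(x)
    num, _ = quadgk(x -> x * f(x), -1, 1)   # ∫ x f(x) dx, essentially 0
    den, _ = quadgk(f, -1, 1)
    num / den
end
```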
@@ -497,7 +503,7 @@ numericq(val)
###### Question
Find the center of mass in the $x$ variable of the region in the first quadrant bounded by the function $f(x) = x^3(1-x)^4$.
```{julia}
@@ -100,7 +100,7 @@ where we define $g(i) = f(a + ih)h$. In the above, $n$ relates to $b$, but we co
Again, we fix a large $n$ and let $h=(b-a)/n$. And suppose $x = a + Mh$ for some $M$. Then writing out the approximations to both the definite integral and the derivative we have
$$
\begin{align*}
F'(x) &= \frac{d}{dx} \int_a^x f(u) du \\
& \approx \frac{F(x) - F(x-h)}{h} \\
& \approx \frac{\left(f(a + 1h) + f(a + 2h) + \cdots + f(a + Mh)\right)h -
\left(f(a + 1h) + f(a + 2h) + \cdots + f(a + (M-1)h) \right)h}{h} \\
&= f(a + Mh).
\end{align*}
$$
If $g(i) = f(a + ih)$, then the above becomes
$$
\begin{align*}
F'(x) & \approx D(S(g))(M) \\
&= f(a + Mh)\\
&= f(x).
\end{align*}
$$
That is $F'(x) \approx f(x)$.
@@ -138,13 +140,14 @@ $$
With these heuristics, we now have:
::: {.callout-note icon=false}
## The fundamental theorem of calculus
Part 1: Let $f$ be a continuous function on a closed interval $[a,b]$ and define $F(x) = \int_a^x f(u) du$ for $a \leq x \leq b$. Then $F$ is continuous on $[a,b]$, differentiable on $(a,b)$ and moreover, $F'(x) =f(x)$.
Part 2: Now suppose $f$ is any integrable function on a closed interval $[a,b]$ and $F(x)$ is *any* differentiable function on $[a,b]$ with $F'(x) = f(x)$. Then $\int_a^b f(x)dx=F(b)-F(a)$.
:::
:::{.callout-note}
@@ -153,10 +156,75 @@ In Part 1, the integral $F(x) = \int_a^x f(u) du$ is defined for any Riemann int
:::
This figure relating the area under some continuous $f(x)$ from $a$ to both $x$ and $x+h$ for some small $h$ helps to visualize the two fundamental theorems.
::: {#fig-FTC-derivative}
```{julia}
#| echo: false
let
gr()
f(x) = sin(x)
A(x) = cos(x)
a,b = 0, 6pi/13
h = pi/20
xs = range(a, b, 100)
p1 = plot(; empty_style...)
plot!([0,0] .- 0.05,[-0.1, 1]; line=(:gray,1), arrow=true, side=:head)
plot!([-0.1, b+h + pi/10], [0,0]; line=(:gray,1), arrow=true, side=:head)
xs = range(a, b, 100)
S = Shape(vcat(xs, reverse(xs)), vcat(f.(xs), zero.(xs)))
plot!(S; fill=(:gray90, 0.25), line=(nothing,))
plot!(f, a, b+h; line=(:black, 2))
xs = range(b, b+h, 100)
S = Shape(vcat(xs, reverse(xs)), vcat(f.(xs), zero.(xs)))
plot!(S; fill=(:gray70, 0.25), line=(nothing,))
plot!([b,b,b+h,b+h],[0,f(b),f(b),0]; line=(:black,1,:dash))
annotate!([
(a,0,text(L"a", :top, :left)),
(b, 0, text(L"x", :top)),
(b+h,0,text(L"x+h", :top)),
(2b/3, 1/2, text(L"A(x)")),
(b + h/2, 1/2, text(L"f(x)\cdot h \approx A(x+h)-A(x)", rotation=90))
])
current()
end
```
```{julia}
#| echo: false
plotly()
nothing
```
Area under the curve between $a$ and $b$, labeled with $A(b)$ for $b=x$ and $b=x+h$.
:::
The last rectangle is exactly $f(x)h$ and approximately $A(x+h)-A(x)$, the difference being the small cap above the shaded rectangle. This gives the approximate derivative:
$$
A'(x) \approx \frac{A(x+h) - A(x)}{h} \approx \frac{f(x)\cdot h}{h} = f(x)
$$
That is, by taking limits, $A(x) = \int_a^x f(u) du$ is an antiderivative of $f(x)$. Moreover, from geometric considerations of area, if $a < c < b$, then
$$
A(b) - A(c) = \int_a^b f(x) dx - \int_a^c f(x) dx = \int_c^b f(x) dx
$$
That is $A(x)$ satisfies the two parts of the fundamental theorem.
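A numeric sketch of part 1, letting `quadgk` play the role of $A$ and using a central difference for the derivative:

```{julia}
let
    f(u) = sin(u)
    A(x) = quadgk(f, 0, x)[1]          # A(x) = ∫₀ˣ f(u) du
    x, h = 1.2, 1e-5
    (A(x + h) - A(x - h)) / 2h, f(x)   # nearly equal
end
```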
## Using the fundamental theorem of calculus to evaluate definite integrals
The most visible use of the FTC is the computation of definite integrals, $\int_a^b f(x) dx$. Rather than resort to Riemann sums or geometric arguments, there is an alternative - *when possible*, find a function $F$ with $F'(x) = f(x)$ and compute $F(b) - F(a)$.
Some examples:
@@ -210,21 +278,21 @@ The expression $F(b) - F(a)$ is often written in this more compact form:
$$
\int_a^b f(x) dx = F(b) - F(a) = F(x)\Big|_{x=a}^b, \text{ or just expr}\Big|_{x=a}^b.
$$
The vertical bar is used for the *evaluation* step, in this case the $a$ and $b$ mirror that of the definite integral. This notation lends itself to working inline, as we illustrate with this next problem where we "know" a function "$F$", so just express it "inline":
$$
\int_0^{\pi/4} \sec^2(x) dx = \tan(x) \Big|_{x=0}^{\pi/4} = 1 - 0 = 1.
$$
A consequence of this notation is:
$$
F(x) \Big|_{x=a}^b = -F(x) \Big|_{x=b}^a.
$$
This says nothing more than $F(b)-F(a) = -(F(a) - F(b))$, though more compactly.
@@ -321,13 +389,13 @@ Answers may not be available as elementary functions, but there may be special f
integrate(x / sqrt(1-x^3), x)
```
Different cases explored by `integrate` are mentioned after the questions.
## Rules of integration
There are some "rules" of integration that allow integrals to be re-expressed. These follow from the rules of derivatives.
There are some "rules" of integration that allow indefinite integrals to be re-expressed.
* The integral of a constant times a function:
@@ -350,7 +418,7 @@ $$
This follows immediately as if $F(x)$ and $G(x)$ are antiderivatives of $f(x)$ and $g(x)$, then $[F(x) + G(x)]' = f(x) + g(x)$, so the right hand side will have a derivative of $f(x) + g(x)$.
In fact, this more general form, where $c$ and $d$ are constants, covers both cases and is referred to as the *linearity* of the integral:
$$
@@ -366,25 +434,27 @@ This statement is nothing more than the derivative formula $[cf(x) + dg(x)]' = c
* The antiderivative of the polynomial $p(x) = a_n x^n + \cdots + a_1 x + a_0$ follows from the linearity of the integral and the general power rule:
$$
\begin{align*}
\int (a_n x^n + \cdots + a_1 x + a_0) dx
&= \int a_nx^n dx + \cdots + \int a_1 x dx + \int a_0 dx \\
&= a_n \int x^n dx + \cdots + a_1 \int x^1 dx + a_0 \int x^0 dx \\
&= a_n\frac{x^{n+1}}{n+1} + \cdots + a_1 \frac{x^2}{2} + a_0 \frac{x}{1}.
\end{align*}
$$
* More generally, a [Laurent](https://en.wikipedia.org/wiki/Laurent_polynomial) polynomial allows for terms with negative powers. These too can be handled by the above; a symbolic check appears after this list. For example
$$
\begin{align*}
\int (\frac{2}{x} + 2 + 2x) dx
&= \int \frac{2}{x} dx + \int 2 dx + \int 2x dx \\
&= 2\int \frac{1}{x} dx + 2 \int dx + 2 \int xdx\\
&= 2\log(x) + 2x + 2\frac{x^2}{2}.
\end{align*}
$$
* Consider this integral:
@@ -412,12 +482,14 @@ This seems like a lot of work, and indeed it is more than is needed. The followi
$$
\int_0^\pi 100 \sin(x) dx = 100(-\cos(x)) \Big|_0^{\pi} = 100 \cos(x) \Big|_{\pi}^0 = 100(1) - 100(-1) = 200.
$$
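As a symbolic check of the Laurent polynomial example above (a sketch; this assumes `SymPy` is loaded, as elsewhere in these notes):

```{julia}
@syms x
integrate(2/x + 2 + 2x, x)   # expect x^2 + 2x + 2log(x)
```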
## The derivative of the integral
The relationship that $[\int_a^x f(u) du]' = f(x)$ is a bit harder to appreciate, as it doesn't help answer many ready-made questions. Here we give some examples of its use.
@@ -428,12 +500,16 @@ $$
F(x) = \int_a^x f(u) du.
$$
The value of $a$ does not matter, as long as the integral is defined. This $F$ satisfies the first fundamental theorem, as $F(a)=0$.
```{julia}
#| hold: true
#| echo: false
#| eval: false
##{{{ftc_graph}}}
gr()
function make_ftc_graph(n)
@@ -474,9 +550,9 @@ imgfile = tempname() * ".gif"
gif(anim, imgfile, fps = 1)
plotly()
ImageFile(imgfile, caption)
```
#The picture for this, for non-negative $f$, is of accumulating area as $x$ increases. It can be used to give insight into some formulas:
```
For any function, we know that $F(b) - F(c) + F(c) - F(a) = F(b) - F(a)$. For this specific function, this translates into this property of the integral:
@@ -545,7 +621,7 @@ In probability theory, for a positive, continuous random variable, the probabili
For example, the exponential distribution with rate $1$ has $f(x) = e^{-x}$. Compute $F(x)$.
This is just $F(x) = \int_0^x e^{-u} du = -e^{-u}\Big|_0^x = 1 - e^{-x}$.
The "uniform" distribution on $[a,b]$ has
@@ -584,7 +660,7 @@ The answer will either be at a critical point, at $0$ or as $x$ goes to $\infty$
$$
[\text{erf}(x)]' = \frac{2}{\sqrt{\pi}}e^{-x^2}.
$$
Oh, this is never $0$, so there are no critical points. The maximum occurs at $0$ or as $x$ goes to $\infty$. Clearly at $0$, we have $\text{erf}(0)=0$, so the answer will be as $x$ goes to $\infty$.
@@ -645,13 +721,14 @@ Under assumptions that the $X$ are identical and independent, the largest value,
This problem is constructed to take advantage of the FTC, and we have:
$$
\begin{align*}
\left[P(M \leq a)\right]'
&= \left[F(a)^n\right]'\\
&= n \cdot F(a)^{n-1} \left[F(a)\right]'\\
&= n F(a)^{n-1}f(a)
\end{align*}
$$
##### Example
@@ -749,7 +826,7 @@ A junior engineer at `Treadmillz.com` is tasked with updating the display of cal
**********
```
In this example display there was 1 calorie burned in the first minute, then 2, then 5, 5, 4, 4, 3, 2, 2, 1. The total is $29$.
In her work the junior engineer found this old function for updating the display
@@ -808,7 +885,7 @@ end
Then the "area" represented by the dots stays fixed over this time frame.
The engineer then thought a bit more, as the form of her answer seemed familiar. She decides to parameterize it in terms of $t$ and found with $h=1/n$: `c(t) = (C(t) - C(t-h))/h`. Ahh - the derivative approximation. But then what is the "area"? It is no longer just the sum of the dots, but in terms of the functions she finds that each column represents $c(t)\cdot h$, and the sum is just $c(t_1)h + c(t_2)h + \cdots + c(t_n)h$ which looks like an approximate integral.
If the display were to reach the modern age and replace LED "dots" with a higher-pixel display, then the function to display would be $c(t) = C'(t)$ and the area displayed would be $\int_{t-10}^t c(u) du$.
@@ -1114,6 +1191,192 @@ answ = 2
radioq(choices, answ)
```
###### Question
The error function (`erf`) is defined in terms of an integral:
$$
\text{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x \exp(-t^2) dt, \quad x \geq 0
$$
The constant is chosen so that $\lim_{x \rightarrow \infty} \text{erf}(x) = 1$.
What is the derivative of $\text{erf}(x)$?
```{julia}
#| echo: false
choices = [L"\exp(-x^2)",
L"-2x \exp(-x^2)",
L"\frac{2}{\sqrt{\pi}} \exp(-x^2)"]
radioq(choices, 3; keep_order=true, explanation="Don't forget the scalar multiple")
```
Is the function $\text{erf}(x)$ *increasing* on $[0,\infty)$?
```{julia}
#| echo: false
choices = ["No",
"Yes, the derivative is positive on this interval",
"Yes, the derivative is negative on this interval",
"Yes, the derivative is increasing on this interval",
"Yes, the derivative is decreasing on this interval"]
radioq(choices, 2; keep_order=true)
```
Is the function $\text{erf}(x)$ *concave down* on $[0,\infty)$?
```{julia}
#| echo: false
choices = ["No",
"Yes, the derivative is positive on this interval",
"Yes, the derivative is negative on this interval",
"Yes, the derivative is increasing on this interval",
"Yes, the derivative is decreasing on this interval"]
radioq(choices, 5; keep_order=true)
```
For $x > 0$, consider the function
$$
F(x) = \frac{2}{\sqrt{\pi}} \int_{-x}^0 \exp(-t^2) dt
$$
Why is $F'(x) = \text{erf}'(x)$?
```{julia}
#| echo: false
choices = ["The integrand is an *even* function so the integral from ``0`` to ``x`` is the same as the integral from ``-x`` to ``0``",
"This isn't true"]
radioq(choices, 1; keep_order=true)
```
Consider the function
$$
F(x) = \frac{2}{\sqrt{\pi}} \int_0^{\sqrt{x}} \exp(-t^2) dt, \quad x \geq 0
$$
What is the derivative of $F$?
```{julia}
#| echo: false
choices = [L"\exp(-x^2)",
L"\frac{2}{\sqrt{\pi}} \exp(-x^2)",
L"\frac{2}{\sqrt{\pi}} \exp(-x^2) \cdot (-2x)"]
radioq(choices, 3; keep_order=true, explanation="Don't forget to apply the chain rule, as ``F(x) = \\text{erf}(\\sqrt{x})``")
```
###### Question
Define two functions through the integrals:
$$
\begin{align*}
S(x) &= \int_0^x \sin(t^2) dt\\
C(x) &= \int_0^x \cos(t^2) dt
\end{align*}
$$
These are called *Fresnel Integrals*.
A non-performant implementation might look like:
```{julia}
S(x) = first(quadgk(t -> sin(t^2), 0, x))
```
Define a similar function for $C(x)$ and then make a parametric plot for $0 \le t \le 5$.
Describe the shape.
```{julia}
#| echo: false
choices = ["It makes a lovely star shape",
"It makes a lovely spiral shape",
"It makes a lovely circle"]
radioq(choices, 2; keep_order=true)
```
What is the value of $S'(x)^2 + C'(x)^2$ when $x=\pi$?
```{julia}
#| echo: false
numericq(1)
```
###### Question
Define a function with parameter $\alpha \geq 1$ by:
$$
\gamma(x; \alpha) = \int_0^x \exp(-t) t^{\alpha-1} dt, \quad x > 0
$$
What is the ratio of $\gamma'(2; 3) / \gamma'(2; 4)$?
```{julia}
#| echo: false
df(x,alpha) = exp(-x)*x^(alpha -1)
numericq(df(2,3)/df(2,4))
```
###### Question
Define a function
$$
i(x) = \int_0^{x^2} \exp(-t) t^{1/2} dt
$$
What is the derivative of $i$?
```{julia}
#| echo: false
choices = [L"\exp(-x) x^{1/2}",
L"\exp(-x) x^{1/2} \cdot 2x",
L"\exp(-x^2) (x^2)^{1/2}",
L"\exp(-x^2) (x^2)^{1/2}\cdot 2x"]
radioq(choices, 4; keep_order=true)
```
###### Question
The function `sinint` from `SpecialFunctions` computes
$$
F(x) = \int_0^x \frac{\sin(t)}{t} dt = \int_0^x \phi(t) dt,
$$
where we define $\phi$ to be $1$ when $t=0$, so that it is continuous over $[0,x]$.
A related integral might be:
$$
G(x) = \int_0^x \frac{\sin(\pi t)}{\pi t} dt = \int_0^x \phi(\pi t) dt
$$
As this is an integral involving a simple transformation of $\phi(x)$, we can see that $G(x) = (1/\pi) F(\pi x)$. What is the derivative of $G$?
```{julia}
#| echo: false
choices = [
L"\phi(x)",
L"\phi(\pi x)",
L"\pi \phi(\pi x)"
]
radioq(choices, 2; keep_order=true)
```
###### Question
@@ -1138,12 +1401,14 @@ radioq(choices, answ, keep_order=true)
Barrow presented a version of the fundamental theorem of calculus in a 1670 volume edited by Newton, Barrow's student (cf. [Wagner](http://www.maa.org/sites/default/files/0746834234133.di020795.02p0640b.pdf)). His version can be stated as follows (cf. [Jardine](http://www.maa.org/publications/ebooks/mathematical-time-capsules)):
Consider the following figure where $f$ is a strictly increasing function with $f(0) = 0$, and $x > 0$. The function $A(x) = \int_0^x f(u) du$ is also plotted with a dashed red line. The point $Q$ is $f(x)$, and the point $P$ is $A(x)$. The point $T$ is chosen so that the length between $T$ and $x$ times the length between $Q$ and $x$ equals the length from $P$ to $x$ ($\lvert Tx \rvert \cdot \lvert Qx \rvert = \lvert Px \rvert$). Barrow showed that the line segment $PT$ is tangent to the graph of $A(x)$. This figure illustrates the labeling for some function:
```{julia}
#| hold: true
#| echo: false
let
gr()
f(x) = x^(2/3)
x = 2
A(x) = quadgk(f, 0, x)[1]
@@ -1154,14 +1419,21 @@ P = A(x)
secpt = u -> 0 + P/(x-T) * (u-T)
xs = range(0, stop=x+1/4, length=50)
p = plot(f, 0, x + 1/4, legend=false, line=(:black,2))
plot!(p, A, 0, x + 1/4, line=(:red, 2,:dash))
scatter!(p, [T, x, x, x], [0, 0, Q, P], color=:orange)
annotate!(p, collect(zip([T, x, x+.1, x+.1], [0-.15, 0-.15, Q-.1, P], ["T", "x", "Q", "P"])))
annotate!(p, collect(zip([T, x, x+.1, x+.1], [0-.15, 0-.15, Q-.1, P], [L"T", L"x", L"Q", L"P"])))
plot!(p, [T-1/4, x+1/4], map(secpt, [T-1/4, x + 1/4]), color=:orange)
plot!(p, [T, x, x], [0, 0, P], color=:green)
p
end
```
```{julia}
#| echo: false
plotly()
nothing
```
The fact that $\lvert Tx \rvert \cdot \lvert Qx \rvert = \lvert Px \rvert$ says what in terms of $f(x)$, $A(x)$ and $A'(x)$?

View File

@@ -1,4 +1,4 @@
# Improper integrals
{{< include ../_common_code.qmd >}}
@@ -33,20 +33,26 @@ function make_sqrt_x_graph(n)
b = 1
a = 1/2^n
xs = range(1/2^n, stop=b, length=1000)
x1s = range(a, stop=b, length=1000)
@syms x
f(x) = 1/sqrt(x)
val = N(integrate(f(x), (x, 1/2^n, b)))
title = "area under f over [1/$(2^n), $b] is $(rpad(round(val, digits=2), 4))"
plt = plot(f, range(a, stop=b, length=251), xlim=(0,b), ylim=(0, 15), legend=false, size=fig_size, title=title)
plot!(plt, [b, a, x1s...], [0, 0, map(f, x1s)...], linetype=:polygon, color=:orange)
title = L"area under $f$ over $[2^{-%$n}, %$b]$ is $%$(rpad(round(val, digits=2), 4))$"
plt = plot(f, range(a, stop=b, length=1000);
xlim=(0,b), ylim=(0, 15),
legend=false,
title=title)
plot!(plt, [b, a, x1s...], [0, 0, map(f, x1s)...];
linetype=:polygon, color=:orange)
plt
end
caption = L"""
Area under $1/\sqrt{x}$ over $[a,b]$ increases as $a$ gets closer to $0$. Will it grow unbounded or have a limit?
@@ -133,7 +139,7 @@ The limit is infinite, so does not exist except in an extended sense.
Before showing this, we recall the fundamental theorem of calculus. The limit existing is the same as saying the limit of $F(M) - F(a)$ exists for an antiderivative of $f(x)$.
For this particular problem, it can be shown with integration by parts that for positive, integer values of $n$ an antiderivative exists of the form $F(x) = p(x)e^{-x}$, where $p(x)$ is a polynomial of degree $n$. But we've seen that for any $n>0$, $\lim_{x \rightarrow \infty} x^n e^{-x} = 0,$ so the same is true for any polynomial. So, $\lim_{M \rightarrow \infty} F(M) - F(1) = -F(1)$.
* The function $e^x$ is integrable over $(-\infty, a]$ but not
@@ -161,7 +167,7 @@ $$
= \int_{\log(e)}^{\log(M)} \frac{1}{u^{2}} du
= \frac{-1}{u} \big|_{1}^{\log(M)}
= \frac{-1}{\log(M)} - \frac{-1}{1}
= 1 - \frac{1}{\log(M)}.
$$
As $M$ goes to $\infty$, this will converge to $1$.
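This limit can be checked numerically, a quick sketch assuming the `QuadGK` package is available:

```{julia}
using QuadGK
# ∫ₑ^∞ 1/(x log(x)²) dx; the computation above says this is 1
val, err = quadgk(x -> 1 / (x * log(x)^2), ℯ, Inf)
val
```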
@@ -175,6 +181,87 @@ As $M$ goes to $\infty$, this will converge to $1$.
limit(sympy.Si(M), M => oo)
```
##### Example
To formally find the limit as $x\rightarrow \infty$ of
$$
\text{Si}(x) = \int_0^x \frac{\sin(t)}{t} dt
$$
we introduce a trick and rely on some theorems that have not been discussed.
First, we notice that $\lim_{x\rightarrow\infty} \text{Si}(x)$ is the value of $I(\alpha)$ when $\alpha=0$, where
$$
I(\alpha) = \int_0^\infty \exp(-\alpha t) \frac{\sin(t)}{t} dt
$$
We differentiate $I$ in $\alpha$ to get:
$$
\begin{align*}
I'(\alpha) &= \frac{d}{d\alpha} \int_0^\infty \exp(-\alpha t) \frac{\sin(t)}{t} dt \\
&= \int_0^\infty \frac{d}{d\alpha} \exp(-\alpha t) \frac{\sin(t)}{t} dt \\
&= \int_0^\infty (-t) \exp(-\alpha t) \frac{\sin(t)}{t} dt \\
&= -\int_0^\infty \exp(-\alpha t) \sin(t) dt \\
\end{align*}
$$
As illustrated previously, this integral can be integrated by parts, though here we have infinite limits and have adjusted for the minus sign:
$$
\begin{align*}
-I'(\alpha) &= \int_0^\infty \exp(-\alpha t) \sin(t) dt \\
&=\sin(t) \frac{-\exp(-\alpha t)}{\alpha} \Big|_0^\infty -
\int_0^\infty \frac{-\exp(-\alpha t)}{\alpha} \cos(t) dt \\
&= 0 + \frac{1}{\alpha} \cdot \int_0^\infty \exp(-\alpha t) \cos(t) dt \\
&= \frac{1}{\alpha} \cdot \cos(t)\frac{-\exp(-\alpha t)}{\alpha} \Big|_0^\infty -
\frac{1}{\alpha} \cdot \int_0^\infty \frac{-\exp(-\alpha t)}{\alpha} (-\sin(t)) dt \\
&= \frac{1}{\alpha^2} - \frac{1}{\alpha^2} \cdot \int_0^\infty \exp(-\alpha t) \sin(t) dt
\end{align*}
$$
Combining gives:
$$
\left(1 + \frac{1}{\alpha^2}\right) \int_0^\infty \exp(-\alpha t) \sin(t) dt = \frac{1}{\alpha^2}
$$
Solving gives the desired integral as
$$
I'(\alpha) = -\frac{1}{\alpha^2} / (1 + \frac{1}{\alpha^2}) = -\frac{1}{1 + \alpha^2}.
$$
This has a known antiderivative: $I(\alpha) = -\tan^{-1}(\alpha) + C$. As $\alpha \rightarrow \infty$ *if* we can pass the limit *inside* the integral, then $I(\alpha) \rightarrow 0$. So $\lim_{x \rightarrow \infty} -\tan^{-1}(x) + C = 0$ or $C = \pi/2$.
As our question is answered by $I(0)$, we get $I(0) = -\tan^{-1}(0) + C = C = \pi/2$.
The above argument requires two places where a *limit* is passed inside the integral. The first involved the derivative. The [Leibniz integral rule](https://en.wikipedia.org/wiki/Leibniz_integral_rule) can be used to verify the first use is valid:
:::{.callout-note icon=false}
## Leibniz integral rule
If $f(x,t)$ and the derivative in $x$ for a fixed $t$ is continuous (to be discussed later) in a region containing $a(x) \leq t \leq b(x)$ and $x_0 < x < x_1$ and both $a(x)$ and $b(x)$ are continuously differentiable, then
$$
\frac{d}{dx}\int_{a(x)}^{b(x)} f(x, t) dt =
\int_{a(x)}^{b(x)} \frac{d}{dx}f(x,t) dt +
f(x, b(x)) \frac{d}{dx}b(x) - f(x, a(x)) \frac{d}{dx}a(x).
$$
:::
This extends the fundamental theorem of calculus for cases where the integrand also depends on $x$. In our use, both $a'(x)$ and $b'(x)$ are $0$.
[Uniform convergence](https://en.wikipedia.org/wiki/Uniform_convergence) can be used to establish the other.
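As a numeric sanity check of $I(\alpha) = \pi/2 - \tan^{-1}(\alpha)$, here is a sketch using `QuadGK` (the names `ϕ` and `Iₐ` are ours; for $\alpha > 0$ the integral converges absolutely):

```{julia}
using QuadGK
ϕ(t) = iszero(t) ? 1.0 : sin(t) / t       # continuous extension at t = 0
Iₐ(α) = quadgk(t -> exp(-α * t) * ϕ(t), 0, Inf)[1]
[Iₐ(α) - (pi/2 - atan(α)) for α in (1/2, 1, 2)]  # all ≈ 0
```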
### Numeric integration
@@ -334,6 +421,125 @@ We want to just say $F'(x)= e^{-x}$ so $f(x) = e^{-x}$. But some care is needed.
Finally, at $x=0$ we have an issue, as $F'(0)$ does not exist. The left limit of the secant line approximation is $0$, the right limit of the secant line approximation is $1$. So, we can take $f(x) = e^{-x}$ for $x > 0$ and $0$ otherwise, noting that redefining $f(x)$ at a point will not affect the integral as long as the point is finite.
## Application to series
In this application, we compare a series to a related integral to decide convergence or divergence of the series.
:::{.callout-note appearance="minimal"}
#### The integral test
Consider a continuous, monotone decreasing function $f(x)$ defined on some interval of the form $[N,\infty)$. Let $a_n = f(n)$ and $s_n = \sum_{k=N}^n a_k$.
* If $\int_N^\infty f(x) dx < \infty$ then the partial sums converge.
* If $\int_N^\infty f(x) dx = \infty$ then the partial sums diverge.
:::
By the monotone nature of $f(x)$, we have on any interval of the type $[i, i+1)$ for $i$ an integer, that $f(i) \geq f(x) \geq f(i+1)$ when $x$ is in the interval. For integrals, this leads to
$$
f(i) = \int_i^{i+1} f(i) dx
\geq \int_i^{i+1} f(x) dx
\geq \int_i^{i+1} f(i+1) dx = f(i+1)
$$
Now if $N$ is an integer we have
$$
\int_N^\infty f(x) dx = \sum_{i=N}^\infty \int_i^{i+1} f(x) dx
$$
This translates to the bound
$$
\sum_{i=N}^\infty f(i)
\geq \int_N^\infty f(x) dx
\geq \sum_{i=N}^\infty f(i+1) = \sum_{i=N+1}^\infty f(i)
$$
If the integral converges, the second inequality implies the series converges; if the integral diverges, the first inequality implies the series diverges.
### Example
The $p$-series test is an immediate consequence, as the integral
$$
\int_1^\infty \frac{1}{x^p} dx
$$
converges when $p>1$ and diverges when $p \leq 1$.
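For instance, `SymPy` can carry out this integral with a symbolic power, a sketch assuming `SymPy` is loaded as elsewhere in this section:

```{julia}
using SymPy
@syms x::positive p::positive
# a piecewise answer: 1/(p-1) when p > 1, and ∞ otherwise
integrate(1 / x^p, (x, 1, oo))
```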
### Example
Let
$$
a_n = \frac{1}{n \cdot \ln(n) \cdot \ln(\ln(n))^2}
$$
Does $\sum a_n$ *converge*?
We use `SymPy` to integrate:
```{julia}
@syms x::real
f(x) = 1 / (x * log(x) * log(log(x))^2)
integrate(f(x), (x, 3^2, oo))
```
That this is finite shows the series converges.
---
The integral of a power series can be computed easily for some $x$:
:::{.callout-note appearance="minimal"}
### The integral of a power series
Suppose $f(x) = \sum_n a_n (x-c)^n$ is a power series about $x=c$ with radius of convergence $r > 0$. [Then](https://en.wikipedia.org/wiki/Power_series#Differentiation_and_integration) the limits of the integral and the sum can be switched around when $x$ is within the radius of convergence:
$$
\begin{align*}
\int f(x) dx
&= \int \sum_n a_n(x-c)^n dx\\
&= \sum_{n=0}^\infty \int a_n(x-c)^n dx\\
&= \sum_{n=0}^\infty a_n \frac{(x-c)^{n+1}}{n+1}
\end{align*}
$$
The radius of convergence of this new power series is also $r$.
:::
The geometric series has a well-known value when $|r| < 1$:
$$
\frac{1}{1-r} = 1 + r + r^2 + \cdots
$$
Integrating term by term (using $\int dr/(1-r) = -\ln(1-r)$) gives
$$
\ln(1 - r) = -(r + r^2/2 + r^3/3 + \cdots).
$$
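A quick numeric check of this series at, say, $r=1/2$ (plain `Julia`; the names are ours):

```{julia}
r = 1/2
partial_sum(n) = -sum(r^k / k for k in 1:n)
partial_sum(50) - log(1 - r)   # ≈ 0
```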
The power series for $e^x$ is
$$
e^x = \sum_{n=0}^\infty \frac{x^n}{n!}
$$
Integrating term by term gives
$$
\int e^x dx = \sum_{n=0}^\infty \frac{x^{n+1}}{n+1}\frac{1}{n!}
= \sum_{n=0}^\infty \frac{x^{n+1}}{(n+1)!}
= \sum_{n=1}^\infty \frac{x^n}{n!}
= e^x - 1
$$
The $-1$ is just a constant of integration, so the antiderivative of $e^x$ is $e^x$.
## Questions

View File

@@ -1,4 +1,4 @@
# Integration by parts
{{< include ../_common_code.qmd >}}
@@ -27,56 +27,98 @@ So far we have seen that the *derivative* rules lead to *integration rules*. In
* The sum rule $[au(x) + bv(x)]' = au'(x) + bv'(x)$ gives rise to an integration rule: $\int (au(x) + bv(x))dx = a\int u(x)dx + b\int v(x)dx$. (That is, the linearity of the derivative means the integral has linearity.)
* The chain rule $[f(g(x))]' = f'(g(x)) g'(x)$ gives $\int_a^b f(g(x))g'(x)dx=\int_{g(a)}^{g(b)}f(x)dx$. That is, substitution reverses the chain rule.
Now we turn our attention to the implications of the *product rule*: $[uv]' = u'v + uv'$. The resulting technique is called integration by parts.
::: {.callout-note}
## Integration by parts
The formula comes from integrating $(uv)'$ over $[a,b]$ (cf. this [visualization](http://en.wikipedia.org/wiki/Integration_by_parts#Visualization)).
By the fundamental theorem of calculus:
$$
[u(x)\cdot v(x)]\Big|_a^b = \int_a^b [u(x) v(x)]' dx = \int_a^b u'(x) \cdot v(x) dx + \int_a^b u(x) \cdot v'(x) dx.
$$
Or,
$$
\int_a^b u(x) v'(x) dx = [u(x)v(x)]\Big|_a^b - \int_a^b v(x) u'(x)dx.
$$
:::
The following visually illustrates integration by parts:
```{julia}
#| echo: false
#| label: fig-integration-by-parts
#| fig-cap: "Integration by parts figure ([original](http://en.wikipedia.org/wiki/Integration_by_parts#Visualization))"
let
## parts picture
gr()
u(x) = sin(x*pi/2)
v(x) = x
xs = range(0, stop=1, length=50)
a,b = 1/4, 3/4
p = plot(u, v, 0, 1; legend=false, axis=([], false), line=(:black,2))
plot!([0, u(1)], [0,0]; line=(:gray, 1), arrow=true, side=:head)
plot!([0, 0], [0, v(1) ]; line=(:gray, 1), arrow=true, side=:head)
xs = range(a, b, length=50)
plot!(Shape(vcat(u.(xs), reverse(u.(xs))),
vcat(zero.(xs), v.(reverse(xs)))),
fill=(:red, 0.15),
xlims=(-0.07, 1)
)
plot!(Shape([0,u(a),u(a),0],[0,0,v(a),v(a)]), fill=(:royalblue, 0.5))
scatter!(p, [u(a), u(b)], [v(a), v(b)], color=:mediumorchid3, markersize=5)
plot!(p, [u(a),u(a),0, 0, u(b),u(b),u(a)],
[0, v(a), v(a), v(b), v(b), 0, 0],
linetype=:polygon, fill=(:brown3, 0.25))
annotate!(p, [(0.65, .25, text(L"A")),
(0.4, .55, text(L"B")),
(u(a),v(a), text(L"(u(a),v(a))", :bottom, :right)),
(u(b),v(b), text(L"(u(b),v(b))", :bottom, :right)),
(u(a),0, text(L"u(a)", :top)),
(u(b),0, text(L"u(b)", :top)),
(0, v(a), text(L"v(a)", :right)),
(0, v(b), text(L"v(b)", :right)),
(0,0, text(L"(0,0)", :top))
])
end
```
```{julia}
#| echo: false
plotly()
nothing
```
@fig-integration-by-parts shows a parametric plot of $(u(t),v(t))$ for $a \leq t \leq b$.
The total shaded area, a rectangle, is $u(b)v(b)$; the area of $A$ and $B$ combined is just $u(b)v(b) - u(a)v(a)$, or $[u(x)v(x)]\Big|_a^b$. We will show that $A$ is $\int_a^b v(x)u'(x)dx$ and $B$ is $\int_a^b u(x)v'(x)dx$, giving the formula.
We can compute $A$ by a change of variables with $x=u^{-1}(t)$ (so $u'(x)dx = dt$):
$$
\begin{align*}
A &= \int_{u(a)}^{u(b)} v(u^{-1}(t)) dt & \text{let } x = u^{-1}(t) \text{ or }u(x) = t \\
&= \int_{u^{-1}(u(a))}^{u^{-1}(u(b))} v(x) u'(x) dx \\
&= \int_a^b v(x) u'(x) dx.
\end{align*}
$$
$B$ is similar with the roles of $u$ and $v$ reversed.
---
Informally, the integration by parts formula is sometimes seen as $\int udv = uv - \int v du$, and it can be somewhat confusingly written as:
$$
@@ -86,8 +128,7 @@ $$
(The confusion coming from the fact that the indefinite integrals are only defined up to a constant.)
How does this formula help? It allows us to differentiate parts of an integral in hopes it makes the result easier to integrate.
An illustration can clarify.
@@ -95,15 +136,16 @@ An illustration can clarify.
Consider the integral $\int_0^\pi x\sin(x) dx$. If we let $u=x$ and $dv=\sin(x) dx$, then $du = 1dx$ and $v=-\cos(x)$. The above then says:
$$
\begin{align*}
\int_0^\pi x\sin(x) dx &= \int_0^\pi u dv\\
&= uv\Big|_0^\pi - \int_0^\pi v du\\
&= x \cdot (-\cos(x)) \Big|_0^\pi - \int_0^\pi (-\cos(x)) dx\\
&= \pi (-\cos(\pi)) - 0(-\cos(0)) + \int_0^\pi \cos(x) dx\\
&= \pi + \sin(x)\Big|_0^\pi\\
&= \pi.
\end{align*}
$$
The technique means one part is differentiated and one part integrated. The art is to break the integrand up into a piece that gets easier through differentiation and a piece that doesn't get much harder through integration.
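The computation above can be verified with `SymPy` (assuming it is loaded, as elsewhere in this section):

```{julia}
using SymPy
@syms x
integrate(x * sin(x), (x, 0, PI))   # π, matching the by-hand computation
```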
@@ -129,14 +171,15 @@ $$
Putting together gives:
$$
\begin{align*}
\int_1^2 x \log(x) dx
&= (\log(x) \cdot \frac{x^2}{2}) \Big|_1^2 - \int_1^2 \frac{x^2}{2} \frac{1}{x} dx\\
&= (2\log(2) - 0) - (\frac{x^2}{4})\Big|_1^2\\
&= 2\log(2) - (1 - \frac{1}{4}) \\
&= 2\log(2) - \frac{3}{4}.
\end{align*}
$$
##### Example
@@ -145,14 +188,15 @@ Putting together gives:
This related problem, $\int \log(x) dx$, uses the same idea, though perhaps harder to see at first glance, as setting `dv=dx` is almost too simple to try:
$$
\begin{align*}
u &= \log(x) & dv &= dx\\
du &= \frac{1}{x}dx & v &= x
\end{align*}
$$
$$
\begin{align*}
\int \log(x) dx
&= \int u dv\\
@@ -161,13 +205,14 @@ du &= \frac{1}{x}dx & v &= x
&= x \log(x) - \int dx\\
&= x \log(x) - x
\end{align*}
$$
Were this a definite integral problem, we would have written:
$$
\int_a^b \log(x) dx = (x\log(x))\Big|_a^b - \int_a^b dx = (x\log(x) - x)\Big|_a^b.
$$
##### Example
@@ -177,14 +222,14 @@ Sometimes integration by parts is used two or more times. Here we let $u=x^2$ an
$$
\int_a^b x^2 e^x dx = (x^2 \cdot e^x)\Big|_a^b - \int_a^b 2x e^x dx.
$$
But we can do $\int_a^b x e^xdx$ the same way:
$$
\int_a^b x e^x = (x\cdot e^x)\Big|_a^b - \int_a^b 1 \cdot e^xdx = (xe^x - e^x)\Big|_a^b.
$$
Combining gives the answer:
@@ -192,8 +237,8 @@ Combining gives the answer:
$$
\int_a^b x^2 e^x dx
= (x^2 \cdot e^x)\Big|_a^b - 2( (xe^x - e^x)\Big|_a^b ) =
e^x(x^2 - 2x + 2) \Big|_a^b.
$$
In fact, it isn't hard to see that an integral of $x^m e^x$, $m$ a positive integer, can be handled in this manner. For example, when $m=10$, `SymPy` gives:
@@ -210,14 +255,29 @@ The general answer is $\int x^n e^xdx = p(x) e^x$, where $p(x)$ is a polynomial
##### Example
The same technique is attempted for the integral of $e^x\sin(x)$, but ends differently.
First we let $u=\sin(x)$ and $dv=e^x dx$, then
$$
du = \cos(x)dx \quad \text{and}\quad v = e^x.
$$
So:
$$
\int e^x \sin(x)dx = \sin(x) e^x - \int \cos(x) e^x dx.
$$
Now we let $u = \cos(x)$ and again $dv=e^x dx$, then
$$
du = -\sin(x)dx \quad \text{and}\quad v = e^x.
$$
So:
$$
@@ -244,13 +304,14 @@ $$
Positive integer powers of trigonometric functions can be addressed by this technique. Consider $\int \cos(x)^n dx$. We let $u=\cos(x)^{n-1}$ and $dv=\cos(x) dx$. Then $du = (n-1)\cos(x)^{n-2}(-\sin(x))dx$ and $v=\sin(x)$. So,
$$
\begin{align*}
\int \cos(x)^n dx &= \cos(x)^{n-1} \cdot (\sin(x)) + \int (\sin(x)) ((n-1)\sin(x) \cos(x)^{n-2}) dx \\
&= \sin(x) \cos(x)^{n-1} + (n-1)\int \sin^2(x) \cos(x)^{n-2} dx\\
&= \sin(x) \cos(x)^{n-1} + (n-1)\int (1 - \cos(x)^2) \cos(x)^{n-2} dx\\
&= \sin(x) \cos(x)^{n-1} + (n-1)\int \cos(x)^{n-2}dx - (n-1)\int \cos(x)^n dx.
\end{align*}
$$
We can then solve for the unknown ($\int \cos(x)^{n}dx$) to get this *reduction formula*:
@@ -263,7 +324,7 @@ $$
This is called a reduction formula as it reduces the problem from an integral with a power of $n$ to one with a power of $n - 2$, so could be repeated until the remaining indefinite integral required knowing either $\int \cos(x) dx$ (which is $\sin(x)$) or $\int \cos(x)^2 dx$, which, by a double angle formula application, is $x/2 + \sin(2x)/4$.
`SymPy` is able and willing to do this repeated bookkeeping. For example with $n=10$:
```{julia}
@@ -279,12 +340,13 @@ The visual interpretation of integration by parts breaks area into two pieces, t
Let $uv = x f^{-1}(x)$. Then we have $[uv]' = u'v + uv' = f^{-1}(x) + x [f^{-1}(x)]'$. So, up to a constant $uv = \int [uv]'dx = \int f^{-1}(x)dx + \int x [f^{-1}(x)]'dx$. Re-expressing gives:
$$
\begin{align*}
\int f^{-1}(x) dx
&= xf^{-1}(x) - \int x [f^{-1}(x)]' dx\\
&= xf^{-1}(x) - \int f(u) du.\\
\end{align*}
$$
The last line follows from the $u$-substitution: $u=f^{-1}(x)$ for then $du = [f^{-1}(x)]' dx$ and $x=f(u)$.
@@ -293,12 +355,13 @@ The last line follows from the $u$-substitution: $u=f^{-1}(x)$ for then $du = [f
We use this to find an antiderivative for $\sin^{-1}(x)$:
$$
\begin{align*}
\int \sin^{-1}(x) dx &= x \sin^{-1}(x) - \int \sin(u) du \\
&= x \sin^{-1}(x) + \cos(u) \\
&= x \sin^{-1}(x) + \cos(\sin^{-1}(x)).
\end{align*}
$$
Using right triangles to simplify, the last value $\cos(\sin^{-1}(x))$ can otherwise be written as $\sqrt{1 - x^2}$.
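Differentiating the answer recovers the integrand, a check `SymPy` can make (a sketch, assuming `SymPy` is loaded):

```{julia}
using SymPy
@syms x
simplify(diff(x * asin(x) + sqrt(1 - x^2), x))   # asin(x)
```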
@@ -310,7 +373,7 @@ Using right triangles to simplify, the last value $\cos(\sin^{-1}(x))$ can other
The [trapezoid](http://en.wikipedia.org/wiki/Trapezoidal_rule) rule is an approximation to the definite integral like a Riemann sum, only instead of approximating the area above $[x_i, x_i + h]$ by a rectangle with height $f(c_i)$ (for some $c_i$), it uses a trapezoid formed by the left and right endpoints. That is, this area is used in the estimation: $(1/2)\cdot (f(x_i) + f(x_i+h)) \cdot h$.
Even though we suggest just using `quadgk` for numeric integration, estimating the error in this approximation is of theoretical interest.
Recall, just using *either* $x_i$ or $x_{i-1}$ for $c_i$ gives an error that is "like" $1/n$, as $n$ gets large, though the exact rate depends on the function and the length of the interval.
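Before the proof, a quick empirical sketch (names ours) showing the trapezoid error scales like $1/n^2$, so the error should drop by a factor of about $4$ each time $n$ doubles:

```{julia}
# composite trapezoid rule for ∫ₐᵇ f(x)dx over n equal subintervals
function trap(f, a, b, n)
    h = (b - a) / n
    h * (sum(f, range(a, b, length=n+1)) - (f(a) + f(b)) / 2)
end
errs = [abs(trap(exp, 0, 1, n) - (ℯ - 1)) for n in (10, 20, 40)]
errs[1:end-1] ./ errs[2:end]   # each ratio ≈ 4
```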
@@ -319,17 +382,18 @@ Recall, just using *either* $x_i$ or $x_{i-1}$ for $c_i$ gives an error that is
This [proof](http://www.math.ucsd.edu/~ebender/20B/77_Trap.pdf) for the error estimate is involved, but is reproduced here, as it nicely integrates many of the theoretical concepts of integration discussed so far.
First, for convenience, we consider the interval $x_i$ to $x_i+h$. The actual answer over this is just $\int_{x_i}^{x_i+h}f(x) dx$. By a $u$-substitution with $u=x-x_i$ this becomes $\int_0^h f(t + x_i) dt$. For analyzing this we integrate once by parts using $u=f(t+x_i)$ and $dv=dt$. But instead of letting $v=t$, we choose to add---as is our prerogative---a constant of integration $A$, so $v=t+A$:
$$
\begin{align*}
\int_0^h f(t + x_i) dt &= uv \Big|_0^h - \int_0^h v du\\
&= f(t+x_i)(t+A)\Big|_0^h - \int_0^h (t + A) f'(t + x_i) dt.
\end{align*}
$$
We choose $A$ to be $-h/2$ (any constant is possible), for then the term $f(t+x_i)(t+A)\Big|_0^h$ becomes $(1/2)(f(x_i+h) + f(x_i)) \cdot h$, or the trapezoid approximation. This means the error over this interval (actual minus estimate) satisfies:
$$
@@ -339,18 +403,19 @@ $$
For this, we *again* integrate by parts with
$$
\begin{align*}
u &= f'(t + x_i) & dv &= (t + A)dt\\
du &= f''(t + x_i)dt & v &= \frac{(t + A)^2}{2} + B
\end{align*}
$$
Again we added a constant of integration, $B$, to $v$. The error becomes:
$$
\text{error}_i = -\left(\frac{(t+A)^2}{2} + B\right)f'(t+x_i)\Big|_0^h + \int_0^h \left(\frac{(t+A)^2}{2} + B\right) \cdot f''(t+x_i) dt.
$$
With $A=-h/2$, $B$ is chosen so $(t+A)^2/2 + B = 0$ at endpoints, or $B=-h^2/8$. The error becomes
@@ -364,14 +429,14 @@ Now, we assume the $\lvert f''(t)\rvert$ is bounded by $K$ for any $a \leq t \le
$$
\lvert \text{error}_i \rvert \leq K \int_0^h \left\lvert \frac{(t-h/2)^2}{2} - \frac{h^2}{8} \right\rvert dt.
$$
But what is the function in the integrand? Clearly it is a quadratic in $t$. Expanding gives $1/2 \cdot (t^2 - ht)$. This is negative over $[0,h]$ (and $0$ at the endpoints), so the integral above is just:
$$
\frac{1}{2}\int_0^h (ht - t^2)dt = \frac{1}{2} \left(\frac{ht^2}{2} - \frac{t^3}{3}\right)\Big|_0^h = \frac{h^3}{12}
$$
This gives the bound: $\lvert \text{error}_i \rvert \leq K h^3/12$. The *total* error may be less, but is not more than the value found by adding up the error over each of the $n$ intervals. As our bound does not depend on $i$, this sum satisfies:
@@ -418,13 +483,14 @@ We added a rectangle for a Riemann sum for $t_i = \pi/3$ and $t_{i+1} = \pi/3 +
Taking this Riemann sum approach, we can approximate the area under the curve parameterized by $(x(t), y(t))$ over the time range $[t_i, t_{i+1}]$ as a rectangle with height $y(t_i)$ and base $x(t_{i}) - x(t_{i+1})$. Then we get, as expected:
$$
\begin{align*}
A &\approx \sum_i y(t_i) \cdot (x(t_{i}) - x(t_{i+1}))\\
&= - \sum_i y(t_i) \cdot (x(t_{i+1}) - x(t_{i}))\\
&= - \sum_i y(t_i) \cdot \frac{x(t_{i+1}) - x(t_i)}{t_{i+1}-t_i} \cdot (t_{i+1}-t_i)\\
&\approx -\int_a^b y(t) x'(t) dt.
\end{align*}
$$
So with a counterclockwise rotation, the actual answer for the area includes a minus sign. If the area is traced out in a *clockwise* manner, there is no minus sign.

View File

@@ -15,7 +15,8 @@ files = (
"center_of_mass",
"volumes_slice",
"arc_length",
"surface_area",
"surface_area",
"orthogonal_polynomials",
"twelve-qs",
)

View File

@@ -66,14 +66,18 @@ $$
\text{average} = \frac{1}{\pi-0} \int_0^\pi \sin(x) dx = \frac{1}{\pi} (-\cos(x)) \big|_0^\pi = \frac{2}{\pi}
$$
Visually:
```{julia}
#| label: fig-integral-mean-value
#| fig-cap: "Area under sine curve is equal to area of rectangle"
plot(sin, 0, pi, legend=false, fill=(:forestgreen, 0.25, 0))
plot!(x -> 2/pi, fill=(:royalblue, 0.25, 0))
```
In @fig-integral-mean-value the area under the sine curve ($2 = (-\cos(\pi)) - (-\cos(0))$) is equal to the area under the average (also $2 = 2/\pi \cdot \pi$).
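The equality can also be checked numerically, a sketch assuming the `QuadGK` package is available:

```{julia}
using QuadGK
average = quadgk(sin, 0, pi)[1] / (pi - 0)
average, 2/pi   # these agree
```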
##### Example
@@ -89,21 +93,22 @@ Though not continuous, $f(x)$ is integrable as it contains only jumps. The integ
What is the average value of the function $e^{-x}$ between $0$ and $\log(2)$?
$$
\begin{align*}
\text{average} &= \frac{1}{\log(2) - 0} \int_0^{\log(2)} e^{-x} dx\\
&= \frac{1}{\log(2)} (-e^{-x}) \big|_0^{\log(2)}\\
&= -\frac{1}{\log(2)} (\frac{1}{2} - 1)\\
&= \frac{1}{2\log(2)}.
\end{align*}
$$
Visualizing, we have
```{julia}
plot(x -> exp(-x), 0, log(2), legend=false, fill=(:forestgreen, 0.25, 0))
plot!(x -> 1/(2*log(2)), fill=(:royalblue, 0.25, 0))
```
## The mean value theorem for integrals
@@ -118,11 +123,15 @@ $$
When we assume that $f(x)$ is continuous, we can describe $K$ as a value in the range of $f$:
::: {.callout-note icon=false}
## The mean value theorem for integrals
Let $f(x)$ be a continuous function on $[a,b]$ with $a < b$. Then there exists $c$ with $a \leq c \leq b$ with
$$
f(c) \cdot (b-a) = \int_a^b f(x) dx.
$$
:::
The proof comes from the intermediate value theorem and the extreme value theorem. Since $f$ is continuous on a closed interval, there exist values $m$ and $M$ with $f(c_m) = m \leq f(x) \leq M=f(c_M)$, for some $c_m$ and $c_M$ in the interval $[a,b]$. Since $m \leq f(x) \leq M$, we must have:
@@ -135,10 +144,10 @@ $$
So in particular $K$ is in $[m, M]$. But $m$ and $M$ correspond to values of $f(x)$, so by the intermediate value theorem, $K=f(c)$ for some $c$ that must lie in between $c_m$ and $c_M$, which means as well that it must be in $[a,b]$.
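For a concrete illustration, with $f = \sin$ over $[0, \pi]$ the theorem guarantees a $c$ with $\sin(c) = 2/\pi$. A sketch finding one such $c$, assuming the `Roots` package is available:

```{julia}
using Roots
c = find_zero(c -> sin(c) - 2/pi, (0, pi/2))
c, sin(c) * (pi - 0)   # f(c)·(b-a) matches ∫₀^π sin(x)dx = 2
```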
##### Proof of the second part of the Fundamental Theorem of Calculus
The mean value theorem is exactly what is needed to formally prove the second part of the Fundamental Theorem of Calculus. Again, suppose $f(x)$ is continuous on $[a,b]$ with $a < b$. For any $a < x < b$, we define $F(x) = \int_a^x f(u) du$. Then the derivative of $F$ exists and is $f$.
Let $h>0$. Then consider the forward difference $(F(x+h) - F(x))/h$. Rewriting gives:

View File

@@ -0,0 +1,724 @@
# Orthogonal polynomials
{{< include ../_common_code.qmd >}}
This section uses these add-on packages:
```{julia}
using SymPy
using QuadGK
using Roots
using ForwardDiff: derivative
```
This section takes a detour to give some background on why the underlying method of `quadgk` is more efficient than Riemann sums. Orthogonal polynomials play a key role. There are many families of such polynomials. We highlight two.
## Inner product
Define an operation between two integrable, real-valued functions $f(x)$ and $g(x)$ by:
$$
\langle f, g \rangle = \int_{-1}^1 f(x)g(x) dx
$$
The properties of the integral mean this operation satisfies these three main properties:
* symmetry: $\langle f, g \rangle = \langle g,f \rangle$
* positive definiteness: $\langle f, f \rangle > 0$ *unless* $f(x)=0$.
* linearity: if $a$ and $b$ are scalars, then $\langle af + bg, h \rangle = a\langle f, h \rangle + b \langle g, h \rangle$.
The set of integrable functions forms a *vector space*, which simply means two such functions can be added to yield another integrable function and an integrable function times a scalar is still an integrable function. Many different collections of objects form a vector space. In particular, other sets of functions form a vector space, for example the collection of polynomials of degree $n$ or less or just the set of all polynomials.
For a vector space, an operation like the above satisfying these three properties is called an *inner product*; the combination of an inner product and a vector space is called an *inner product space*. In the following, we assume $f$ and $g$ are from a vector space with a real-valued inner product.
Inner products introduce a sense of size through a *norm*:
$\lVert f \rVert = \sqrt{\langle f, f\rangle }$.
Norms satisfy two main properties:
* scalar: $\lVert af \rVert = |a|\lVert f\rVert$
* triangle inequality: $\lVert f + g \rVert \leq \lVert f \rVert + \lVert g \rVert$
Two elements of an inner product space, $f$ and $g$, are *orthogonal* if $\langle f, g \rangle = 0$. This is a generalization of perpendicular. The Pythagorean theorem for orthogonal elements holds: $\lVert f\rVert^2 + \lVert g\rVert^2 = \lVert f+g\rVert^2$.
As we assume a real-valued inner product, the angle between two elements can be defined by:
$$
\angle(f,g) = \cos^{-1}\left(\frac{\langle f, g\rangle}{\lVert f \rVert \lVert g \rVert}\right).
$$
This says the angle between two orthogonal elements is $90$ degrees (in some orientation).
The Cauchy-Schwarz inequality, $|\langle f, g \rangle| \leq \lVert f \rVert \lVert g\rVert$, for an inner product space, ensures the argument to $\cos^{-1}$ is between $-1$ and $1$.
These properties generalize two-dimensional vectors, with components $\langle x, y\rangle$. Recall, these can be visualized by placing a tail at the origin and a tip at the point $(x,y)$. Such vectors can be added by placing the tail of one at the tip of the other and using the vector from the other tail to the other tip.
With this, a vector anchored at the origin can be viewed as a line segment with slope $y/x$ (rise over run). A perpendicular line segment would have slope $-x/y$ (the negative reciprocal), which would be associated with the vector $\langle y, -x \rangle$. The dot product is just the sum of the multiplied components, or for these two vectors $x\cdot y + y\cdot (-x)$, which is $0$, as the line segments are perpendicular (orthogonal).
Consider now two vectors, say $f$, $g$. We can make a new vector that is orthogonal to $f$ by combining $g$ with a piece of $f$. But what piece?
Consider this
$$
\begin{align*}
\langle f, g - \frac{\langle f,g\rangle}{\langle f, f\rangle} f \rangle
&= \langle f, g \rangle - \langle f, \frac{\langle f,g\rangle}{\langle f, f\rangle} f \rangle \\
&= \langle f, g \rangle - \frac{\langle f,g\rangle}{\langle f, f\rangle}\langle f,f \rangle \\
&= \langle f, g \rangle - \langle f, g \rangle = 0
\end{align*}
$$
Define
$$
proj_f(g) = \frac{\langle f,g\rangle }{\langle f, f\rangle} f,
$$
then with $u_1 = f$ and $u_2 = g-proj_f(g)$, the elements $u_1$ and $u_2$ are orthogonal.
A similar calculation shows that if $h$ is added to the set of elements, then
$u_3 = h - proj_{u_1}(h) - proj_{u_2}(h)$ will be orthogonal to $u_1$ and $u_2$, etc.
This process, called the [Gram-Schmidt](https://en.wikipedia.org/wiki/Gram%E2%80%93Schmidt_process) process, can turn any set of vectors into a set of orthogonal vectors, assuming they are all nonzero and no non-trivial linear combination of them is zero (that is, they are linearly independent).
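Here is a small numeric sketch of the process for the integral inner product above, applied to $1$, $x$, and $x^2$ (using `quadgk`; the names are ours):

```{julia}
ip(f, g) = quadgk(x -> f(x) * g(x), -1, 1)[1]   # ⟨f, g⟩
q₁ = x -> 1.0
c₁ = ip(q₁, x -> x) / ip(q₁, q₁)
q₂ = x -> x - c₁ * q₁(x)                         # works out to x
c₂ = ip(q₁, x -> x^2) / ip(q₁, q₁)
c₃ = ip(q₂, x -> x^2) / ip(q₂, q₂)
q₃ = x -> x^2 - c₂ * q₁(x) - c₃ * q₂(x)          # works out to x² - 1/3
ip(q₁, q₂), ip(q₁, q₃), ip(q₂, q₃)               # all ≈ 0
```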
## Legendre
Consider now polynomials of degree $n$ or less with the normalization that $p(1) = 1$. We begin with two such polynomials: $u_0(x) = 1$ and $u_1(x) = x$.
These are orthogonal with respect to $\int_{-1}^1 f(x) g(x) dx$, as
$$
\int_{-1}^1 u_0(x) u_1(x) dx =
\int_{-1}^1 1 \cdot x dx =
\frac{x^2}{2} \Big|_{-1}^1 = \frac{1^2}{2} - \frac{(-1)^2}{2} = 0.
$$
Now consider a quadratic polynomial, $u_2(x) = ax^2 + bx + c$. We want a polynomial which is orthogonal to $u_0$ and $u_1$, with the extra normalization condition that $u_2(1) = 1$. We can do this using Gram-Schmidt, as above, or, as here, through a system of two equations:
```{julia}
@syms a b c d x
u0 = 1
u1 = x
u2 = a*x^2 + b*x + c
eqs = (integrate(u0 * u2, (x, -1, 1)) ~ 0,
integrate(u1 * u2, (x, -1, 1)) ~ 0)
sols = solve(eqs, (a, b, c)) # b => 0, a => -3c
u2 = u2(sols...)
u2 = simplify(u2 / u2(x=>1)) # make u2(1) = 1 and fix c
```
The quadratic polynomial has $3$ unknowns and the orthogonality conditions give two equations. Solving these equations leaves one unknown (`c`). But the normalization condition (that $u_i(1) = 1$) allows `c` to be simplified out.
We can do this again with $u_3$:
```{julia}
u3 = a*x^3 + b*x^2 + c*x + d
eqs = (integrate(u0 * u3, (x, -1, 1)) ~ 0,
integrate(u1 * u3, (x, -1, 1)) ~ 0,
integrate(u2 * u3, (x, -1, 1)) ~ 0)
sols = solve(eqs, (a, b, c, d)) # a => -5c/3, b=>0, d=>0
u3 = u3(sols...)
u3 = simplify(u3/u3(x=>1)) # make u3(1) = 1
```
In theory, this can be continued for any $n$. The resulting
polynomials are called the
[Legendre](https://en.wikipedia.org/wiki/Legendre_polynomials)
polynomials.
Rather than continue this, we develop easier means to generate these polynomials.
## General weight function
Let $w(x)$ be some non-negative function and consider the new inner product between two polynomials:
$$
\langle p, q\rangle = \int_I p(x) q(x) w(x) dx
$$
where $I$ is an interval and $w(x)$ is called a weight function. In the above discussion $I=[-1,1]$ and $w(x) = 1$.
Suppose we have *orthogonal* polynomials $p_i(x)$, $i=0,1, \dots, n$, where $p_i$ is a polynomial of degree $i$ ($p_i(x) = k_i x^i + \cdots$, where $k_i \neq 0$), and
$$
\langle p_m, p_n \rangle =
\int_I p_m(x) p_n(x) w(x) dx =
\begin{cases}
0 & m \neq n\\
h_m & m = n
\end{cases}
$$
Unique elements can be defined by specifying some additional property. For Legendre it was $p_n(1)=1$; for other orthogonal families this may be specified by having a leading coefficient of $1$ (monic), or a norm of $1$ (orthonormal), etc.
The above is the *absolutely continuous* case, generalizations of the integral allow this to be more general.
Orthogonality can be extended: If $q(x)$ is any polynomial of degree $m < n$, then
$\langle q, p_n \rangle = \int_I q(x) p_n(x) w(x) dx = 0$. (See the questions for more detail.)
Some names used for the characterizing constants are:
* $p_n(x) = k_n x^n + \cdots$ ($k_n$ is the leading term)
* $h_n = \langle p_n, p_n\rangle$
### Three-term recurrence
Orthogonal polynomials, as defined above through a weight function, satisfy a *three-term recurrence*:
$$
p_{n+1}(x) = (A_n x + B_n) p_n(x) - C_n p_{n-1}(x),
$$
where $n \geq 0$ and $p_{-1}(x) = 0$.
(With this and knowledge of $A_n$, $B_n$, and $C_n$, the polynomials can be recursively generated from just specifying a value for the constant $p_0(x)$.)
First, $p_{n+1}$ has leading term $k_{n+1}x^{n+1}$. Looking on the right hand side for the coefficient of $x^{n+1}$ we find $A_n k_n$, so $A_n = k_{n+1}/k_n$.
Next, we look at $u(x) = p_{n+1}(x) - A_n x p_n(x)$, a polynomial of degree $n$ or less.
As this has degree $n$ or less, it can be expressed in terms of $p_0, p_1, \dots, p_n$. Write it as $u(x) = \sum_{j=0}^n d_j p_j(x)$. Now, take any $m < n-1$ and consider $p_m$. We consider the inner product of $u$ and $p_m$ two ways:
$$
\begin{align*}
\int_I p_m(x) u(x) w(x) dx &=
\int_I p_m(x) \sum_{j=0}^n d_j p_j(x) w(x) dx \\
&= \int_I p_m(x) \left(d_m p_m(x) + \textcolor{red}{\sum_{j=0, j\neq m}^{n} d_j p_j(x)}\right) w(x) dx \\
&= d_m \int_I p_m(x) p_m(x) w(x) dx = d_m h_m
\end{align*}
$$
As well
$$
\begin{align*}
\int_I p_m(x) u(x) w(x) dx
&= \int_I p_m(x) (p_{n+1}(x) - A_n x p_n(x)) w(x) dx \\
&= \int_I p_m(x) \textcolor{red}{p_{n+1}(x)} w(x) dx - \int_I p_m(x) A_n x p_n(x) w(x) dx\\
&= 0 - A_n \int_I (\textcolor{red}{x p_m(x)}) p_n(x) w(x) dx\\
&= 0
\end{align*}
$$
The last integral being $0$ as $xp_m(x)$ has degree $n-1$ or less and hence is orthogonal to $p_n$.
That is, since each $d_m$ with $m < n-1$ is $0$, we have $p_{n+1} - A_n x p_n(x) = d_n p_n(x) + d_{n-1} p_{n-1}(x)$. Setting $B_n=d_n$ and $C_n = -d_{n-1}$ shows the three-term recurrence applies.
#### Example: Legendre polynomials
With this notation, the Legendre polynomials have:
$$
\begin{align*}
w(x) &= 1\\
I &= [-1,1]\\
A_n &= \frac{2n+1}{n+1}\\
B_n &= 0\\
C_n & = \frac{n}{n+1}\\
k_{n+1} &= \frac{2n+1}{n+1}k_n, \quad k_1=k_0=1\\
h_n &= \frac{2}{2n+1}
\end{align*}
$$
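A short sketch (the function name is ours) generating Legendre values from this recurrence:

```{julia}
function legendre(n, x)
    n == 0 && return one(x)
    p0, p1 = one(x), x
    for k in 1:n-1        # pₖ₊₁ = ((2k+1)·x·pₖ - k·pₖ₋₁)/(k+1)
        p0, p1 = p1, ((2k + 1) * x * p1 - k * p0) / (k + 1)
    end
    p1
end
legendre(2, 1/2), (3 * (1/2)^2 - 1) / 2   # matches P₂(x) = (3x² - 1)/2
```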
#### Favard theorem
In an efficient review of the subject, [Koornwinder](https://arxiv.org/pdf/1303.2825) states conditions on the recurrence that ensure that if $n$th-degree polynomials $p_n$ satisfy a three-term recurrence, then there is an associated weight function (suitably generalized). The conditions use this form of a three-term recurrence:
$$
\begin{align*}
xp_n(x) &= a_n p_{n+1}(x) + b_n p_n(x) + c_n p_{n-1}(x),\quad (n > 0)\\
xp_0(x) &= a_0 p_1(x) + b_0 p_0(x)
\end{align*}
$$
where the constants are real and $a_n c_{n+1} > 0$. These force $a_n = k_n/k_{n+1}$ and $c_{n+1}/h_{n+1} = a_n/h_n$.
#### Clenshaw algorithm
When introducing polynomials, the synthetic division algorithm was given to compute $p(x) / (x-r)$. This same algorithm also computed $p(r)$ efficiently and is called Horner's method. The `evalpoly` method in `Julia`'s base implements this.
For a set of polynomials $p_0(x), p_1(x), \dots, p_n(x)$ satisfying a three-term recurrence $p_{n+1}(x) = (A_n x + B_n) p_n(x) - C_n p_{n-1}(x)$, the Clenshaw algorithm gives an efficient means to compute an expression of a linear combination of the polynomials, $q(x) = a_0 p_0(x) + a_1 p_1(x) + \cdots + a_n p_n(x)$.
The [method](https://en.wikipedia.org/wiki/Clenshaw_algorithm) uses a reverse recurrence formula starting with
$$
b_{n+1}(x) = b_{n+2}(x) = 0
$$
and then computing for $k = n, n-1, \dots, 1$
$$
b_k(x) = a_k + (A_k x + B_k) b_{k+1}(x) - C_{k+1} b_{k+2}(x)
$$
Finally, the value is computed as $a_0 p_0(x) + b_1(x) p_1(x) - C_1 p_0(x) b_2(x)$.
For example, with the Legendre polynomials, we have
```{julia}
A(n) = (2n+1)//(n+1)
B(n) = 0
C(n) = n // (n+1)
```
Say we want to compute $a_0 u_0(x) + a_1 u_1(x) + a_2 u_2(x) + a_3 u_3(x) + a_4 u_4(x)$. The necessary inputs are the coefficients, the value of $x$, and polynomials $p_0$ and $p_1$.
```{julia}
function clenshaw(x, as, p0, p1)
n = length(as) - 1
bn1, bn2 = 0, 0
a(k) = as[k + 1] # offset
for k in n:-1:1
bn1, bn2 = a(k) + (A(k) * x + B(k)) * bn1 - C(k+1) * bn2, bn1
end
b1, b2 = bn1, bn2
p0(x) * a(0) + p1(x) * b1 - C(1) * p0(x) * b2
end
```
This function can be repurposed to generate additional Legendre polynomials. For example, to compute $u_4$ we pass in a symbolic value for $x$ and mask out all but $a_4$ in our coefficients:
```{julia}
p₀(x) = 1
p₁(x) = x # Legendre
@syms x
clenshaw(x, (0,0,0,0,1), p₀, p₁) |> expand |> simplify
```
:::{.callout-note}
### Differential equations approach
A different description of families of orthogonal polynomials is that they satisfy a differential equation of the type
$$
\sigma(x) y''(x) + \tau(x) y'(x) + \lambda_n y(x) = 0,
$$
where $\sigma(x) = ax^2 + bx + c$, $\tau(x) = dx + e$, and $\lambda_n = -(a\cdot n(n-1) + dn)$.
With this parameterization, values for $A_n$, $B_n$, and $C_n$ can be given in terms of the leading coefficient, $k_n$ (cf. [Koepf and Schmersau](https://arxiv.org/pdf/math/9612224)):
$$
\begin{align*}
A_n &= \frac{k_{n+1}}{k_n}\\
B_n &= \frac{k_{n+1}}{k_n} \frac{2bn(a(n-1)+d) + e(d-2a)}{(2a(n-1) + d)(2an+d)}\\
C_n &= \frac{k_{n+1}}{k_{n-1}}
\frac{n(a(n-1) + d)(a(n-2)+d)(n(an+d))(4ac-b^2)+ae^2+cd^2-bde}{
(a(n-1)+d)(a(2n-1)+d)(a(2n-3)+d)(2a(n-1)+d)^2}
\end{align*}
$$
There are other relations between derivatives and the orthogonal polynomials. For example, another three-term recurrence is:
$$
\sigma(x) p_n'(x) = (\alpha_n x + \beta_n)p_n(x) + \gamma_n p_{n-1}(x)
$$
The same reference has formulas for $\alpha$, $\beta$, and $\gamma$ in terms of $a,b,c,d$, and $e$ along with many others.
:::
## Chebyshev
The Chebyshev polynomials (of the first kind) satisfy the three-term recurrence
$$
T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x)
$$
with $T_0(x)= 1$ and $T_1(x)=x$.
These polynomials have domain $(-1,1)$ and weight function $(1-x^2)^{-1/2}$.
(The Chebyshev polynomials of the second kind satisfy the same three-term recurrence but have $U_0(x)=1$ and $U_1(x)=2x$.)
These polynomials are related to trigonometry through
$$
T_n(\cos(\theta)) = \cos(n\theta)
$$
This characterization makes it easy to find the zeros of the
polynomial $T_n$, as they happen when $\cos(n\theta)$ is $0$, or when
$n\theta = \pi/2 + k\pi$ for $k$ in $0$ to $n-1$. Solving for $\theta$
and taking the cosine, we get the zeros of the $n$th degree polynomial
$T_n$ are $\cos(\pi(k + 1/2)/n)$ for $k$ in $0$ to $n-1$.
These evenly spaced angles lead to roots more concentrated at the edges of the interval $(-1,1)$.
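A quick check of this zero formula, using $T_n(\cos\theta) = \cos(n\theta)$ to evaluate $T_n$ (names ours):

```{julia}
Tn(n, x) = cos(n * acos(x))                      # valid for -1 ≤ x ≤ 1
n = 5
nodes = [cos(pi * (k + 1/2) / n) for k in 0:n-1]
Tn.(n, nodes)                                    # all ≈ 0
```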
##### Example
Chebyshev polynomials have a minimal property that makes them fundamental for use with interpolation.
Define the *infinity* norm over $[-1,1]$ to be the maximum value of the absolute value of the function over these values.
Let $f(x) = 2^{-n+1}T_n(x)$ be a monic version of the Chebyshev polynomial.
> If $q(x)$ is any monic polynomial of degree $n$, then the infinity norm of $q(x)$ is greater than or equal to that of $f$.
Using the trigonometric representation of $T_n$, we have
* $f(x)$ has infinity norm of $2^{-n+1}$ and these maxima occur at $x=\cos((k\pi)/n)$, where $0 \leq k \leq n$. (There is a cosine curve with known peaks, oscillating between $-1$ and $1$.)
* $f(x) > 0$ at $x = \cos((2k\pi)/n)$ for $0 \leq 2k \leq n$
* $f(x) < 0$ at $x = \cos(((2k+1)\pi)/n)$ for $0 \leq 2k+1 \leq n$
Suppose $w(x)$ is a monic polynomial of degree $n$ and suppose it has smaller infinity norm. Consider $u(x) = f(x) - w(x)$. At the extreme points of $f(x)$, we must have $|f(x)| > |w(x)|$. But this means
* $u(x) > 0$ at $x = \cos((2k\pi)/n)$ for $0 \leq 2k \leq n$
* $u(x) < 0$ at $x = \cos(((2k+1)\pi)/n)$ for $0 \leq 2k+1 \leq n$
As $u$ is continuous, this means there are at least $n$ sign changes, hence $n$ or more zeros. But as both $f$ and $w$ are monic, $u$ is of degree $n-1$, at most. This is a contradiction unless $u(x)$ is the zero polynomial, which it can't be by assumption.
### Integration
Recall, a Riemann sum can be thought of in terms of weights, $w_i$ and nodes $x_i$ for which $\int_I f(x) dx \approx \sum_{i=0}^{n-1} w_i f(x_i)$.
For a right-Riemann sum with partition given by $a_0 < a_1 < \cdots < a_n$ the nodes are $x_i = a_i$ and the weights are $w_i = (a_i - a_{i-1})$ (or, in the evenly spaced case, $w_i = (a_n - a_0)/n$).
More generally, this type of expression can represent integrals of the type $\int_I f(x) w(x) dx$, with $w(x)$ as in an inner product. Call such a sum a Gaussian quadrature.
We will see that the zeros of orthogonal polynomials have special properties as nodes.
> For orthogonal polynomials over the interval $I$ with weight function $w(x)$, each $p_n$ has $n$ distinct real zeros in $I$.
Suppose that $p_n$ had only $k<n$ sign changes, at $x_1, x_2, \dots, x_k$. Then for some choice of $\delta$, $(-1)^\delta p_n(x) (x-x_1)(x-x_2)\cdots(x-x_k) \geq 0$. Since this is not identically zero, it must be that
$$
(-1)^\delta \int_I p(x) \left( (x-x_1)(x-x_2)\cdots(x-x_k)\right) w(x) dx > 0
$$
But, the product is of degree $k < n$, so by orthogonality must be $0$. Hence, it can't be that $k < n$, so there must be $n$ sign changes in $I$ by $p_n$. Each corresponds to a zero, as $p_n$ is continuous.
This next statement says that using the zeros of $p_n$ as the nodes of a Gaussian quadrature, with appropriate weights, makes the quadrature exact for higher-degree polynomials.
> For a fixed $n$, suppose $p_0, p_1, \dots, p_n$ are orthogonal polynomials over $I$ with weight function $w(x)$. If the zeros of $p_n$ are the nodes $x_i$, then there exist $n$ weights so that for any polynomial of degree $2n-1$ or less, the Gaussian quadrature is exact.
That is, if $q(x)$ is a polynomial with degree $2n-1$ or less, we have for some choice of $w_i$:
$$
\int_I q(x) w(x) dx = \sum_{i=1}^n w_i q(x_i)
$$
To compare, recall, Riemann sums ($1$ node) were exact for constant functions (degree $0$), the trapezoid rule ($2$ nodes) is exact for linear polynomials (degree $1$), and Simpson's rule ($3$ nodes) is exact for cubic polynomials (degree $3$).
We follow [Wikipedia](https://en.wikipedia.org/wiki/Gaussian_quadrature#Fundamental_theorem) to see this key fact.
Take $h(x)$ of degree $2n-1$ or less. Then by polynomial long division, there are polynomials $q(x)$ and $r(x)$ where
$$
h(x) = q(x) p_n(x) + r(x)
$$
and the degree of $r(x)$ is less than $n$, the degree of $p_n(x)$. Further, the degree of $q(x)$ is at most $n-1$, as were it more, the degree of $q(x)p_n(x)$ would be more than $2n-1$. Let's note that if $x_i$ is a zero of $p_n(x)$ then $h(x_i)= r(x_i)$.
So
$$
\begin{align*}
\int_I h(x) w(x) dx &= \int_I \textcolor{red}{q(x)} p_n(x) w(x) dx + \int_I r(x) w(x)dx\\
&= 0 + \int r(x) w(x) dx.
\end{align*}
$$
Now consider the polynomials made from the zeros of $p_n(x)$
$$
l_i(x) = \prod_{j \ne i} \frac{x - x_j}{x_i - x_j}
$$
These are called Lagrange interpolating polynomials and have the property that $l_i(x_i) = 1$ and $l_i(x_j) = 0$ if $i \neq j$.
These allow the expression of
$$
\begin{align*}
r(x) &= l_1(x)r(x_1) + l_2(x) r(x_2) + \cdots + l_n(x) r(x_n) \\
&= \sum_{i=1}^n l_i(x) r(x_i)
\end{align*}
$$
This isn't obviously true, but this expression agrees with an at-most degree $n-1$ polynomial ($r(x)$) at $n$ points, hence it must be the same polynomial.
With this representation, the integral becomes
$$
\begin{align*}
\int_I h(x) w(x) dx &= \int_I r(x) w(x)dx \\
&= \int_I \sum_{i=1}^n l_i(x) r(x_i) w(x) dx\\
&= \sum_{i=1}^n r(x_i) \int_I l_i(x) w(x) dx \\
&= \sum_{i=1}^n r(x_i) w_i\\
&= \sum_{i=1}^n w_i h(x_i)
\end{align*}
$$
That is, there are weights, $w_i = \int_I l_i(x) w(x) dx$, for which the integration is exactly found by Gaussian quadrature using the roots of $p_n$ as the nodes.
The general formula for the weights can be written in terms of the polynomials $p_i = k_ix^i + \cdots$:
$$
\begin{align*}
w_i &= \int_I l_i(x) w(x) dx \\
&= \frac{k_n}{k_{n-1}}
\frac{\int_I p_{n-1}(x)^2 w(x) dx}{p'_n(x_i) p_{n-1}(x_i)}.
\end{align*}
$$
To see this, consider:
$$
\begin{align*}
\prod_{j \neq i} (x - x_j) &=
\frac{\prod_j (x-x_j)}{x-x_i} \\
&= \frac{1}{k_n}\frac{k_n \prod_j (x - x_j)}{x - x_i} \\
&= \frac{1}{k_n} \frac{p_n(x)}{x-x_i}\\
&= \frac{1}{k_n} \frac{p_n(x) - p_n(x_i)}{x-x_i}\\
&\rightarrow \frac{p'_n(x_i)}{k_n}, \text{ as } x \rightarrow x_i.
\end{align*}
$$
Thus
$$
\prod_{j \neq i} (x_i - x_j) = \frac{p'_n(x_i)}{k_n}.
$$
This gives
$$
\begin{align*}
w_i &= \int_I \frac{k_n \prod_{j \neq i} (x-x_j)}{p'_n(x_i)} w(x) dx\\
&= \frac{1}{p'_n(x_i)} \int_I \frac{p_n(x)}{x-x_i} w(x) dx
\end{align*}
$$
To work on the last term, a trick (see the questions for detail) can show that for any $k \leq n$ that
$$
\int_I \frac{x^k p_n(x)}{x - x_i} w(x) dx
= x_i^k \int_I \frac{p_n(x)}{x - x_i} w(x) dx
$$
Hence, for any polynomial $q$ of degree $n$ or less, we have
$$
q(x_i) \int_I \frac{p_n(x)}{x - x_i} w(x) dx =
\int_I \frac{q(x) p_n(x)}{x - x_i} w(x) dx.
$$
We will use this for $p_{n-1}$. First, as $x_i$ is a zero of $p_n(x)$ we have
$$
\frac{p_n(x)}{x-x_i} = k_n x^{n-1}+ r(x),
$$
where $r(x)$ has degree $n-2$ at most. This is due to $p_n$ being divided by a monic polynomial, hence leaving a degree $n-1$ polynomial with leading coefficient $k_n$.
But then
$$
\begin{align*}
w_i &= \frac{1}{p'_n(x_i)} \int_I \frac{p_n(x)}{x-x_i} w(x) dx \\
&= \frac{1}{p'_n(x_i)} \frac{1}{p_{n-1}(x_i)} \int_I \frac{p_{n-1}(x) p_n(x)}{x - x_i} w(x) dx\\
&= \frac{1}{p'_n(x_i)p_{n-1}(x_i)} \int_I p_{n-1}(x)
(k_n x^{n-1} + \textcolor{red}{r(x)}) w(x) dx\\
&= \frac{k_n}{p'_n(x_i)p_{n-1}(x_i)} \int_I p_{n-1}(x) x^{n-1} w(x) dx\\
&= \frac{k_n}{p'_n(x_i)p_{n-1}(x_i)} \int_I p_{n-1}(x)
\left(
\textcolor{red}{\left(x^{n-1} - \frac{p_{n-1}(x)}{k_{n-1}}\right) }
+ \frac{p_{n-1}(x)}{k_{n-1}}\right) w(x) dx\\
&= \frac{k_n}{p'_n(x_i)p_{n-1}(x_i)} \int_I p_{n-1}(x)\frac{p_{n-1}(x)}{k_{n-1}} w(x) dx\\
&= \frac{k_n}{k_{n-1}} \frac{1}{p'_n(x_i)p_{n-1}(x_i)} \int_I p_{n-1}(x)^2 w(x) dx.
\end{align*}
$$
### Examples of quadrature formula
The `QuadGK` package uses a modification to Gauss quadrature to estimate numeric integrals. Let's see how. Behind the scenes, `quadgk` calls `kronrod` to compute nodes and weights.
We have from earlier that
```{julia}
u₃(x) = x*(5x^2 - 3)/2
u₄(x) = 35x^4 / 8 - 15x^2 / 4 + 3/8
```
```{julia}
xs = find_zeros(u₄, -1, 1)
```
From this we can compute the weights from the derived general formula:
```{julia}
k₃, k₄ = 5/2, 35/8
w(x) = 1
I = first(quadgk(x -> u₃(x)^2 * w(x), -1, 1))
ws = [k₄/k₃ * 1/(derivative(u₄,xᵢ) * u₃(xᵢ)) * I for xᵢ ∈ xs]
(xs, ws)
```
We compare now to the values returned by `kronrod` in `QuadGK`
```{julia}
kxs, kwts, wts = kronrod(4, -1, 1)
[ws wts xs kxs[2:2:end]]
```
(The `kronrod` function computes $2n+1$ nodes and weights. The Gauss-Legendre nodes are $n$ of those, extracted by taking the 2nd, 4th, etc.)
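As a check of the exactness claim, the $4$-node Gauss rule above should integrate any polynomial of degree $2\cdot 4 - 1 = 7$ or less exactly, for instance $x^6$:

```{julia}
h(x) = x^6
sum(wᵢ * h(xᵢ) for (xᵢ, wᵢ) ∈ zip(xs, ws)), 2/7   # ∫₋₁¹ x⁶ dx = 2/7
```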
To compare integrations of some smooth function we have
```{julia}
u(x) = exp(x)
GL = sum(wᵢ * u(xᵢ) for (xᵢ, wᵢ) ∈ zip(xs, ws))
KL = sum(wᵢ * u(xᵢ) for (xᵢ, wᵢ) ∈ zip(kxs, kwts))
QL, esterror = quadgk(u, -1, 1)
(; GL, KL, QL, esterror)
```
The first two are expected to not be as accurate, as they utilize a fixed number of nodes.
## Questions
###### Question
Let $p_i$ for $i$ in $0$ to $n$ be polynomials of degree $i$. It is true that for any polynomial $q(x)$ of degree $k \leq n$ there is a linear combination such that $q(x) = a_0 p_0(x) + \cdots + a_k p_k(x)$.
First, it is enough to do this for a monomial $x^k$. Why?
```{julia}
#| echo: false
choices = [raw"If you can do it for each $x^i$ then if $q(x) = b_0 + b_1x + b_2x^2 + \cdots + b_k x^k$ we just multiply the coefficients for each $x^i$ by $b_i$ and add.",
raw"It isn't true"]
radioq(choices, 1)
```
Suppose $p_0 = k_0$ and $p_1 = k_1x + a$. How would you produce $x$ (that is, $x^1$)?
```{julia}
#| echo: false
choices = [raw"$(p_1 - (a/k_0) p_0)/k_1$",
           raw"$p_1 - p_0$"]
radioq(choices, 1)
```
Let $p_i = k_i x^i + \cdots$ ($k_i$ is the leading term)
To reduce $p_3 = k_3x^3 + a_2x^2 + a_1x^1 + a_0$ to $k_3x^3$ we could try:
* form $q_3 = p_3 - (a_2/k_2) p_2$. As $p_2$ has degree $2$, this leaves $k_3x^3$ alone, but what does it do to the $x^2$ coefficient?
```{julia}
#| echo: false
choices = [raw"It leaves $0$ as the coefficient of $x^2$",
raw"It leaves all the other terms as $0$"]
radioq(choices, 1)
```
* we then use $p_1$ times some multiple $a/k_1$ to remove the $x$ term
* we then use $p_0$ times some multiple $a/k_0$ to remove the constant term
Would this strategy work to reduce $p_n$ to $k_n x^n$?
```{julia}
#| echo: false
radioq(["Yes", "No"], 1)
```
###### Question
Suppose $p(x)$ and $q(x)$ are polynomials of degree $n$ and there are $n+1$ points for which $p(x_i) = q(x_i)$.
First, is it true or false that a polynomial of degree $n$ has *at most* $n$ zeros?
```{julia}
#| echo: false
radioq(["true, unless it is the zero polynomial", "false"], 1)
```
What is the degree of $h(x) = p(x) - q(x)$?
```{julia}
#| echo: false
radioq([raw"At least $n+1$", raw"At most $n$"], 2)
```
At least how many zeros does the polynomial $h(x)$ have?
```{julia}
#| echo: false
radioq([raw"At least $n+1$", raw"At most $n$"], 1)
```
Is $p(x) = q(x)$ with these assumptions?
```{julia}
#| echo: false
radioq(["yes", "no"], 1)
```
###### Question
We wish to show that for any $k \leq n$ that
$$
\int_I \frac{x^k p_n(x)}{x - x_i} w(x) dx
= x_i^k \int_I \frac{p_n(x)}{x - x_i} w(x) dx
$$
We have for $u=x/x_i$ that
$$
\frac{1}{x - x_i} = \frac{1 - u^k}{x - x_i} + \frac{u^k}{x - x_i}
$$
But the first term, $(1-u^k)/(x-x_i)$, is a polynomial of degree $k-1$. Why?
```{julia}
#| echo: false
choices = [raw"""
Because we can express this as $x_i^k - x^k$ which factors as $(x_i - x) \cdot u(x)$ where $u(x)$ has degree $k-1$, at most.
""",
raw"""
It isn't true, it clearly has degree $k$
"""]
radioq(choices, 1)
```
This gives if $k \leq n$ and with $u=x/x_i$:
$$
\begin{align*}
\int_I \frac{p_n(x)}{x - x_i} w(x) dx
&= \int_I p_n(x) \left( \textcolor{red}{\frac{1 - u^k}{x - x_i}} + \frac{u^k}{x - x_i} \right) w(x) dx\\
&= \int_I p_n(x) \frac{\frac{x^k}{x_i^k}}{x - x_i} w(x) dx\\
&= \frac{1}{x_i^k} \int_I \frac{x^k p_n(x)}{x - x_i} w(x) dx
\end{align*}
$$

View File

@@ -1,4 +1,4 @@
# Partial fractions
{{< include ../_common_code.qmd >}}
@@ -14,7 +14,9 @@ using SymPy
Integration is facilitated when an antiderivative for $f$ can be found, as then definite integrals can be evaluated through the fundamental theorem of calculus.
However, despite differentiation being an algorithmic procedure, integration is not. There are "tricks" to try, such as substitution and integration by parts. These work in some cases. However, there are classes of functions for which algorithms exist. For example, the `SymPy` `integrate` function mostly implements an algorithm that decides if an elementary function has an antiderivative. The [elementary](http://en.wikipedia.org/wiki/Elementary_function) functions include exponentials, their inverses (logarithms), trigonometric functions, their inverses, and powers, including $n$th roots. Not every elementary function will have an antiderivative comprised of (finite) combinations of elementary functions. The typical example is $e^{x^2}$, which has no simple antiderivative, despite its ubiquitousness.
However, despite differentiation being an algorithmic procedure, integration is not. There are "tricks" to try, such as substitution and integration by parts. These work in some cases---but not all!
However, there are classes of functions for which algorithms exist. For example, the `SymPy` `integrate` function mostly implements an algorithm that decides if an elementary function has an antiderivative. The [elementary](http://en.wikipedia.org/wiki/Elementary_function) functions include exponentials, their inverses (logarithms), trigonometric functions, their inverses, and powers, including $n$th roots. Not every elementary function will have an antiderivative comprised of (finite) combinations of elementary functions. The typical example is $e^{x^2}$, which has no simple antiderivative, despite its ubiquitousness.
There are classes of functions where an (elementary) antiderivative can always be found. Polynomials provide a case. More surprisingly, so do their ratios, *rational functions*.
@@ -28,13 +30,16 @@ Let $f(x) = p(x)/q(x)$, where $p$ and $q$ are polynomial functions with real co
The function $q(x)$ will factor over the real numbers. The fundamental theorem of algebra can be applied to say that $q(x)=q_1(x)^{n_1} \cdots q_k(x)^{n_k}$ where $q_i(x)$ is a linear or quadratic polynomial and $n_i$ a positive integer.
::: {.callout-note icon=false}
## Partial Fraction Decomposition
There are unique polynomials $a_{ij}$ with degree $a_{ij} <$ degree $q_i$ such that
$$
\frac{p(x)}{q(x)} = a(x) + \sum_{i=1}^k \sum_{j=1}^{n_i} \frac{a_{ij}(x)}{q_i(x)^j}.
$$
:::
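As a hedged illustration (not part of the original text), `SymPy`'s `apart` function computes such a decomposition directly:

```{julia}
using SymPy
@syms x::real
apart(1 / (x * (x^2 + 1)^2))   # the decomposition promised by the theorem
```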
The method is attributed to John Bernoulli, one of the prolific Bernoulli brothers who put a stamp on several areas of math. This Bernoulli was a mentor to Euler.
@@ -109,7 +114,7 @@ What remains is to establish that we can take $A(x) = a(x)\cdot P(x)$ with a deg
In Proposition 3.8 of [Bradley](http://www.m-hikari.com/imf/imf-2012/29-32-2012/cookIMF29-32-2012.pdf) and Cook we can see how. Recall the division algorithm, for example, says there are $q_k$ and $r_k$ with $A=q\cdot q_k + r_k$ where the degree of $r_k$ is less than that of $q$, which is linear or quadratic. This is repeatedly applied below:
$$
\begin{align*}
\frac{A}{q^k} &= \frac{q\cdot q_k + r_k}{q^k}\\
&= \frac{r_k}{q^k} + \frac{q_k}{q^{k-1}}\\
@@ -119,6 +124,7 @@ In Proposition 3.8 of [Bradley](http://www.m-hikari.com/imf/imf-2012/29-32-2012/
&= \cdots\\
&= \frac{r_k}{q^k} + \frac{r_{k-1}}{q^{k-1}} + \cdots + q_1.
\end{align*}
$$
So the term $A(x)/q(x)^k$ can be expressed in terms of a sum where the numerators of each term have degree less than that of $q(x)$, as expected by the statement of the theorem.
@@ -208,13 +214,14 @@ integrate(B/((a*x)^2 - 1)^4, x)
In [Bronstein](http://www-sop.inria.fr/cafe/Manuel.Bronstein/publications/issac98.pdf) this characterization can be found: "This method, which dates back to Newton, Leibniz and Bernoulli, should not be used in practice, yet it remains the method found in most calculus texts and is often taught. Its major drawback is the factorization of the denominator of the integrand over the real or complex numbers." We can also find the following formulas, which formalize the above exploratory calculations ($j>1$ and $b^2 - 4c < 0$ below):
$$
\begin{align*}
\int \frac{A}{(x-a)^j} &= \frac{A}{1-j}\frac{1}{(x-a)^{j-1}}\\
\int \frac{A}{x-a} &= A\log(x-a)\\
\int \frac{Bx+C}{x^2 + bx + c} &= \frac{B}{2} \log(x^2 + bx + c) + \frac{2C-bB}{\sqrt{4c-b^2}}\cdot \arctan\left(\frac{2x+b}{\sqrt{4c-b^2}}\right)\\
\int \frac{Bx+C}{(x^2 + bx + c)^j} &= \frac{B' x + C'}{(x^2 + bx + c)^{j-1}} + \int \frac{C''}{(x^2 + bx + c)^{j-1}}
\end{align*}
$$
The first returns a rational function; the second yields a logarithm term; the third yields a logarithm and an arctangent term; the last, which has explicit constants available, provides a reduction that can be recursively applied.
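The third formula can be checked by differentiation. A sketch (not from the source; it assumes `SymPy` is loaded and that `simplify` can finish the algebra):

```{julia}
using SymPy
@syms x b c B C
F = B/2 * log(x^2 + b*x + c) +
    (2C - b*B)/sqrt(4c - b^2) * atan((2x + b)/sqrt(4c - b^2))
simplify(diff(F, x) - (B*x + C)/(x^2 + b*x + c))  # should reduce to 0
```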
@@ -233,7 +240,11 @@ $$
#### Examples
Find an antiderivative for $1/(x\cdot(x^2+1)^2)$.
Find an antiderivative for
$$
\frac{1}{x\cdot(x^2+1)^2}.
$$
The partial fraction decomposition is:
@@ -254,7 +265,11 @@ integrate(1/q, x)
---
Find an antiderivative of $1/(x^2 - 2x-3)$.
Find an antiderivative of
$$
\frac{1}{x^2 - 2x-3}.
$$
We again just let `SymPy` do the work. A partial fraction decomposition is given by:
@@ -288,7 +303,7 @@ The answers found can become quite involved. [Corless](https://arxiv.org/pdf/171
ex = (x^2 - 1) / (x^4 + 5x^2 + 7)
```
But the integral is something best suited to a computer algebra system:
But the integral is something best suited for a computer algebra system:
```{julia}
@@ -451,7 +466,7 @@ answ = 2
radioq(choices, answ, keep_order=true)
```
If $m < n$, then why can we cancel out the $(x-c)^n$ and not have a concern?
If $m < n$, then why can we cancel out the $(x-c)^m$ and not have a concern?
```{julia}
@@ -482,11 +497,12 @@ How to see that these give rise to real answers on integration is the point of t
Breaking the terms up over $a$ and $b$ we have:
$$
\begin{align*}
I &= \frac{a}{x - (\alpha + i \beta)} + \frac{a}{x - (\alpha - i \beta)} \\
II &= i\frac{b}{x - (\alpha + i \beta)} - i\frac{b}{x - (\alpha - i \beta)}
\end{align*}
$$
Integrating $I$ leads to two logarithmic terms, which are combined to give:


@@ -41,13 +41,14 @@ $$
So,
$$
\begin{align*}
\int_a^b g(u(t)) \cdot u'(t) dt &= \int_a^b (G \circ u)'(t) dt\\
&= (G\circ u)(b) - (G\circ u)(a) \quad\text{(the FTC, part II)}\\
&= G(u(b)) - G(u(a)) \\
&= \int_{u(a)}^{u(b)} g(x) dx. \quad\text{(the FTC part II)}
\end{align*}
$$
That is, this substitution formula applies:
@@ -181,7 +182,7 @@ when $-1 \leq x \leq 1$ and $0$ otherwise. The area under $f$ is just $1$ - the
Let $u(x) = (x-c)/h$ and $g(x) = (1/h) \cdot f(u(x))$. Then, as $du = 1/h dx$
$$
\begin{align*}
\int_{c-h}^{c+h} g(x) dx
&= \int_{c-h}^{c+h} \frac{1}{h} f(u(x)) dx\\
@@ -189,6 +190,7 @@ Let $u(x) = (x-c)/h$ and $g(x) = (1/h) \cdot f(u(x))$. Then, as $du = 1/h dx$
&= \int_{-1}^1 f(u) du\\
&= 1.
\end{align*}
$$
So the area of this transformed function is still $1$. The shifting by $c$, we know, doesn't affect the area; the scaling by $h$ inside of $f$ does, but it is balanced out by the multiplication by $1/h$ outside of $f$.
@@ -248,13 +250,14 @@ $$
But $u^3/3 - 4u/3 = (1/3) \cdot u(u-2)(u+2)$, so between $-2$ and $0$ it is positive and between $0$ and $1$ negative, so this integral is:
$$
\begin{align*}
\int_{-2}^0 (u^3/3 - 4u/3 ) du + \int_{0}^1 -(u^3/3 - 4u/3) du
&= (\frac{u^4}{12} - \frac{4}{3}\frac{u^2}{2}) \big|_{-2}^0 - (\frac{u^4}{12} - \frac{4}{3}\frac{u^2}{2}) \big|_{0}^1\\
&= \frac{4}{3} - \left(-\frac{7}{12}\right)\\
&= \frac{23}{12}.
\end{align*}
$$
##### Example
@@ -270,13 +273,14 @@ $$
Integrals involving this function are typically transformed by substitution. For example:
$$
\begin{align*}
\int_a^b f(x; \mu, \sigma) dx
&= \int_a^b \frac{1}{\sqrt{2\pi}}\frac{1}{\sigma} \exp(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2) dx \\
&= \int_{u(a)}^{u(b)} \frac{1}{\sqrt{2\pi}} \exp(-\frac{1}{2}u^2) du \\
&= \int_{u(a)}^{u(b)} f(u; 0, 1) du,
\end{align*}
$$
where $u = (x-\mu)/\sigma$, so $du = (1/\sigma) dx$.
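A quick numeric check of this substitution (a sketch added here, assuming the `QuadGK` package used elsewhere in these notes):

```{julia}
using QuadGK
f(x, μ, σ) = exp(-((x - μ)/σ)^2 / 2) / (sqrt(2pi) * σ)
μ, σ, a, b = 1, 2, 1, 3
u(x) = (x - μ) / σ
I1, _ = quadgk(x -> f(x, μ, σ), a, b)
I2, _ = quadgk(z -> f(z, 0, 1), u(a), u(b))
I1 ≈ I2    # true
```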
@@ -285,22 +289,23 @@ where $u = (x-\mu)/\sigma$, so $du = (1/\sigma) dx$.
This shows that integrals involving a normal density with parameters $\mu$ and $\sigma$ can be computed using the *standard* normal density with $\mu=0$ and $\sigma=1$. Unfortunately, there is no elementary antiderivative for $\exp(-u^2/2)$, so integrals for the standard normal must be numerically approximated.
There is a function `erf` in the `SpecialFunctions` package (which is loaded by `CalculusWithJulia`) that computes:
There is a function `erf` in the `SpecialFunctions` package (which is loaded by `CalculusWithJulia`) defined by:
$$
\int_0^x \frac{2}{\sqrt{\pi}} \exp(-t^2) dt
\text{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x \exp(-t^2) dt
$$
A further change of variables by $t = u/\sqrt{2}$ (with $\sqrt{2}dt = du$) gives:
$$
\begin{align*}
\int_a^b f(x; \mu, \sigma) dx &=
\int_{t(u(a))}^{t(u(b))} \frac{\sqrt{2}}{\sqrt{2\pi}} \exp(-t^2) dt\\
&= \frac{1}{2} \int_{t(u(a))}^{t(u(b))} \frac{2}{\sqrt{\pi}} \exp(-t^2) dt
\end{align*}
$$
Up to a factor of $1/2$ this is `erf`.
@@ -309,13 +314,14 @@ Up to a factor of $1/2$ this is `erf`.
So we would have, for example, with $\mu=1$,$\sigma=2$ and $a=1$ and $b=3$ that:
$$
\begin{align*}
t(u(a)) &= (1 - 1)/2/\sqrt{2} = 0\\
t(u(b)) &= (3 - 1)/2/\sqrt{2} = \frac{1}{\sqrt{2}}\\
\int_1^3 f(x; 1, 2)\,dx
&= \frac{1}{2} \int_0^{1/\sqrt{2}} \frac{2}{\sqrt{\pi}} \exp(-t^2) dt.
\end{align*}
$$
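A sketch (not in the original) checking this computation against direct numeric integration; it assumes `SpecialFunctions` and `QuadGK` are available:

```{julia}
using SpecialFunctions, QuadGK
f(x; μ=1, σ=2) = exp(-((x - μ)/σ)^2 / 2) / (sqrt(2pi) * σ)
val, _ = quadgk(f, 1, 3)
val, (1/2) * erf(1/sqrt(2))   # the two values agree
```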
Or
@@ -488,7 +494,7 @@ integrate(1 / (a^2 + (b*x)^2), x)
The expression $1-x^2$ can be attacked by the substitution $\sin(u) =x$ as then $1-x^2 = 1-\sin(u)^2 = \cos(u)^2$. Here we see this substitution being used successfully:
$$
\begin{align*}
\int \frac{1}{\sqrt{9 - x^2}} dx &= \int \frac{1}{\sqrt{9 - (3\sin(u))^2}} \cdot 3\cos(u) du\\
&=\int \frac{1}{3\sqrt{1 - \sin(u)^2}}\cdot3\cos(u) du \\
@@ -496,6 +502,7 @@ The expression $1-x^2$ can be attacked by the substitution $\sin(u) =x$ as then
&= u \\
&= \sin^{-1}(x/3).
\end{align*}
$$
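`SymPy` confirms this antiderivative (an added check, assuming `SymPy` is loaded):

```{julia}
using SymPy
@syms x::real
integrate(1 / sqrt(9 - x^2), x)   # asin(x/3)
```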
Further substitution allows the following integral to be solved for an antiderivative:
@@ -513,23 +520,25 @@ integrate(1 / sqrt(a^2 - b^2*x^2), x)
The expression $x^2 - 1$ is a bit different; it lends itself to the substitution $\sec(u) = x$, as $\sec(u)^2 - 1 = \tan(u)^2$. For example, we try $\sec(u) = x$ to integrate:
$$
\begin{align*}
\int \frac{1}{\sqrt{x^2 - 1}} dx &= \int \frac{1}{\sqrt{\sec(u)^2 - 1}} \cdot \sec(u)\tan(u) du\\
&=\int \frac{1}{\tan(u)}\sec(u)\tan(u) du\\
&= \int \sec(u) du.
\end{align*}
$$
This doesn't seem that helpful, but the antiderivative of $\sec(u)$ is $\log\lvert \sec(u) + \tan(u)\rvert$, so we can proceed to get:
$$
\begin{align*}
\int \frac{1}{\sqrt{x^2 - 1}} dx &= \int \sec(u) du\\
&= \log\lvert (\sec(u) + \tan(u))\rvert\\
&= \log\lvert x + \sqrt{x^2-1} \rvert.
\end{align*}
$$
SymPy gives a different representation using the arccosine:
@@ -566,13 +575,14 @@ $$
The identity $\cos(u)^2 = (1 + \cos(2u))/2$ makes this tractable:
$$
\begin{align*}
4ab \int \cos(u)^2 du
&= 4ab\int_0^{\pi/2}(\frac{1}{2} + \frac{\cos(2u)}{2}) du\\
&= 4ab(\frac{1}{2}u + \frac{\sin(2u)}{4})\big|_0^{\pi/2}\\
&= 4ab (\pi/4 + 0) = \pi ab.
\end{align*}
$$
Keeping in mind that a circle with radius $a$ is an ellipse with $b=a$, we see that this gives the correct answer for a circle.


@@ -1,4 +1,4 @@
# Surface Area
# Surface area
{{< include ../_common_code.qmd >}}
@@ -50,19 +50,24 @@ revolution, there is an easier way. (Photo credit to
[firepanjewellery](http://firepanjewellery.com/).)
](./figures/gehry-hendrix.jpg)
> The surface area generated by rotating the graph of $f(x)$ between $a$ and $b$ about the $x$-axis is given by the integral
>
> $$
> \int_a^b 2\pi f(x) \cdot \sqrt{1 + f'(x)^2} dx.
> $$
>
> If the curve is parameterized by $(g(t), f(t))$ between $a$ and $b$ then the surface area is
>
> $$
> \int_a^b 2\pi f(t) \cdot \sqrt{g'(t)^2 + f'(t)^2} dx.
> $$
>
> These formulas do not add in the surface area of either of the ends.
::: {.callout-note icon=false}
## Surface area of a rotated curve
The surface area generated by rotating the graph of $f(x)$ between $a$ and $b$ about the $x$-axis is given by the integral
$$
\int_a^b 2\pi f(x) \cdot \sqrt{1 + f'(x)^2} dx.
$$
If the curve is parameterized by $(g(t), f(t))$ between $a$ and $b$ then the surface area is
$$
\int_a^b 2\pi f(t) \cdot \sqrt{g'(t)^2 + f'(t)^2} dt.
$$
These formulas do not add in the surface area of either of the ends.
:::
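As a sanity check of the first formula (an added sketch assuming `QuadGK`), rotating $f(x)=\sqrt{1-x^2}$ over $[-1,1]$ should produce the surface area of the unit sphere, $4\pi$:

```{julia}
using QuadGK
f(x) = sqrt(1 - x^2)
fp(x) = -x / sqrt(1 - x^2)
# the integrand simplifies to the constant 2π, so the integral is 4π:
val, _ = quadgk(x -> 2pi * f(x) * sqrt(1 + fp(x)^2), -1, 1)
val, 4pi
```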
@@ -85,11 +90,343 @@ To see why this formula is as it is, we look at the parameterized case, the firs
Let a partition of $[a,b]$ be given by $a = t_0 < t_1 < t_2 < \cdots < t_n =b$. This breaks the curve into a collection of line segments. Consider the line segment connecting $(g(t_{i-1}), f(t_{i-1}))$ to $(g(t_i), f(t_i))$. Rotating this around the $x$ axis will generate something approximating a disc, but in reality will be the frustum of a cone. What will be the surface area?
::: {#fig-surface-area}
```{julia}
#| echo: false
let
gr()
function projection_plane(v)
vx, vy, vz = v
a = [-vy, vx, 0] # v ⋅ a = 0
b = v × a # so v ⋅ b = 0
return (a/norm(a), b/norm(b))
end
function project(x, v)
â, b̂ = projection_plane(v)
(x ⋅ â, x ⋅ b̂) # (x ⋅ â) â + (x ⋅ b̂) b̂
end
radius(t) = 1 / (1 + exp(t))
t₀, tₙ = 0, 3
surf(t, θ) = [t, radius(t)*cos(θ), radius(t)*sin(θ)]
Consider a right-circular cone parameterized by an angle $\theta$ and the largest radius $r$ (so that the height satisfies $r/h=\tan(\theta)$). If this cone were made of paper, cut up a side, and laid out flat, it would form a sector of a circle, whose area would be $R^2\gamma/2$ where $R$ is the radius of the circle (also the side length of our cone), and $\gamma$ an angle that we can figure out from $r$ and $\theta$. To do this, we note that the arc length of the circle's edge is $R\gamma$ and also the circumference of the bottom of the cone so $R\gamma = 2\pi r$. With all this, we can solve to get $A = \pi r^2/\sin(\theta)$. But we have a frustum of a cone with radii $r_0$ and $r_1$, so the surface area is a difference: $A = \pi (r_1^2 - r_0^2) /\sin(\theta)$.
v = [2, -2, 1]
function plot_axes()
empty_style = (xaxis = ([], false),
yaxis = ([], false),
legend=false)
plt = plot(; empty_style...)
axis_values = [[(0,0,0), (3.5,0,0)], # x axis
[(0,0,0), (0, 2.0 * radius(0), 0)], # yaxis
[(0,0,0), (0, 0, 1.5 * radius(0))]] # z axis
for (ps, ax) ∈ zip(axis_values, ("x", "y", "z"))
p0, p1 = ps
a, b = project(p0, v), project(p1, v)
annotate!([(b...,text(ax, :bottom))])
plot!([a, b]; arrow=true, head=:tip, line=(:gray, 1)) # gr() allows arrows
end
plt
end
function psurf(v)
(t,θ) -> begin
v1, v2 = project(surf(t, θ), v)
[v1, v2] # or call collect to make a tuple into a vector
end
end
function detJ(F, t, θ)
∂θ = ForwardDiff.derivative(θ -> F(t, θ), θ)
∂t = ForwardDiff.derivative(t -> F(t, θ), t)
(ax, ay), (bx, by) = ∂θ, ∂t
ax * by - ay * bx
end
function cap!(t, v; kwargs...)
θs = range(0, 2pi, 100)
S = Shape(project.(surf.(t, θs), (v,)))
plot!(S; kwargs...)
end
## ----
G = psurf(v)
fold(F, t, θmin, θmax) = find_zero(θ -> detJ(F, t, θ), (θmin, θmax))
plt = plot_axes()
Relating this to our values in terms of $f$ and $g$, we have $r_1=f(t_i)$, $r_0 = f(t_{i-1})$, and $\sin(\theta) = \Delta f / \sqrt{(\Delta g)^2 + (\Delta f)^2}$, where $\Delta f = f(t_i) - f(t_{i-1})$ and similarly for $\Delta g$.
ts = range(t₀, tₙ, 100)
back_edge = fold.(G, ts, 0, pi)
front_edge = fold.(G, ts, pi, 2pi)
db = Dict(t => v for (t,v) in zip(ts, back_edge))
df = Dict(t => v for (t,v) in zip(ts, front_edge))
# basic shape
plt = plot_axes()
plot!(project.(surf.(ts, back_edge), (v,)); line=(:black, 1))
plot!(project.(surf.(ts, front_edge), (v,)); line=(:black, 1))
# add caps
cap!(t₀, v; fill=(:gray, 0.33))
cap!(tₙ, v; fill=(:gray, 0.33))
# add rotated surface segment
i,j = 33,38
a = ts[i]
θs = range(db[ts[i]], df[ts[i]], 100)
θs = reverse(range(db[ts[j]], df[ts[j]], 100))
function 𝐺(t,θ)
v1, v2 = G(t, θ)
(v1, v2)
end
S = Shape(vcat(𝐺.(ts[i], θs), 𝐺.(ts[j], θs)))
plot!(S)
θs = range(df[ts[i]], 2pi + db[ts[i]], 100)
plot!([𝐺(ts[i], θ) for θ in θs]; line=(:black, 1, :dash))
θs = range(df[ts[j]], 2pi + db[ts[j]], 100)
plot!([𝐺(ts[j], θ) for θ in θs]; line=(:black, 1))
plot!([project((ts[i], 0,0),v), 𝐺(ts[i],db[ts[i]])]; line=(:black, 1, :dot), arrow=true)
plot!([project((ts[j], 0,0),v), 𝐺(ts[j],db[ts[j]])]; line=(:black, 1, :dot), arrow=true)
# add shading
lightpt = [2, -2, 5] # from further above
H = psurf(lightpt)
light_edge = fold.(H, ts, pi, 2pi);
for (i, (t, top, bottom)) in enumerate(zip(ts, light_edge, front_edge))
λ = iseven(i) ? 1.0 : 0.8
top = bottom + λ*(top - bottom)
curve = [project(surf(t, θ), v) for θ in range(bottom, top, 20)]
plot!(curve, line=(:black, 1))
end
# annotations
# annotations: compute the two label positions separately so the
# second computation does not overwrite the first
_x, _y, _z = surf(ts[i], db[ts[i]])
x₁, y₁ = project((_x, _y/2, _z/2), v)
_x, _y, _z = surf(ts[j], db[ts[j]])
x₂, y₂ = project((_x, _y/2, _z/2), v)
annotate!([
(x₁, y₁, text(L"r_i", :left, :top)),
(x₂, y₂, text(L"r_{i+1}", :left, :top)),
])
current()
end
```
```{julia}
#| echo: false
plotly()
nothing
```
Illustration of the curve $(g(t), f(t))$ rotated about the $x$ axis with a section shaded.
:::
Consider a right-circular cone parameterized by an angle $\theta$ which at a given height has radius $r$ and slant height $l$ (so that $r/l=\sin(\theta)$). If this cone were made of paper, cut along a side, and laid out flat, it would form a sector of a circle, as illustrated below:
::: {#fig-frustum-cone-area}
```{julia}
#| echo: false
p1 = let
gr()
function projection_plane(v)
vx, vy, vz = v
a = [-vy, vx, 0] # v ⋅ a = 0
b = v × a # so v ⋅ b = 0
return (a/norm(a), b/norm(b))
end
function project(x, v)
â, b̂ = projection_plane(v)
(x ⋅ â, x ⋅ b̂) # (x ⋅ â) â + (x ⋅ b̂) b̂
end
function plot_axes(v)
empty_style = (xaxis = ([], false),
yaxis = ([], false),
legend=false)
plt = plot(; empty_style..., aspect_ratio=:equal)
a,b,c,d,e = project.([(0,0,2), (0,0,3), surf(3, 3pi/2), surf(2, 3pi/2),(0,0,0)], (v,))
pts = [a,b,c,d,a]#project.([a,b,c,d,a], (v,))
plot!(pts; line=(:gray, 1))
plot!([c,d]; line=(:black, 2))
plot!([d, e,a]; line=(:gray, 1,1))
#plot!(project.([e,a,d,e],(v,)); line=(:gray, 1))
plt
end
function psurf(v)
(t,θ) -> begin
v1, v2 = project(surf(t, θ), v)
[v1, v2] # or call collect to make a tuple into a vector
end
end
function detJ(F, t, θ)
∂θ = ForwardDiff.derivative(θ -> F(t, θ), θ)
∂t = ForwardDiff.derivative(t -> F(t, θ), t)
(ax, ay), (bx, by) = ∂θ, ∂t
ax * by - ay * bx
end
function cap!(t, v; kwargs...)
θs = range(0, 2pi, 100)
S = Shape(project.(surf.(t, θs), (v,)))
plot!(S; kwargs...)
end
function fold(F, t, θmin, θmax)
𝐹(θ) = detJ(F, t, θ)
𝐹(θmin) * 𝐹(θmax) <= 0 || return NaN
find_zero(𝐹, (θmin, θmax))
end
radius(t) = t/2
t₀, tₙ = 0, 3
surf(t, θ) = [radius(t)*cos(θ), radius(t)*sin(θ), t] # z axis
v = [2, -2, 1]
G = psurf(v)
ts = range(t₀, tₙ, 100)
back_edge = fold.(G, ts, 0, pi)
front_edge = fold.(G, ts, pi, 2pi)
db = Dict(t => v for (t,v) in zip(ts, back_edge))
df = Dict(t => v for (t,v) in zip(ts, front_edge))
plt = plot_axes(v)
plot!(project.(surf.(ts, back_edge), (v,)); line=(:black, 1))
plot!(project.(surf.(ts, front_edge), (v,)); line=(:black, 1))
cap!(tₙ, v; fill=(:gray80, 0.33))
i = 67
tᵢ = ts[i] # tᵢ = 2.0
plot!(project.([surf.(tᵢ, θ) for θ in range(df[tᵢ], 2pi + db[tᵢ], 100)], (v,)))
# add surface to rotate
## add light
lightpt = [2, -2, 5] # from further above
H = psurf(lightpt)
light_edge = fold.(H, ts, pi, 2pi);
for (i, (t, top, bottom)) in enumerate(zip(ts, light_edge, front_edge))
λ = iseven(i) ? 1.0 : 0.8
(isnan(top) || isnan(bottom)) && continue
top = bottom + λ*(top - bottom)
curve = [project(surf(t, θ), v) for θ in range(bottom, top, 20)]
#plot!(curve, line=(:black, 1))
end
a,b,c = project(surf(tₙ, 3pi/2), v), project(surf(2, 3pi/2),v), project((0,0,0), v)
#plot!([a,b], line=(:black, 3))
#plot!([b,c]; line=(:black,2))
# annotations
_x,_y,_z = surf(tₙ, 3pi/2)
r1 = project((_x/2, _y/2, _z), v)
_x,_y,_z = surf(2, 3pi/2)
r2 = project((_x/2, _y/2, _z), v)
_x, _y, _z = surf(1/2, 3pi/2)
theta = project((_x/2, _y/2, _z), v)
a, b = project.((surf(3, 3pi/2), surf(2, 3pi/2)), (v,))
annotate!([
(r1..., text(L"r_2",:bottom)),
(r2..., text(L"r_1",:bottom)),
(theta..., text(L"\theta")),
(a..., text(L"l_2",:right, :top)),
(b..., text(L"l_1", :right, :top))
])
current()
end
p2 = let
θ = 2pi - pi/3
θs = range(2pi-θ, 2pi, 100)
r1, r2 = 2, 3
empty_style = (xaxis = ([], false),
yaxis = ([], false),
legend=false,
aspect_ratio=:equal)
plt = plot(; empty_style...)
plot!(r1.*cos.(θs), r1 .* sin.(θs); line=(:black, 1))
plot!(r2.*cos.(θs), r2 .* sin.(θs); line=(:black, 1))
plot!([(0,0),(r1,0)]; line=(:gray, 1, :dash))
plot!([(r1,0),(r2,0)]; line=(:black, 1))
s, c = sincos(2pi-θ)
plot!([(0,0),(r1,0)]; line=(:gray, 1, :dash))
plot!([(0,0), (r1*c, r1*s)]; line=(:gray, 1, :dash))
plot!([(r1,0),(r2,0)]; line=(:black, 1))
plot!([(r1*c, r1*s), (r2*c, r2*s)]; line=(:black, 2))
s,c = sincos((2pi - θ)/2)
annotate!([
(1/2*c, 1/2*s, text(L"\gamma")),
(r1*c, r1*s, text(L"l_1",:left, :top)),
(r2*c, r2*s, text(L"l_2", :left, :top)),
])
#=
δ = pi/8
scs = reverse(sincos.(range(2pi-θ, 2pi - θ + pi - δ,100)))
plot!([1/2 .* (c,s) for (s,c) in scs]; line=(:gray, 1,:dash), arrow=true, side=:head)
scs = sincos.(range(2pi - θ + pi + δ, 2pi,100))
plot!([1/2 .* (c,s) for (s,c) in scs]; line=(:gray, 1,:dash), arrow=true, side=:head)
=#
end
plot(p1, p2)
```
```{julia}
#| echo: false
plotly()
nothing
```
The surface of a frustum of a cone and the same area spread out flat. The angle $\gamma = 2\pi(1 - \sin(\theta))$.
:::
By comparing circumferences, it is seen that the angles $\theta$ and $\gamma$ are related by $\gamma = 2\pi(1 - \sin(\theta))$ (as $2\pi r_2 = 2\pi l_2\sin(\theta) = (2\pi-\gamma)/(2\pi) \cdot 2\pi l_2$). The values $l_i$ and $r_i$ are related by $r_i = l_i \sin(\theta)$. The area in both pictures is $(\pi l_2^2 - \pi l_1^2) \cdot (2\pi-\gamma)/(2\pi)$, which simplifies to $\pi (l_2 + l_1) \cdot \sin(\theta) \cdot (l_2 - l_1)$ or $2\pi \cdot (r_2 + r_1)/2 \cdot \text{slant height}$.
Relating this to our values in terms of $f$ and $g$, we have $r_2=f(t_i)$, $r_1 = f(t_{i-1})$, and the slant height satisfies $(l_2-l_1)^2 = (g(t_i)-g(t_{i-1}))^2 + (f(t_i) - f(t_{i-1}))^2$.
Putting this all together, we get that the surface area generated by rotating the line segment around the $x$ axis is
@@ -97,7 +434,7 @@ Putting this altogether we get that the surface area generarated by rotating the
$$
\text{sa}_i = \pi (f(t_i)^2 - f(t_{i-1})^2) \cdot \sqrt{(\Delta g)^2 + (\Delta f)^2} / \Delta f =
\pi (f(t_i) + f(t_{i-1})) \cdot \sqrt{(\Delta g)^2 + (\Delta f)^2}.
2\pi \frac{f(t_i) + f(t_{i-1})}{2} \cdot \sqrt{(\Delta g)^2 + (\Delta f)^2}.
$$
(This is $2 \pi$ times the average radius times the slant height.)
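A quick check (an added sketch, assuming `QuadGK`): for a straight segment the surface-area integral should reproduce this frustum formula exactly:

```{julia}
using QuadGK
g(t) = t         # x coordinate
f(t) = 1 + t     # the radius grows linearly, so the surface is a frustum
a, b = 0, 1
sa, _ = quadgk(t -> 2pi * f(t) * sqrt(1^2 + 1^2), a, b)  # g' = f' = 1
slant = sqrt((g(b) - g(a))^2 + (f(b) - f(a))^2)
sa ≈ 2pi * (f(a) + f(b))/2 * slant   # true
```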
@@ -117,7 +454,9 @@ $$
\text{SA} = \int_a^b 2\pi f(t) \sqrt{g'(t)^2 + f'(t)^2} dt.
$$
If we assume integrability of the integrand, then as our partition size goes to zero, this approximate surface area converges to the value given by the limit. (As with arc length, this needs a technical adjustment to the Riemann integral theorem as here we are evaluating the integrand function at four points ($t_i$, $t_{i-1}$, $\xi$ and $\psi$) and not just at some $c_i$. An figure appears at the end.
If we assume integrability of the integrand, then as our partition size goes to zero, this approximate surface area converges to the value given by the limit. (As with arc length, this needs a technical adjustment to the Riemann integral theorem, as here we are evaluating the integrand function at four points ($t_i$, $t_{i-1}$, $\xi$ and $\psi$) and not just at some $c_i$.)
#### Examples
@@ -129,7 +468,7 @@ Lets see that the surface area of an open cone follows from this formula, even t
A cone can be envisioned as rotating the function $f(x) = x\tan(\theta)$ between $0$ and $h$ around the $x$ axis. This integral yields the surface area:
$$
\begin{align*}
\int_0^h 2\pi f(x) \sqrt{1 + f'(x)^2}dx
&= \int_0^h 2\pi x \tan(\theta) \sqrt{1 + \tan(\theta)^2}dx \\
@@ -137,6 +476,7 @@ A cone can be envisioned as rotating the function $f(x) = x\tan(\theta)$ between
&= \pi \tan(\theta) \sec(\theta) h^2 \\
&= \pi r^2 / \sin(\theta).
\end{align*}
$$
(There are many ways to express this; we used $r$ and $\theta$ to match the work above. If the cone is parameterized by a height $h$ and radius $r$, then the surface area of the sides is $\pi r\sqrt{h^2 + r^2}$. If the base is included, there is an additional $\pi r^2$ term.)
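A numeric check with sample values $h=2$ and $\theta=\pi/4$ (an added sketch assuming `QuadGK`):

```{julia}
using QuadGK
h, θ = 2, pi/4
r = h * tan(θ)
f(x) = x * tan(θ)
sa, _ = quadgk(x -> 2pi * f(x) * sqrt(1 + tan(θ)^2), 0, h)
sa ≈ pi * r * sqrt(h^2 + r^2)   # true
```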
@@ -170,7 +510,7 @@ F(1) - F(0)
### Plotting surfaces of revolution
The commands to plot a surface of revolution will be described more clearly later; for now we present them as simply a pattern to be followed in case plots are desired. Suppose the curve in the $x-y$ plane is given parametrically by $(g(u), f(u))$ for $a \leq u \leq b$.
The commands to plot a surface of revolution will be described more clearly later; for now we present them as simply a pattern to be followed in case plots are desired. Suppose the curve in the $x-z$ plane is given parametrically by $(g(u), f(u))$ for $a \leq u \leq b$.
To be concrete, we parameterize the circle centered at $(6,0)$ with radius $2$ by:
@@ -189,14 +529,14 @@ The plot of this curve is:
#| hold: true
us = range(a, b, length=100)
plot(g.(us), f.(us), xlims=(-0.5, 9), aspect_ratio=:equal, legend=false)
plot!([0,0],[-3,3], color=:red, linewidth=5) # y axis emphasis
plot!([3,9], [0,0], color=:green, linewidth=5) # x axis emphasis
plot!([(0, -3), (0, 3)], line=(:red, 5)) # z axis emphasis
plot!([(3, 0), (9, 0)], line=(:green, 5)) # x axis emphasis
```
Though parametric plots have a convenience constructor, `plot(g, f, a, b)`, we constructed the points with `Julia`'s broadcasting notation, as we will need to do for a surface of revolution. The `xlims` are adjusted to show the $y$ axis, which is emphasized with a layered line. The line is drawn by specifying two points, $(x_0, y_0)$ and $(x_1, y_1)$ in the form `[x0,x1]` and `[y0,y1]`.
Though parametric plots have a convenience constructor, `plot(g, f, a, b)`, we constructed the points with `Julia`'s broadcasting notation, as we will need to do for a surface of revolution. The `xlims` are adjusted to show the $y$ axis, which is emphasized with a layered line. The line is drawn by specifying two points, $(x_0, y_0)$ and $(x_1, y_1)$ using tuples and wrapping in a vector.
Now, to rotate this about the $y$ axis, creating a surface plot, we have the following pattern:
Now, to rotate this about the $z$ axis, creating a surface plot, we have the following pattern:
```{julia}
S(u,v) = [g(u)*cos(v), g(u)*sin(v), f(u)]
@@ -204,23 +544,22 @@ us = range(a, b, length=100)
vs = range(0, 2pi, length=100)
ws = unzip(S.(us, vs')) # reorganize data
surface(ws..., zlims=(-6,6), legend=false)
plot!([0,0], [0,0], [-3,3], color=:red, linewidth=5) # y axis emphasis
plot!([(0,0,-3), (0,0,3)], line=(:red, 5)) # z axis emphasis
```
The `unzip` function is not part of base `Julia`, rather part of `CalculusWithJulia`. This function rearranges data into a form consumable by the plotting methods like `surface`. In this case, the result of `S.(us,vs')` is a grid (matrix) of points, the result of `unzip` is three grids of values, one for the $x$ values, one for the $y$ values, and one for the $z$ values. A manual adjustment to the `zlims` is used, as `aspect_ratio` does not have an effect with the `plotly()` backend and errors on 3d graphics with `pyplot()`.
The `unzip` function is not part of base `Julia`, rather part of `CalculusWithJulia` (it is really `SplitApplyCombine`'s `invert` function). This function rearranges data into a form consumable by the plotting methods like `surface`. In this case, the result of `S.(us,vs')` is a grid (matrix) of points, the result of `unzip` is three grids of values, one for the $x$ values, one for the $y$ values, and one for the $z$ values. A manual adjustment to the `zlims` is used, as `aspect_ratio` does not have an effect with the `plotly()` backend.
To rotate this about the $x$ axis, we have this pattern:
```{julia}
#| hold: true
S(u,v) = [g(u), f(u)*cos(v), f(u)*sin(v)]
us = range(a, b, length=100)
vs = range(0, 2pi, length=100)
ws = unzip(S.(us,vs'))
surface(ws..., legend=false)
plot!([3,9], [0,0],[0,0], color=:green, linewidth=5) # x axis emphasis
plot([(3,0,0), (9,0,0)], line=(:green,5)) # x axis emphasis
surface!(ws..., legend=false)
```
The above pattern covers the case of rotating the graph of a function $f(x)$ over $[a,b]$ by taking $g(t)=t$.
@@ -257,7 +596,7 @@ ws = unzip(S.(us,vs'))
surface(ws..., alpha=0.75)
```
We compare this answer to that of the frustum of a cone with radii $1$ and $(3/2)^2$, formed by rotating the line segment connecting $(0,f(0))$ with $(3/2,f(3/2))$. From looking at the graph of the surface, these values should be comparable. The surface area of the cone part is $\pi (r_1^2 + r_0^2) / \sin(\theta) = \pi (r_1 + r_0) \cdot \sqrt{(\Delta h)^2 + (r_1-r_0)^2}$.
We compare this answer to that of the frustum of a cone with radii $1$ and $(3/2)^2$, formed by rotating the line segment connecting $(0,f(0))$ with $(3/2,f(3/2))$. From looking at the graph of the surface, these values should be comparable. The surface area of the cone part is $\pi (r_1^2 - r_0^2) / \sin(\theta) = \pi (r_1 + r_0) \cdot \sqrt{(\Delta h)^2 + (r_1-r_0)^2}$.
```{julia}
@@ -322,13 +661,14 @@ plot(g, f, 0, 1pi)
The integrand simplifies to $8\sqrt{2}\pi \sin(t) (1 + \cos(t))^{3/2}$. This lends itself to $u$-substitution with $u=\cos(t)$.
$$
\begin{align*}
\int_0^\pi 8\sqrt{2}\pi \sin(t) (1 + \cos(t))^{3/2}
&= 8\sqrt{2}\pi \int_1^{-1} (1 + u)^{3/2} (-1) du\\
&= 8\sqrt{2}\pi (2/5) (1+u)^{5/2} \big|_{-1}^1\\
&= 8\sqrt{2}\pi (2/5) 2^{5/2} = \frac{2^7 \pi}{5}.
\end{align*}
$$
## The first Theorem of Pappus
@@ -347,7 +687,7 @@ That is, the surface area is simply the circumference of the circle traced out b
##### Example
The surface area of of an open cone can be computed, as the arc length is $\sqrt{h^2 + r^2}$ and the centroid of the line is a distance $r/2$ from the axis. This gives SA$=2\pi (r/2) \sqrt{h^2 + r^2} = \pi r \sqrt{h^2 + r^2}$.
The surface area of an open cone can be computed, as the arc length is $\sqrt{h^2 + r^2}$ and the centroid of the line is a distance $r/2$ from the axis. This gives SA$=2\pi (r/2) \sqrt{h^2 + r^2} = \pi r \sqrt{h^2 + r^2}$.
##### Example
@@ -378,11 +718,12 @@ surface(ws..., legend=false, zlims=(-12,12))
The surface area of a sphere will be SA$=2\pi \rho (\pi r) = 2 \pi^2 r \cdot \rho$. What is $\rho$? The formula for the centroid of an arc can be derived in a manner similar to that of the centroid of a region. The formulas are:
$$
\begin{align*}
\text{cm}_x &= \frac{1}{L} \int_a^b g(t) \sqrt{g'(t)^2 + f'(t)^2} dt\\
\text{cm}_y &= \frac{1}{L} \int_a^b f(t) \sqrt{g'(t)^2 + f'(t)^2} dt.
\end{align*}
$$
Here, $L$ is the arc length of the curve.
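For instance (an added sketch assuming `QuadGK`), the upper half circle of radius $r$ has centroid height $2r/\pi$, and Pappus then recovers the sphere's surface area $4\pi r^2$:

```{julia}
using QuadGK
r = 2
g(t) = r * cos(t); f(t) = r * sin(t)        # upper half circle
ds(t) = sqrt((-r*sin(t))^2 + (r*cos(t))^2)  # arc-length element, equals r
L, _ = quadgk(ds, 0, pi)                    # arc length, π⋅r
num, _ = quadgk(t -> f(t) * ds(t), 0, pi)
ρ = num / L                                 # centroid height, 2r/π
2pi * ρ * L ≈ 4pi * r^2                     # Pappus gives the sphere's area
```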
@@ -543,46 +884,3 @@ a, b = 0, pi
val, _ = quadgk(t -> 2pi* f(t) * sqrt(g'(t)^2 + f'(t)^2), a, b)
numericq(val)
```
# Appendix
```{julia}
#| hold: true
#| echo: false
gr()
## For **some reason** having this in the natural place messes up the plots.
## {{{approximate_surface_area}}}
xs,ys = range(-1, stop=1, length=50), range(-1, stop=1, length=50)
f(x,y)= 2 - (x^2 + y^2)
dr = [1/2, 3/4]
df = [f(dr[1],0), f(dr[2],0)]
function sa_approx_graph(i)
p = plot(xs, ys, f, st=[:surface], legend=false)
for theta in range(0, stop=i/10*2pi, length=10*i )
path3d!(p,sin(theta)*dr, cos(theta)*dr, df)
end
p
end
n = 10
anim = @animate for i=1:n
sa_approx_graph(i)
end
imgfile = tempname() * ".gif"
gif(anim, imgfile, fps = 1)
caption = L"""
Surface of revolution of $f(x) = 2 - x^2$ about the $y$ axis. The line segments are the images of rotating the secant line connecting $(1/2, f(1/2))$ and $(3/4, f(3/4))$. These trace out the frustum of a cone which approximates the corresponding surface area of the surface of revolution. In the limit, this approximation becomes exact and a formula for the surface area of surfaces of revolution can be used to compute the value.
"""
plotly()
ImageFile(imgfile, caption)
```


@@ -5,40 +5,60 @@ This section uses these packages:
```{julia}
using SymPy
using Plots
plotly()
using Roots
```
----
```{julia}
#| echo: false
using LaTeXStrings
gr();
```
---
In the March 2003 issue of the College Mathematics Journal, Leon M Hall posed 12 questions related to the following figure:
```{julia}
#| echo: false
f(x) = x^2
fp(x) = 2x
a₀ = 7/8
q₀ = -a₀ - 1/(2a₀)
f(x) = x^2
fp(x) = 2x
tangent(x) = f(a₀) + fp(a₀) * (x - a₀)
normal(x) = f(a₀) - (1 / fp(a₀)) * (x - a₀)
function make_plot(a₀=7/8, q₀=-a₀ - 1/2a₀)
plt = plot(; xlim=(-2,2), ylim=(-1, (1.5)^2),
xticks=nothing, yticks=nothing,
aspect_ratio=:equal, border=:none, legend=false)
function make_plot()
empty_style = (xaxis=([], false),
yaxis=([], false),
framestyle=:origin,
legend=false)
axis_style = (arrow=true, side=:head, line=(:gray, 1))
tangent(x) = f(a₀) + fp(a₀) * (x - a₀)
normal(x) = f(a₀) - (1 / fp(a₀)) * (x - a₀)
plt = plot(; empty_style...,
xlims=(-2,2), ylims=(-1, (1.5)^2))
f(x) = x^2
fp(x) = 2x
plot!(f, -1.5, 1.5)
plot!(zero)
plot!(f, -1.5, 1.5, line=(2, :black))
plot!([-1.6, 1.6], [0,0]; axis_style...)
tl = x -> f(a₀) + fp(a₀) * (x-a₀)
nl = x -> f(a₀) - 1/(fp(a₀)) * (x-a₀)
plot!(tl, -0.02, 1.6; linecolor=:black)
plot!(nl, -1.6, 1; linecolor=:black)
plot!(tl, -0.02, 1.6; line=(1, :forestgreen))
plot!(nl, -1.6, 1; line=(1, :forestgreen))
# add in right triangle
scatter!([a₀, q₀], f.([a₀, q₀]), markersize=5)
Δ = 0.01
annotate!([(a₀ + Δ, nl(a₀+Δ), "P", :bottom),
(q₀ - Δ, nl(q₀-Δ), "Q", :top)])
plt
annotate!([(a₀ + Δ, nl(a₀+Δ), text(L"P", :top)),
(q₀ - Δ, nl(q₀-Δ), text(L"Q", :bottom, :left))
])
current()
end
make_plot()
```
@@ -60,7 +80,7 @@ zs = solve(f(x) ~ nl, x)
q = only(filter(!=(a), zs))
```
----
---
The first question is simply:
@@ -95,7 +115,7 @@ In the remaining examples we don't show the code by default.
:::
----
---
> 1b. The length of the line segment $PQ$
@@ -113,7 +133,7 @@ lseg = sqrt((f(a) - f(q))^2 + (a - q)^2);
```
----
---
> 2a. The horizontal distance between $P$ and $Q$
@@ -131,7 +151,7 @@ plot!([q₀, a₀], [f(a₀), f(a₀)], linewidth=5)
hd = a - q;
```
----
---
> 2b. The area of the parabolic segment
@@ -152,7 +172,7 @@ plot!(xs, ys, fill=(:green, 0.25, 0))
A = simplify(integrate(nl - f(x), (x, q, a)));
```
----
---
> 2c. The volume of the rotated solid formed by revolving the parabolic segment around the vertical line $k$ units to the right of $P$ or to the left of $Q$ where $k > 0$.
@@ -162,10 +182,10 @@ A = simplify(integrate(nl - f(x), (x, q, a)));
#| code-fold: true
#| code-summary: "Show the code"
@syms k::nonnegative
V = simplify(integrate(PI * (nl - f(x) - k)^2, (x, q, a)));
V = simplify(integrate(2PI*(nl-f(x))*(a - x + k),(x, q, a)));
```
----
---
> 3. The $y$ coordinate of the centroid of the parabolic segment
@@ -194,7 +214,7 @@ yₘ = integrate( (1//2) * (nl^2 - f(x)^2), (x, q, a)) / A
yₘ = simplify(yₘ);
```
----
---
> 4. The length of the arc of the parabola between $P$ and $Q$
@@ -213,9 +233,9 @@ p
L = integrate(sqrt(1 + fp(x)^2), (x, q, a));
```
----
---
> 5. The $y$ coordinate of the midpoint ofthe line segment $PQ$
> 5. The $y$ coordinate of the midpoint of the line segment $PQ$
```{julia}
@@ -234,7 +254,7 @@ p
mp = nl(x => (a + q)/2);
```
----
---
> 6. The area of the trapezoid bound by the normal line, the $x$-axis, and the vertical lines through $P$ and $Q$.
@@ -253,7 +273,7 @@ p
trap = 1//2 * (f(q) + f(a)) * (a - q);
```
----
---
> 7. The area bounded by the parabola and the $x$ axis and the vertical lines through $P$ and $Q$
@@ -275,7 +295,7 @@ p
pa = integrate(x^2, (x, q, a));
```
----
---
> 8. The area of the surface formed by revolving the arc of the parabola between $P$ and $Q$ around the vertical line through $P$
@@ -297,11 +317,11 @@ end
#| code-summary: "Show the code"
# use parametric and 2π ∫ u(t) √(u'(t)^2 + v'(t)^2) dt
uu(x) = a - x
vv(x) = f(uu(x))
vv(x) = f(a - uu(x))
SA = 2PI * integrate(uu(x) * sqrt(diff(uu(x),x)^2 + diff(vv(x),x)^2), (x, q, a));
```
----
---
> 9. The height of the parabolic segment (i.e. the distance between the normal line and the tangent line to the parabola that is parallel to the normal line)
@@ -330,7 +350,7 @@ segment_height = sqrt((b-b)^2 + (f(b) - nl(x=>b))^2);
```
----
---
> 10. The volume of the solid formed by revolving the parabolic segment around the $x$-axis
@@ -351,7 +371,7 @@ end
Vₓ = integrate(pi * (nl^2 - f(x)^2), (x, q, a));
```
----
---
> 11. The area of the triangle bound by the normal line, the vertical line through $Q$ and the $x$-axis
@@ -372,7 +392,7 @@ plot!([p₀,q₀,q₀,p₀], [0,f(q₀),0,0];
triangle = 1/2 * f(q) * (a - f(a)/(-1/fp(a)) - q);
```
----
---
> 12. The area of the quadrilateral bound by the normal line, the tangent line, the vertical line through $Q$ and the $x$-axis
@@ -391,13 +411,13 @@ plot!([a₀,q₀,q₀,a₀-f(a₀)/fp(a₀),a₀],
# v1, v2, v3 = [[x[i]-x[1],y[i]-y[1], 0] for i in 2:4]
# area = 1//2 * last(cross(v3,v2) + cross(v2, v1)) # 1/2 area of parallelogram
# print(simplify(area))
# -(x₁ - x₂)*(y₁ - y₃)/2 + (x₁ - x₃)*(y₁ - y₂)/2 - (x₁ - x₃)*(y₁ - y₄)/2 + (x₁ - x₄)*(y₁ - y₃)/2
# (x₁ - x₂)*(y₁ - y₃)/2 - (x₁ - x₃)*(y₁ - y₂)/2 + (x₁ - x₃)*(y₁ - y₄)/2 - (x₁ - x₄)*(y₁ - y₃)/2
tl₀ = a - f(a) / fp(a)
x₁,x₂,x₃,x₄ = (a,q,q,tl₀)
y₁, y₂, y₃, y₄ = (f(a), f(q), 0, 0)
quadrilateral = -(x₁ - x₂)*(y₁ - y₃)/2 + (x₁ - x₃)*(y₁ - y₂)/2 - (x₁ - x₃)*(y₁ - y₄)/2 + (x₁ - x₄)*(y₁ - y₃)/2;
quadrilateral = (x₁ - x₂)*(y₁ - y₃)/2 - (x₁ - x₃)*(y₁ - y₂)/2 + (x₁ - x₃)*(y₁ - y₄)/2 - (x₁ - x₄)*(y₁ - y₃)/2;
```
----
---
The answers appear here in sorted order, some given as approximate floating point values:
@@ -412,7 +432,7 @@ article_answers = (1/(2sqrt(2)), 1/2, sqrt(3/10), 0.558480, 0.564641,
#| echo: false
# check
problems = ("1a"=>yvalue, "1b"=>lseg, "1c"=>hd,
"2a" => A, "2b" => V,
"2a" => A, "2b" => V(k=>1),
"3" => yₘ,
"4" => L,
"5" => mp,
@@ -426,7 +446,7 @@ problems = ("1a"=>yvalue, "1b"=>lseg, "1c"=>hd,
)
≈ₒ(a,b) = isapprox(a, b; atol=1e-5, rtol=sqrt(eps()))
∂ = Differential(a)
solutions = [k => only(solve(∂(p) ~ 0, a)) for (k,p) in problems]
solutions = [k => (find_zero(∂(p), 0.5)) for (k,p) in problems]
[(sol=k, correct=(any(isapprox.(s, article_answers; atol=1e-5)))) for (k,s) ∈ solutions]
nothing
```


@@ -19,7 +19,44 @@ using SymPy
```{julia}
#| echo: false
#| results: "hidden"
import LinearAlgebra: norm
import LinearAlgebra: norm, cross
using SplitApplyCombine
nothing
```
```{julia}
#| echo: false
# commands used for plotting from https://github.com/SigurdAngenent/WisconsinCalculus/blob/master/figures/221/09surf_of_rotation2.py
#linear projection of R^3 onto R^2
function _proj(X, v)
# a is ⟂ to v and b is v × a
vx, vy, vz = v
a = [-vy, vx, 0]
b = cross([vx,vy,vz], a)
a, b = a/norm(a), b/norm(b)
return (a ⋅ X, b ⋅ X)
end
# project a curve in R3 onto R2
pline(viewp, ps...) = [_proj(p, viewp) for p in ps]
# determinant of Jacobian; area multiplier
# det(J); used to identify folds
function jac(X, u, v)
return det(ForwardDiff.jacobian(xs -> collect(X(xs...)), [u,v]))
end
function _fold(F, t, θmin, θmax)
λ = θ -> jac(F, t, θ) # F is projected surface, psurf
iszero(λ(θmin)) && return θmin
iszero(λ(θmax)) && return θmax
return solve(ZeroProblem(λ, (θmin, θmax)))
end
nothing
```
@@ -122,6 +159,103 @@ The formula is for a rotation around the $x$-axis, but can easily be generalized
:::
::: {#fig-solid-of-revolution}
```{julia}
#| echo: false
plt = let
gr()
# Follow lead of # https://github.com/SigurdAngenent/WisconsinCalculus/blob/master/figures/221/09surf_of_rotation2.py
# plot surface of revolution around x axis between [0, 3]
# best if r(t) decreases
rad(x) = 2/(1 + exp(x))
trange = (0,3)
θrange = (0, 2pi)
viewp = [2,-2, 1]
##
proj(X) = _proj(X, viewp)
# surface of revolution
surf(t, z) = [t, rad(t)*cos(z), rad(t)*sin(z)]
# project the surface at (t, a=theta)
psurf(t,z) = proj(surf(t,z))
# create shape holding project disc
drawdiscF(t) = Shape(invert([psurf(t, 2*i*pi/100) for i in 1:101])...)
α = 1.0 # opacity
line_style = (; line=(:black, 1))
plot(; empty_style..., aspect_ratio=:equal)
# by layering, we get x-axis as desired
plot!(pline(viewp, [-1,0,0], [0,0,0]); line_style...)
plot!(drawdiscF(0); fill =(:lightgray, α))
plot!(pline(viewp, [0,0,0], [1,0,0]); line_style...)
plot!(drawdiscF(1); fill =(:black, α)) # black to lightgray gives thickness
plot!(drawdiscF(1.1); fill=(:lightgray, α))
plot!(pline(viewp, [1.1,0,0], [2,0,0]); line_style...)
plot!(drawdiscF(2); fill=(:lightgray, α))
plot!(pline(viewp, [2,0,0], [3,0,0]); line_style...)
plot!(drawdiscF(3); fill=(:lightgray, α))
plot!(pline(viewp, [3,0,0], [4,0,0]); line_style..., arrow=true, side=:head)
plot!(pline(viewp, [0,0,0], [0,0,1.25]); line_style..., arrow=true, side=:head)
tt = range(trange..., 30)
curve = psurf.(tt, pi/2)
plot!(curve; line=(:black, 2))
f1 = [(t, _fold(psurf, t, 0, pi)) for t in tt]
curve = [psurf(f[1], f[2]) for f in f1]
plot!(curve; line=(:black,1))
f2 = [(t, _fold(psurf, t, pi, 2*pi)) for t in tt]
curve = [psurf(f[1], f[2]) for f in f2]
plot!(curve; line=(:black,1))
## find bottom edge (t,θ) again
tt = range(0, 3, 120)
f1 = [(t, _fold(psurf, t, pi, 2*pi)) for t in range(trange..., 100)]
# shade bottom by adding bigger density of lines near bottom
for (i,f) ∈ enumerate(f1)
λ = iseven(i) ? 6 : 4 # adjust density by having some lines extend only to 6
(isnan(f[1]) || isnan(f[2])) && continue  # parenthesized: && binds tighter than ||
curve = [psurf(f[1], θ) for θ in range(f[2] - 0.2*(λ - f[1]), f[2], 20)]
plot!(curve; line=(:black, 1))
end
current()
end
plt
```
```{julia}
#| echo: false
plotly()
nothing
```
Illustration of a region being rotated around the $x$-axis. The discs have approximate volume given by the area of the base times the height, or $\pi r(x)^2 \Delta x$. (Figure ported from @Angenent.)
:::
For a numeric example, we consider the original Red [Solo](http://en.wikipedia.org/wiki/Red_Solo_Cup) Cup. The dimensions of the cup were basically: a top diameter of $d_1 = 3~ \frac{3}{4}$ inches, a bottom diameter of $d_0 = 2~ \frac{1}{2}$ inches and a height of $h = 4~ \frac{3}{4}$ inches.
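Treating the side of the cup as a straight line (an assumption made for this added sketch), the disc method gives an approximate volume; this uses `QuadGK`:

```{julia}
using QuadGK
r0, r1, h = 2.5/2, 3.75/2, 4.75     # bottom radius, top radius, height (inches)
r(x) = r0 + (r1 - r0) * x / h       # straight-sided (frustum) profile
vol, _ = quadgk(x -> pi * r(x)^2, 0, h)
vol                                 # roughly 36.9 cubic inches
```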
@@ -352,6 +486,109 @@ $$
V = \int_a^b \pi \cdot (R(x)^2 - r(x)^2) dx.
$$
::: {#fig-washer-illustration}
```{julia}
#| echo: false
plt = let
gr()
# Follow lead of # https://github.com/SigurdAngenent/WisconsinCalculus/blob/master/figures/221/09surf_of_rotation2.py
# plot surface of revolution around x axis between [0, 3]
# best if r(t) decreases
rad(x) = 2/(1 + exp(x))
trange = (0, 3)
θrange = (0, 2pi)
viewp = [2,-2,1]
##
proj(X) = _proj(X, viewp)
# surface of revolution
surf(t, z) = [t, rad(t)*cos(z), rad(t)*sin(z)]
surf2(t, z) = (t, rad(t)*cos(z)/2, rad(t)*sin(z)/2)
# project the surface at (t, a=theta)
psurf(t,z) = proj(surf(t,z))
psurf2(t, z) = proj(surf2(t,z))
# create shape holding project disc
drawdiscF(t) = Shape(invert([psurf(t, 2*i*pi/100) for i in 1:101])...)
drawdiscI(t) = Shape([psurf2(t, 2*i*pi/100) for i in 1:101])
α = 1.0
line_style = (; line=(:black, 1))
plot(; empty_style..., aspect_ratio=:equal)
# by layering, we get x-axis as desired
plot!(pline(viewp, [-1,0,0], [0,0,0]); line_style...)
plot!(drawdiscF(0); fill =(:lightgray, α))
plot!(drawdiscI(0); fill=(:white, .5))
plot!(pline(viewp, [0,0,0], [1,0,0]); line_style...)
plot!(drawdiscF(1); fill =(:black, α)) # black to lightgray gives thickness
plot!(drawdiscI(1); fill=(:white, .5))
plot!(drawdiscF(1.1); fill=(:lightgray, α))
plot!(drawdiscI(1.1); fill=(:white, .5))
plot!(pline(viewp, [1.1,0,0], [2,0,0]); line_style...)
plot!(drawdiscF(2); fill=(:lightgray, α))
plot!(drawdiscI(2); fill=(:white, .5))
plot!(pline(viewp, [2,0,0], [3,0,0]); line_style...)
plot!(drawdiscF(3); fill=(:lightgray, α))
plot!(drawdiscI(3); fill=(:white, .5))
plot!(pline(viewp, [3,0,0], [4,0,0]); line_style..., arrow=true, side=:head)
plot!(pline(viewp, [0,0,0], [0,0,1.25]); line_style..., arrow=true, side=:head)
## bounding curves
### main spine
tt = range(trange..., 30)
curve = [psurf(t, pi/2) for t in tt]
plot!(curve; line=(:black, 2))
### the folds
f1 = [(t, _fold(psurf, t, 0, pi)) for t in tt]
curve = [psurf(f[1], f[2]) for f in f1]
plot!(curve; line=(:black,))
f2 = [(t, _fold(psurf, t, pi, 2*pi)) for t in tt]
curve = [psurf(f[1], f[2]) for f in f2]
plot!(curve; line=(:black,))
## add shading
### find bottom edge (t,θ) again
f1 = [[t, _fold(psurf, t, pi, 2*pi)] for t in range(trange..., 120)]
### shade bottom by adding bigger density of lines near bottom
for (i,f) ∈ enumerate(f1)
λ = iseven(i) ? 6 : 4 # adjust density by having some lines extend only to 6
(isnan(f[1]) || isnan(f[2])) && continue  # parenthesized: && binds tighter than ||
curve = [psurf(f[1], θ) for θ in range(f[2] - 0.2*(λ - f[1]), f[2], 20)]
plot!(curve; line=(:black, 1))
end
current()
end
plt
```
```{julia}
#| echo: false
plotly()
nothing
```
Modification of the earlier figure to show the washer method. The interior volume would be given by $\int_a^b \pi r(x)^2 dx$, the entire volume by $\int_a^b \pi R(x)^2 dx$. The difference then is the volume computed by the washer method.
:::
##### Example
@@ -410,21 +647,86 @@ For a general cone, we use this [definition](http://en.wikipedia.org/wiki/Cone):
Let $h$ be the distance from the apex to the base. Consider cones with the property that all planes parallel to the base intersect the cone with the same shape, though perhaps a different scale. This figure shows an example, with the rays coming from the apex defining the volume.
::: {#fig-generic-cone}
```{julia}
#| echo: false
plt = let
gr()
rad(t) = 3/2 - t
trange = (0, 3/2)
θrange = (0, 2pi)
viewp = [2,-1/1.5,1/2+.2]
##
proj(X) = _proj(X, viewp)
# our surface
R, r, rho = 1, 1/4, 1/4
f(t) = (R-r) * cos(t) + rho * cos((R-r)/r * t)
g(t) = (R-r) * sin(t) - rho * sin((R-r)/r * t)
surf(t, θ) = (rad(t)*f(θ), rad(t)*g(θ), t)
psurf(t,θ) = proj(surf(t,θ))
empty_style = (xaxis=([], false),
yaxis=([], false),
framestyle=:origin,
legend=false)
axis_style = (arrow=true, side=:head, line=(:gray, 1))
drawdiscF(t) = Shape([psurf(t, 2*i*pi/100) for i in 1:101])
plot(; empty_style..., aspect_ratio=:equal)
for (i,t) in enumerate(range(0, 3/2, 30))
plot!(drawdiscF(t); fill=(:gray,1), line=(:black,1))
end
θ = 0; plot!([psurf(0, θ), psurf(3/2, θ)]; line=(:black, 2))
θ = pi/2; plot!([psurf(0, θ), psurf(3/2, θ)]; line=(:black, 1))
θ = 3pi/2; plot!([psurf(0, θ), psurf(3/2, θ)]; line=(:black, 1))
current()
end
plt
```
```{julia}
#| hold: true
#| echo: false
h = 5
R, r, rho = 1, 1/4, 1/4
f(t) = (R-r) * cos(t) + rho * cos((R-r)/r * t)
g(t) = (R-r) * sin(t) - rho * sin((R-r)/r * t)
ts = range(0, 2pi, length=100)
plotly()
nothing
```
p = plot(f.(ts), g.(ts), zero.(ts), legend=false)
for t ∈ range(0, 2pi, length=25)
plot!(p, [0,f(t)], [0,g(t)], [h, 0], linecolor=:red)
A "cone" formed from the parameterized curve
$r(t) = \langle
(R-r) \cdot \cos(t) + \rho \cdot \cos((R-r)/r \cdot t),
(R-r) \cdot \sin(t) - \rho \cdot \sin((R-r)/r \cdot t)
\rangle$ with apex at the point $(0,0,3/2)$ and rays extending down to the base curve in the $z=0$ plane; the cross section at height $z$ is scaled by $3/2-z$.
:::
```{julia}
#| echo: false
#| eval: false
plt = let
h = 5
R, r, rho = 1, 1/4, 1/4
f(t) = (R-r) * cos(t) + rho * cos((R-r)/r * t)
g(t) = (R-r) * sin(t) - rho * sin((R-r)/r * t)
ts = range(0, 2pi, length=100)
plot(f.(ts), g.(ts), zero.(ts), legend=false)
for t ∈ range(0, 2pi, length=25)
plot!([0,f(t)], [0,g(t)], [h, 0], linecolor=:red)
end
current()
end
p
plt
```
A right circular cone is one where this shape is a circle. This definition can be more general, as a square-based right pyramid is also such a cone. After possibly reorienting the cone in space so the base is at $u=0$ and the apex at $u=h$ the volume of the cone can be found from:
@@ -450,6 +752,78 @@ $$
This gives a general formula for the volume of such cones.
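Since every cross section is a copy of the base scaled linearly down to the apex, its area scales as $(1 - u/h)^2$, and the volume is $\int_0^h A(1-u/h)^2 du = Ah/3$. A sketch (added here, assuming `QuadGK`):

```{julia}
using QuadGK
A, h = 2.0, 5.0                 # base area and height (sample values)
area(u) = A * (1 - u/h)^2       # cross sections shrink linearly toward the apex
vol, _ = quadgk(area, 0, h)
vol ≈ A * h / 3                 # true, for any base shape
```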
::: {#fig-cross-sections}
```{julia}
#| echo: false
plt = let
gr()
# sections
# https://github.com/SigurdAngenent/WisconsinCalculus/blob/master/figures/221/09Xsections.py
x(h,z) = 0.3*h^2+(0.6-0.2*h)*cos(z)
y(h,z) = h+(0.3-0.2*h)*sin(z)+0.05*sin(4*z)
r(h,z) = (x(h,z), y(h,z))
r1(h,z) = (2,0) .+ r(h,z)
empty_style = (xaxis=([], false),
yaxis=([], false),
framestyle=:origin,
legend=false)
Nh=30
heights = range(-1/2, 1/2, Nh)
h0=heights[Nh ÷ 2]
h1=heights[Nh ÷ 2 + 1]
hs = [heights[1], h0, h1, heights[end]]
ts = range(0, 2pi, 300)
plot(; empty_style..., aspect_ratio=:equal)
# stack the curves
for h in heights
curve = r.(h, ts)
plot!(Shape(curve); fill=(:white, 1.0), line=(:black, 1))
end
# shape pull outs; use black to give thickness
for (h, color) in zip(hs, (:white, :black, :white, :white))
curve = r1.(h,ts)
plot!(Shape(curve); fill=(color,1.0), line=(:black, 1,))
end
# axis with marked points
plot!([(-1,-1), (-1, 1)]; axis_style...)
pts = [(-1, y(h, pi)) for h in hs]
scatter!(pts, marker=(5, :circle))
# connect with dashes
for h in hs
plot!([(-1, y(h, pi)), r(h,pi)]; line=(:black, 1, :dash))
plot!([r(h,0), r1(h,pi)]; line=(:black, 1, :dash))
end
current()
end
plt
```
This figure shows the volume of a solid built up from slices. A discrete approximation would be found by estimating the volume of each slice by the cross-sectional area times a small $\Delta h$. This leads to a formula
$V = \int_a^b A(h)dh$, where $A$ computes the cross-sectional area.
(This figure was ported from @Angenent.)
:::
```{julia}
#| echo: false
plotly()
nothing
```
### Cavalieri's method
@@ -457,39 +831,236 @@ This gives a general formula for the volume of such cones.
[Cavalieri's](http://tinyurl.com/oda9xd9) Principle is "Suppose two regions in three-space (solids) are included between two parallel planes. If every plane parallel to these two planes intersects both regions in cross-sections of equal area, then the two regions have equal volumes." (Wikipedia).
With the formula for the volume of solids based on cross sections, this is a trivial observation, as the functions giving the cross-sectional area are identical. Still, it can be surprising. Consider a sphere with an interior cylinder bored out of it. (The [Napkin](http://tinyurl.com/o237v83) ring problem.) The bore has height $h$ - for larger radius spheres this means very wide bores.
::: {#fig-Cavalieris-first}
```{julia}
#| echo: false
plt = let
gr()
x(h,z) = (0.6-0.2*h) * cos(z)
y(h,z) = h + (0.2-0.15*h) * sin(z) + 0.01 * sin(4*z)
xa(h,z) = 2 + 0.1 * cos(7*pi*h) + (0.6-0.2*h)*cos(z)
heights = range(-1/2, 1/2, 50)
ts = range(0, 2pi, 300)
h0 = heights[25]
h1 = heights[26]
plot(; empty_style..., aspect_ratio=:equal)
for h in heights
curve=[(x(h, t), y(h, t)) for t in ts]
plot!(Shape(curve); fill=(:white,), line=(:black,1))
curve=[(xa(h, t), y(h, t)) for t in ts]
plot!(Shape(curve); fill=(:white,), line=(:black,1))
end
current()
end
plt
```
```{julia}
#| echo: false
plotly()
nothing
```
Illustration of Cavalieri's first principle. The discs from the left are moved around to form the volume on the right, but as the volume of each cross-sectional disc remains the same, the two volumes are equally approximated. (This figure ported from @Angenent.)
:::
With the formula for the volume of solids based on cross sections, this is a trivial observation, as the functions giving the cross-sectional area are identical. Still, it can be surprising.
Consider a sphere with an interior cylinder bored out of it. (The [Napkin](http://tinyurl.com/o237v83) ring problem.) The bore has height $h$; for larger-radius spheres this means very wide bores.
::: {#fig-napkin-ring-1}
```{julia}
#| echo: false
plt = let
# Follow lead of # https://github.com/SigurdAngenent/WisconsinCalculus/blob/master/figures/221/09surf_of_rotation2.py
# plot surface of revolution around x axis between [0, 3]
# best if r(t) decreases
rad(t) = (t = clamp(t, -1, 1); sqrt(1 - t^2))
rad2(t) = 1/2
viewp = [2,-2,1]
##
function _proj(X, v)
# a is ⟂ to v and b is v × a
vx, vy, vz = v
a = [-vy, vx, 0]
b = cross([vx,vy,vz], a)
a, b = a/norm(a), b/norm(b)
return (a ⋅ X, b ⋅ X)
end
# project a curve in R3 onto R2
pline(viewp, ps...) = [_proj(p, viewp) for p in ps]
# determinant of Jacobian; area multiplier
# det(J); used to identify folds
function jac(X, u, v)
return det(ForwardDiff.jacobian(xs -> collect(X(xs...)), [u,v]))
end
function _fold(F, t, θmin, θmax)
λ = θ -> jac(F, t, θ) # F is projected surface, psurf
iszero(λ(θmin)) && return θmin
iszero(λ(θmax)) && return θmax
return solve(ZeroProblem(λ, (θmin, θmax)))
end
##
proj(X) = _proj(X, viewp)
# surface of revolution about the z axis
surf(t, z) = (rad(t)*cos(z), rad(t)*sin(z), t)
surf2(t, z) = (rad2(t)*cos(z), rad2(t)*sin(z), t)
# project the surface at (t, a=theta)
psurf(t,z) = proj(surf(t,z))
psurf2(t, z) = proj(surf2(t,z))
bisect(f, a, b) = find_zero(f, (a,b), Bisection())
# create shape holding project disc
drawdiscF(t) = Shape([psurf(t, 2*i*pi/100) for i in 1:101])
drawdiscI(t) = Shape([psurf2(t, 2*i*pi/100) for i in 1:101])
α = 1.0
line_style = (; line=(:black, 1))
plot(; empty_style..., aspect_ratio=:equal)
# washer
t0 = sqrt(3/4)
Δ = .03
δ = 0.785398 + 0.05
x₀ = -.25
plot!(drawdiscF(x₀-Δ); fill=(:black,), line=(:black,1))
plot!(drawdiscF(x₀); fill=(:orange,), line=(:black,1))
plot!(drawdiscI(x₀); fill=(:white,1.0), line=(:black,1))
x₀ = 0.35
plot!(drawdiscF(x₀-Δ); fill=(:black,), line=(:black,1))
plot!(drawdiscF(x₀); fill=(:orange,), line=(:black,1))
plot!(drawdiscI(x₀); fill=(:white,1.0), line=(:black,1))
z0 = 3pi/2 - δ
plot!(pline(viewp, surf(t0, z0), surf(-t0, z0)); line=(:black, 1))
plot!(pline(viewp, surf(t0, z0+pi), surf(-t0, z0+pi)); line=(:black, 1))
# caps
curve = [psurf(t0, θ) for θ in range(0, 2pi, 100)]
plot!(curve, line=(:black, 2))
curve = [psurf(-t0, θ) for θ in range(0, 2pi, 100)]
plot!(curve, line=(:black, 2))
## folds
tθs = [(t, _fold(psurf, t, 0,pi)) for t in range(-t0, t0, 50)]
curve = [psurf(t, θ) for (t,θ) ∈ tθs]
plot!(curve, line=(:black, 3))
tθs = [(t, _fold(psurf, t, pi, 2pi)) for t in range(-t0, t0, 50)]
curve = [psurf(t, θ) for (t,θ) ∈ tθs]
plot!(curve, line=(:black, 3))
# Shade lines
δ = pi/6
Δₜ = (4pi/2 - (3pi/2 - δ))/(2*25)
for θ ∈ range(3pi/2-δ, 4pi/2, 25)
curve = [psurf(t, θ) for t in
range(-t0, max(-t0, -t0 + 1/2*sin(θ+δ+pi/2 + pi/2)), 20)]
plot!(curve, line=(:black, 1))
curve = [psurf(t, θ+Δₜ) for t in
range(-t0, max(-t0, -t0 + 1/3*sin(θ+δ+pi/2 + pi/2)), 20)]
plot!(curve, line=(:black, 1))
end
#=
f1 = [[t, _fold(psurf, t, 0, pi/2)] for t in range(-0.5, -0.1, 26)]
for f in f1
plot!([psurf( f[1], f[2]-k*0.01*(6-f[1]) )
for k in 1:21]; line=(:black, 1))
end
=#
current()
end
plt
```
Figure showing sphere with interior cylinder bored out.
:::
This cross-sectional figure is used to better understand the key dimensions.
::: {#fig-napkin-ring-2}
```{julia}
#| hold: true
#| echo: false
#The following illustrates $R=5$ and $h=8$.
#The following illustrates $R=1$ and $h=2\sqrt{3/4}$.
plt = let
gr()
empty_style = (xaxis=([], false),
yaxis=([], false),
framestyle=:origin,
legend=false)
R =5; h1 = 2*4
R = 1; h = 2*sqrt(3/4)
theta = asin(h1/2/R)
thetas = range(-theta, stop=theta, length=100)
ts = range(-pi, stop=pi, length=100)
y = h1/4
θ = theta = asin(h/2/R)
thetas = range(-theta, stop=theta, length=100)
ts = range(-pi, stop=pi, length=100)
y = h/4
p = plot(legend=false, aspect_ratio=:equal);
plot!(p, R*cos.(ts), R*sin.(ts));
plot!(p, R*cos.(thetas), R*sin.(thetas), color=:orange);
plot(; empty_style..., aspect_ratio=:equal)
plot!(R*cos.(ts), R*sin.(ts); line=(:black,));
plot!(R*cos.(thetas), R*sin.(thetas), line=(:orange,1));
plot!(p, [R*cos.(theta), R*cos.(theta)], [h1/2, -h1/2], color=:orange);
plot!(p, [R*cos.(theta), sqrt(R^2 - y^2)], [y, y], color=:orange)
plot!([R*cos.(theta), R*cos.(theta)], [h/2, -h/2]; color=:orange);
plot!([R*cos.(theta), sqrt(R^2 - y^2)], [y, y]; line=(:orange,3))
plot!(p, [0, R*cos.(theta)], [0,0], color=:red);
plot!(p,[ 0, R*cos.(theta)], [0,h1/2], color=:red);
plot!([0, R*cos.(theta)], [0,0], color=:red);
plot!([ 0, R*cos.(theta)], [0,h/2], color=:red);
annotate!(p, [(.5, -2/3, "sqrt(R²- (h/2)²)"),
(R*cos.(theta)-.6, h1/4, "h/2"),
(1.5, 1.75*tan.(theta), "R")])
p
x₀ = sqrt(R^2 - (h/2)^2)
annotate!( [
(x₀/2, 0, text(L"\sqrt{R^2- (\frac{h}{2})^2}",10, :top)),
(x₀, h/4, text(L"\frac{h}{2}",:right)),
(R/2*cos(θ),R/2*sin(θ), text(L"R", :bottom; rotation=rad2deg(θ)))
])
current()
end
plt
```
```{julia}
#| echo: false
plotly()
nothing
```
Side view illustrating key dimensions of napkin ring problem with $R$ being the radius of the sphere and $h$ being the height of the resulting interior cylinder.
:::
The small orange line is rotated, so using the washer method the cross sections are given by $\pi(r_0^2 - r_i^2)$, where $r_0$ and $r_i$ are the outer and inner radii, as functions of $y$.
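Carrying this out (an added sketch assuming `QuadGK`): with outer radius $r_0(y) = \sqrt{R^2 - y^2}$ and constant inner radius $r_i = \sqrt{R^2 - (h/2)^2}$, the washer integral gives $\pi h^3/6$, independent of $R$:

```{julia}
using QuadGK
R, h = 5, 2
router(y) = sqrt(R^2 - y^2)          # outer radius at height y
rinner = sqrt(R^2 - (h/2)^2)         # constant radius of the bored cylinder
vol, _ = quadgk(y -> pi * (router(y)^2 - rinner^2), -h/2, h/2)
vol ≈ pi * h^3 / 6                   # true, and independent of R
```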


@@ -2,7 +2,12 @@
CalculusWithJulia = "a2e0e22d-7d4c-5312-9169-8b992201a882"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
IJulia = "7073ff75-c697-5162-941a-fcdaad2a7d2a"
LaTeXStrings = "b964fa9f-0449-5b57-a5c2-d3ea65f4040f"
Mustache = "ffc61752-8dc7-55ee-8c37-f3e9cdd09e70"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
QuizQuestions = "612c44de-1021-4a21-84fb-7261cf5eb2d4"
Richardson = "708f8203-808e-40c0-ba2d-98a6953ed40d"
Roots = "f2b01f46-fcfa-551c-844a-d8ac1e96c665"
SymPy = "24249f21-da20-56a4-8eb1-6a02cf4ae2e6"
Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
TextWrap = "b718987f-49a8-5099-9789-dcd902bef87d"


@@ -16,6 +16,8 @@ using SymPy
---
![A Möbius strip by Koo Jeong A](figures/korean-mobius.jpg){width=40%}
The definition Google finds for *continuous* is *forming an unbroken whole; without interruption*.
@@ -54,12 +56,15 @@ However, [Cauchy](http://en.wikipedia.org/wiki/Cours_d%27Analyse) defined contin
The [modern](http://en.wikipedia.org/wiki/Continuous_function#History) definition simply pushes the details to the definition of the limit:
::: {.callout-note icon=false}
## Definition of continuity at a point
> A function $f(x)$ is continuous at $x=c$ if $\lim_{x \rightarrow c}f(x) = f(c)$.
A function $f(x)$ is continuous at $x=c$ if $\lim_{x \rightarrow c}f(x) = f(c)$.
:::
This says three things
The definition says three things:
* The limit exists at $c$.
@@ -67,11 +72,14 @@ This says three things
* The value of the limit is the same as $f(c)$.
This speaks to continuity at a point, we can extend this to continuity over an interval $(a,b)$ by saying:
The definition speaks to continuity at a point; we can extend it to continuity over an interval $(a,b)$ by saying:
::: {.callout-note icon=false}
## Definition of continuity over an open interval
> A function $f(x)$ is continuous over $(a,b)$ if at each point $c$ with $a < c < b$, $f(x)$ is continuous at $c$.
A function $f(x)$ is continuous over $(a,b)$ if at each point $c$ with $a < c < b$, $f(x)$ is continuous at $c$.
:::
Finally, as with limits, it can be convenient to speak of *right* continuity and *left* continuity at a point, where the limit in the definition is replaced by a right or left limit, as appropriate.
@@ -122,9 +130,9 @@ There are various reasons why a function may not be continuous.
$$
f(x) = \begin{cases}
-1 & x < 0 \\
0 & x = 0 \\
1 & x > 0
-1 &~ x < 0 \\
0 &~ x = 0 \\
1 &~ x > 0
\end{cases}
$$
@@ -140,25 +148,57 @@ is implemented by `Julia`'s `sign` function. It has a value at $0$, but no limit
plot([-1,-.01], [-1,-.01], legend=false, color=:black)
plot!([.01, 1], [.01, 1], color=:black)
scatter!([0], [1/2], markersize=5, markershape=:circle)
ts = range(0, 2pi, 100)
C = Shape(0.02 * sin.(ts), 0.03 * cos.(ts))
plot!(C, fill=(:white,1), line=(:black, 1))
```
is not continuous at $x=0$. It has a limit of $0$ at $0$ and a function value $f(0) = 1/2$, but the limit and the function value are not equal.
* The `floor` function, which rounds down to the nearest integer, is also not continuous at the integers, but is right continuous at the integers, as, for example, $\lim_{x \rightarrow 0+} f(x) = f(0)$. This graph emphasizes the right continuity by placing a point for the value of the function when there is a jump:
* The `floor` function, which rounds down to the nearest integer, is also not continuous at the integers, but is right continuous at the integers, as, for example, $\lim_{x \rightarrow 0+} f(x) = f(0)$. This graph emphasizes the right continuity by placing a filled marker for the value of the function when there is a jump and an open marker where the function is not that value.
```{julia}
#| hold: true
#| echo: false
x = [0,1]; y=[0,0]
plt = plot(x.-2, y.-2, color=:black, legend=false)
plot!(plt, x.-1, y.-1, color=:black)
plot!(plt, x.-0, y.-0, color=:black)
plot!(plt, x.+1, y.+1, color=:black)
plot!(plt, x.+2, y.+2, color=:black)
scatter!(plt, [-2,-1,0,1,2], [-2,-1,0,1,2], markersize=5, markershape=:circle)
plt = let
empty_style = (xticks=-4:4, yticks=-4:4,
framestyle=:origin,
legend=false)
axis_style = (arrow=true, side=:head, line=(:gray, 1))
text_style = (10,)
fn_style = (;line=(:black, 3))
fn2_style = (;line=(:red, 4))
mark_style = (;line=(:gray, 1, :dot))
domain_style = (;fill=(:orange, 0.35), line=nothing)
range_style = (; fill=(:blue, 0.35), line=nothing)
ts = range(0, 2pi, 100)
xys = sincos.(ts)
xys = [.1 .* xy for xy in xys]
plot(; empty_style..., aspect_ratio=:equal)
plot!([-4.25,4.25], [0,0]; axis_style...)
plot!([0,0], [-4.25, 4.25]; axis_style...)
for k in -4:4
P,Q = (k,k),(k+1,k)
plot!([P,Q], line=(:black,1))
S = Shape([k .+ xy for xy in xys])
plot!(S; fill=(:black,))
S = Shape([(k+1,k) .+ xy for xy in xys])
plot!(S; fill=(:white,), line=(:black,1))
end
current()
end
plt
```
```{julia}
#| echo: false
plotly()
nothing
```
* The function $f(x) = 1/x^2$ is not continuous at $x=0$: $f(x)$ is not defined at $x=0$ and $f(x)$ has no limit at $x=0$ (in the usual sense).
@@ -168,8 +208,8 @@ plt
$$
f(x) =
\begin{cases}
0 & \text{if } x \text{ is irrational,}\\
1 & \text{if } x \text{ is rational.}
0 &~ \text{if } x \text{ is irrational,}\\
1 &~ \text{if } x \text{ is rational.}
\end{cases}
$$
@@ -184,15 +224,15 @@ Let a function be defined by cases:
$$
f(x) = \begin{cases}
3x^2 + c & x \geq 0,\\
2x-3 & x < 0.
3x^2 + c &~ x \geq 0,\\
2x-3 &~ x < 0.
\end{cases}
$$
What value of $c$ will make $f(x)$ a continuous function?
We note that for $x < 0$ and for $x > 0$ the function is a simple polynomial, so is continuous. At $x=0$ to be continuous we need a limit to exists and be equal to $f(0)$, which is $c$. A limit exists if the left and right limits are equal. This means we need to solve for $c$ to make the left and right limits equal. We do this next with a bit of overkill in this case:
We note that for $x < 0$ and for $x > 0$ the function is defined by a simple polynomial, so is continuous. To be continuous at $x=0$ we need the limit to exist and be equal to $f(0)$, which is $c$. A limit exists if the left and right limits are equal. This means we need to solve for $c$ to make the left and right limits equal. We do this next with a bit of overkill in this case:
```{julia}
@@ -206,7 +246,7 @@ We need to solve for $c$ to make `del` zero:
```{julia}
solve(del, c)
solve(del ~ 0, c)
```
This gives the value of $c$.
@@ -375,8 +415,8 @@ Let $f(x)$ be defined by
$$
f(x) = \begin{cases}
c + \sin(2x - \pi/2) & x > 0\\
3x - 4 & x \leq 0.
c + \sin(2x - \pi/2) &~ x > 0\\
3x - 4 &~ x \leq 0.
\end{cases}
$$
@@ -415,12 +455,22 @@ Consider the function $f(x)$ given by the following graph
```{julia}
#| hold: true
#| echo: false
xs = range(0, stop=2, length=50)
plot(xs, [sqrt(1 - (x-1)^2) for x in xs], legend=false, xlims=(0,4))
plot!([2,3], [1,0])
scatter!([3],[0], markersize=5)
plot!([3,4],[1,0])
scatter!([4],[0], markersize=5)
let
xs = range(0, stop=2, length=50)
plot(xs, [sqrt(1 - (x-1)^2) for x in xs];
line=(:black,1),
legend=false, xlims=(-0.1,4.1))
plot!([2,3], [1,0]; line=(:black,1))
plot!([3,4],[1,0]; line=(:black,1))
scatter!([(0,0)], markersize=5, markercolor=:black)
scatter!([(2,0)], markersize=5, markercolor=:white)
scatter!([(2, 1)], markersize=5; markercolor=:black)
scatter!([(3,0)], markersize=5; markercolor=:black)
scatter!([(3,1)], markersize=5; markercolor=:white)
scatter!([(4,0)], markersize=5; markercolor=:black)
end
```
Is the function $f(x)$ continuous at $x=1$?
@@ -505,3 +555,29 @@ choices = ["Can't tell",
answ = 1
radioq(choices, answ)
```
###### Question
A parametric equation is specified by a parameterization $(f(t), g(t)), a \leq t \leq b$. The parameterization will be continuous if and only if each function is continuous.
Suppose $k_x$ and $k_y$ are positive integers and $a, b$ are positive numbers. Will the [Lissajous](https://en.wikipedia.org/wiki/Parametric_equation#Lissajous_Curve) curve given by $(a\cos(k_x t), b\sin(k_y t))$ be continuous?
```{julia}
#| hold: true
#| echo: false
yesnoq(true)
```
Here is a sample graph for $a=1, b=2, k_x=3, k_y=4$:
```{julia}
#| hold: true
a,b = 1, 2
k_x, k_y = 3, 4
plot(t -> a * cos(k_x *t), t-> b * sin(k_y * t), 0, 4pi)
```


@@ -17,63 +17,94 @@ using SymPy
---
![Between points M and M lies an F for a continuous curve. [L'Hospitals](https://ia801601.us.archive.org/26/items/infinimentpetits1716lhos00uoft/infinimentpetits1716lhos00uoft.pdf) figure 55.](figures/ivt.jpg){width=40%}
Continuity is a valuable property for a function to have, one that carries implications. In this section we discuss two: the intermediate value theorem and the extreme value theorem. These two theorems speak to some fundamental applications of calculus: finding zeros of a function and finding extrema of a function.
## Intermediate Value Theorem
::: {.callout-note icon=false}
## The intermediate value theorem
> The *intermediate value theorem*: If $f$ is continuous on $[a,b]$ with, say, $f(a) < f(b)$, then for any $y$ with $f(a) \leq y \leq f(b)$ there exists a $c$ in $[a,b]$ with $f(c) = y$.
If $f$ is continuous on $[a,b]$ with, say, $f(a) < f(b)$, then for any $y$ with $f(a) \leq y \leq f(b)$ there exists a $c$ in $[a,b]$ with $f(c) = y$.
:::
::: {#fig-IVT}
```{julia}
#| hold: true
#| echo: false
#| cache: true
### {{{IVT}}}
gr()
function IVT_graph(n)
f(x) = sin(pi*x) + 9x/10
a,b = [0,3]
let
gr()
# IVT
empty_style = (xaxis=([], false),
yaxis=([], false),
framestyle=:origin,
legend=false)
axis_style = (arrow=true, side=:head, line=(:gray, 1))
text_style = (10,)
fn_style = (;line=(:black, 3))
fn2_style = (;line=(:red, 4))
mark_style = (;line=(:gray, 1, :dot))
domain_style = (;fill=(:orange, 0.35), line=nothing)
range_style = (; fill=(:blue, 0.35), line=nothing)
xs = range(a,stop=b, length=50)
f(x) = x + sinpi(3x) + 5sin(2x) + 3cospi(2x)
a, b = -1, 5
xs = range(a, b, 251)
ys = f.(xs)
y0, y1 = extrema(ys)
plot(; empty_style...)
plot!(f, a, b; fn_style...)
## cheat -- pick an x, then find a y
Δ = .2
x = range(a + Δ, stop=b - Δ, length=6)[n]
y = f(x)
plot!([a-.2, b + .2],[0,0]; axis_style...)
plot!([a-.1, a-.1], [y0-2, y1+2]; axis_style...)
plt = plot(f, a, b, legend=false, size=fig_size)
plot!(plt, [0,x,x], [f(x),f(x),0], color=:orange, linewidth=3)
plot!([(a,0),(a,f(a))]; line=(:black, 1, :dash))
plot!([(b,0),(b,f(b))]; line=(:black, 1, :dash))
plt
m = f(a/2 + b/2) + 1.5
plot!([a, b], [m,m]; line=(:black, 1, :dashdot))
δx = 0.03
plot!(Shape([a,b,b,a], 4*δx*[-1,-1,1,1]);
domain_style...)
plot!(Shape((a-.1) .+ 2δx * [-1,1,1,-1], [f(a),f(a),f(b), f(b)]);
range_style...)
plot!(Shape((a-.1) .+ 2δx/3 * [-1,1,1,-1], [y0,y0,y1,y1]);
range_style...)
zs = find_zeros(x -> f(x) - m, (a,b))
c = zs[2]
plot!([(c,0), (c,f(c))]; line=(:black, 1, :dashdot))
annotate!([
(a, 0, text(L"a", 12, :bottom)),
(b, 0, text(L"b", 12, :top)),
(c, 0, text(L"c", 12, :top)),
(a-.1, f(a), text(L"f(a)", 12, :right)),
(a-.1, f(b), text(L"f(b)", 12, :right)),
(a-0.2, m, text(L"y", 12, :right)),
])
end
n = 6
anim = @animate for i=1:n
IVT_graph(i)
end
imgfile = tempname() * ".gif"
gif(anim, imgfile, fps = 1)
caption = L"""
Illustration of intermediate value theorem. The theorem implies that any randomly chosen $y$
value between $f(a)$ and $f(b)$ will have at least one $x$ in $[a,b]$
with $f(x)=y$.
"""
plotly()
ImageFile(imgfile, caption)
```
In the early years of calculus, the intermediate value theorem was intricately connected with the definition of continuity, now it is a consequence.
```{julia}
#| echo: false
plotly()
nothing
```
Illustration of the intermediate value theorem. The theorem implies that any randomly chosen $y$ value between $f(a)$ and $f(b)$ will have at least one $c$ in $[a,b]$ with $f(c)=y$. This graphic shows one of several possible choices of $c$ for the given $y$.
:::
In the early years of calculus, the intermediate value theorem was intricately connected with the definition of continuity; now it is an important consequence.
The basic proof starts with a set of points in $[a,b]$: $C = \{x \text{ in } [a,b] \text{ with } f(x) \leq y\}$. The set is not empty (as $a$ is in $C$) so it *must* have a largest value, call it $c$ (this might seem obvious, but it requires the completeness property of the real numbers). By continuity of $f$, it can be shown that $\lim_{x \rightarrow c-} f(x) = f(c) \leq y$ (as $c$ is approached through points of $C$) and $\lim_{x \rightarrow c+}f(x) = f(c) \geq y$ (as points to the right of $c$ are not in $C$), which forces $f(c) = y$.
@@ -85,26 +116,16 @@ The basic proof starts with a set of points in $[a,b]$: $C = \{x \text{ in } [a,
Suppose we have a continuous function $f(x)$ on $[a,b]$ with $f(a) < 0$ and $f(b) > 0$. Then as $f(a) < 0 < f(b)$, the intermediate value theorem guarantees the existence of a $c$ in $[a,b]$ with $f(c) = 0$. This special case of the intermediate value theorem was first proved by Bolzano. Such $c$ are called *zeros* of the function $f$.
We use this fact when building a "sign chart" of a polynomial function. Between any two consecutive real zeros the polynomial cannot change sign. (Why?) So a "test point" can be used to determine the sign of the function over an entire interval.
The `sign_chart` function from `CalculusWithJulia` uses this to indicate where an *assumed* continuous function changes sign:
```{julia}
f(x) = sin(x + x^2) + x/2
sign_chart(f, -3, 3)
```
The intermediate value theorem can find the sign of the function *between* adjacent zeros, but how are the zeros identified?
Here, we use the Bolzano theorem to give an algorithm, the *bisection method*, to locate a value $c$ in $[a,b]$ with $f(c) = 0$ under the assumptions:
* $f$ is continuous on $[a,b]$
* $f$ changes sign between $a$ and $b$. (In particular, when $f(a)$ and $f(b)$ have different signs.)
::: {.callout-note}
#### Between
The bisection method is used to find a zero, $c$, of $f(x)$ *between* two values, $a$ and $b$. The method is guaranteed to work under assumptions, the most important being the continuous function having different signs at $a$ and $b$.
The bisection method is used to find a zero, $c$, of $f(x)$ *between* two values, $a$ and $b$. The method is guaranteed to work under the assumption of a continuous function having different signs at $a$ and $b$.
:::
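To make the algorithm concrete before the illustrations below, here is a minimal sketch of a bisection function (illustrative only; the text's `simple_bisection`, referred to later, is similar):

```{julia}
function bisection(f, a, b; tol = sqrt(eps()))
    fa, fb = f(a), f(b)
    fa * fb < 0 || error("f must change sign between a and b")
    while b - a > tol
        c = a + (b - a) / 2     # the midpoint of the bracket
        fc = f(c)
        if fa * fc <= 0         # the sign change is within [a, c]
            b, fb = c, fc
        else                    # the sign change is within [c, b]
            a, fa = c, fc
        end
    end
    a + (b - a) / 2
end
bisection(sin, 3, 4)            # ≈ π
```

Each step halves the bracketing interval, so the zero is located to within the tolerance after a predictable number of steps.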
@@ -155,7 +176,7 @@ caption = L"""
Illustration of the bisection method to find a zero of a function. At
each step the interval has $f(a)$ and $f(b)$ having opposite signs so
that the intermediate value theorem guaratees a zero.
that the intermediate value theorem guarantees a zero.
"""
@@ -231,7 +252,7 @@ sin(c)
(Even `1pi` itself is not a "zero" due to floating point issues.)
### The `find_zero` function.
### The `find_zero` function to solve `f(x) = 0`
The `Roots` package has a function `find_zero` that implements the bisection method when called as `find_zero(f, (a,b))` where $[a,b]$ is a bracket. Its use is similar to `simple_bisection` above. This package is loaded when `CalculusWithJulia` is. We illustrate the usage of `find_zero` in the following:
@@ -241,8 +262,8 @@ The `Roots` package has a function `find_zero` that implements the bisection met
xstar = find_zero(sin, (3, 4))
```
:::{.callout-warning}
## Warning
:::{.callout-note}
## Action template
Notice that the call `find_zero(sin, (3, 4))` again fits the template `action(function, args...)` that we see repeatedly. The `find_zero` function can also be called through `fzero`. The use of `(3, 4)` to specify the interval is not necessary. For example, `[3,4]` would work equally well. (Anything where `extrema` is defined works.)
:::
@@ -294,7 +315,7 @@ find_zero(q, (5, 10))
::: {.callout-note}
### Between need not be near
Later, we will see more efficient algorithms to find a zero *near* a given guess. The bisection method finds a zero *between* two values of a bracketing interval. This interval need not be small. Indeed in many cases it can be infinite. For this particular problem, any interval like `(2,N)` will work as long as `N` is bigger than the zero and small enough that `q(N)` is finite *or* infinite *but* not `NaN`. (Basically, `q` must evaluate to a number with a sign. Here, the value of `q(Inf)` is `NaN` as it evaluates to the indeterminate `Inf - Inf`. But `q` is still not `NaN` for quite large numbers, such as `1e77`, as `x^4` can as big as `1e308` -- technically `floatmax(Float64)` -- and be finite.)
Later, we will see more efficient algorithms to find a zero *near* a given guess. The bisection method finds a zero *between* two values of a bracketing interval. This interval need not be small. Indeed in many cases it can be infinite. For this particular problem, any interval like `(2,N)` will work as long as `N` is bigger than the zero and small enough that `q(N)` is finite *or* infinite *but* not `NaN`. (Basically, `q` must evaluate to a number with a sign. Here, the value of `q(Inf)` is `NaN` as it evaluates to the indeterminate `Inf - Inf`. But `q` is still not `NaN` for quite large numbers, such as `1e77`, as `x^4` can be as big as `1e308`---technically `floatmax(Float64)`---and be finite.)
:::
@@ -322,6 +343,10 @@ It appears (and a plot over $[0,1]$ verifies) that there is one zero between $-2
find_zero(x^3 - x + 1, (-2, -1))
```
#### The `find_zero` function to solve `f(x) = c`
Solving `f(x) = c` is related to solving `h(x) = 0`. The key is to make a new function using the difference of the two sides: `h(x) = f(x) - c`.
##### Example
Solve for a value of $x$ where `erfc(x)` is equal to `0.5`.
@@ -341,6 +366,40 @@ find_zero(h, (-Inf, Inf)) # as wide as possible in this case
```
##### Example: Inverse functions
If $f(x)$ is *monotonic* and *continuous* over an interval $[a,b]$ then it has an *inverse function*. That is for any $y$ between $f(a)$ and $f(b)$ we can find an $x$ satisfying $y = f(x)$ with $a \leq x \leq b$. This is due, of course, to both the intermediate value theorem (which guarantees an $x$) and monotonicity (which guarantees just one $x$).
To see how we can *numerically* find an inverse function using `find_zero`, we have this function:
```{julia}
function inverse_function(f, a, b, args...; kwargs...)
    fa, fb = f(a), f(b)
    m, M = fa < fb ? (fa, fb) : (fb, fa)  # range of f, whether increasing or decreasing
    y -> begin
        @assert m ≤ y ≤ M                 # y must be in the range of f over [a,b]
        find_zero(x -> f(x) - y, (a, b), args...; kwargs...)
    end
end
```
The check on `fa < fb` is due to the possibility that $f$ is increasing (in which case `fa < fb`) or decreasing (in which case `fa > fb`).
To see this used, we consider the monotonic function $f(x) = x - \sin(x)$ over $[0, 5\pi]$. To graph, we have:
```{julia}
f(x) = x - sin(x)
a, b = 0, 5pi
plot(inverse_function(f, a, b), f(a), f(b); aspect_ratio=:equal)
```
(We plot over the range $[f(a), f(b)]$ here, as we can guess $f(x)$ is *increasing*.)
#### The `find_zero` function to solve `f(x) = g(x)`
Solving `f(x) = g(x)` is related to solving `h(x) = 0`. The key is to make a new function using the difference of the two sides: `h(x) = f(x) - g(x)`.
##### Example
@@ -375,41 +434,11 @@ For symbolic expressions, as below, then, as a convenience, an equation (formed
```{julia}
@syms x
solve(cos(x) ~ x, (0, 2))
find_zero(cos(x) ~ x, (0, 2))
```
:::
[![Intersection of two curves as illustrated by Canadian artist Kapwani Kiwanga.](figures/intersection-biennale.jpg)](https://www.gallery.ca/whats-on/touring-exhibitions-and-loans/around-the-world/canada-pavilion-at-the-venice-biennale/kapwani-kiwanga-trinket)
##### Example: Inverse functions
If $f(x)$ is *monotonic* and *continuous* over an interval $[a,b]$ then it has an *inverse function*. That is for any $y$ between $f(a)$ and $f(b)$ we can find an $x$ satisfying $y = f(x)$ with $a \leq x \leq b$. This is due, of course, to both the intermediate value theorem (which guarantees an $x$) and monotonicity (which guarantees just one $x$).
To see how we can *numerically* find an inverse function using `find_zero`, we have this function:
```{julia}
function inverse_function(f, a, b, args...; kwargs...)
fa, fb = f(a), f(b)
m, M = fa < fb ? (fa, fb) : (fb, fa)
y -> begin
@assert m ≤ y ≤ M
find_zero(x ->f(x) - y, (a,b), args...; kwargs...)
end
end
```
The check on `fa < fb` is due to the possibility that $f$ is increasing (in which case `fa < fb`) or decreasing (in which case `fa > fb`).
To see this used, we consider the monotonic function $f(x) = x - \sin(x)$ over $[0, 5\pi]$. To graph, we have:
```{julia}
f(x) = x - sin(x)
a, b = 0, 5pi
plot(inverse_function(f, a, b), f(a), f(b); aspect_ratio=:equal)
```
(We plot over the range $[f(a), f(b)]$ here, as we can guess $f(x)$ is *increasing*.)
[![Intersection of two curves as illustrated by Canadian artist Kapwani Kiwanga.](figures/intersection-biennale.jpg)](https://www.gallery.ca/whats-on/touring-exhibitions-and-loans/around-the-world/canada-pavilion-at-the-venice-biennale/kapwani-kiwanga-trinket){width=40%}
##### Example
@@ -496,11 +525,12 @@ For the model without wind resistance, we can graph the function easily enough.
plot(j, 0, 500)
```
Well, we haven't even seen the peak yet. Better to do a little spade work first. This is a quadratic function, so we can use `roots` from `SymPy` to find the roots:
Well, we haven't even seen the peak yet. Better to do a little spade work first. This is a quadratic function, so we can use `solve` from `SymPy` to find the roots:
```{julia}
roots(j(x))
@syms x
solve(j(x) ~ 0, x)
```
We see that $1250$ is the largest root. So we plot over this domain to visualize the flight:
@@ -604,7 +634,7 @@ Geometry will tell us that $\cos(x) - x/p$ for *one* $x$ in $[0, \pi/2]$ wheneve
#| hold: true
f(x, p=1) = cos(x) - x/p
I = (0, pi/2)
find_zero(f, I), find_zero(f, I, p=2)
find_zero(f, I), find_zero(f, I; p=2)
```
The second number is the solution when `p=2`.
@@ -677,7 +707,7 @@ f.(zs)
The `find_zero` function in the `Roots` package is an interface to one of several methods. For now we focus on the *bracketing* methods, later we will see others. Bracketing methods, among others, include `Roots.Bisection()`, the basic bisection method though with a different sense of "middle" than $(a+b)/2$ and used by default above; `Roots.A42()`, which will typically converge much faster than simple bisection; `Roots.Brent()` for the classic method of Brent, and `FalsePosition()` for a family of *regula falsi* methods. These can all be used by specifying the method in a call to `find_zero`.
Alternatively, `Roots` implements the `CommonSolve` interface popularized by its use in the `DifferentialEquations.jl` ecosystem, a wildly successful area for `Julia`. The basic setup involves two steps: setup a "problem;" solve the problem.
Alternatively, `Roots` implements the `CommonSolve` interface popularized by its use in the `DifferentialEquations.jl` ecosystem, a wildly successful area for `Julia`. The basic setup involves two steps: setup a "problem"; solve the problem.
To set up a problem we call `ZeroProblem` with the function and an initial interval, as in:
@@ -706,22 +736,25 @@ The Extreme Value Theorem is another consequence of continuity.
To discuss the extreme value theorem, we define an *absolute maximum*.
::: {.callout-note icon=false}
## Absolute maximum, absolute minimum
> The absolute maximum of $f(x)$ over an interval $I$, when it exists, is the value $f(c)$, $c$ in $I$, where $f(x) \leq f(c)$ for any $x$ in $I$.
>
> Similarly, an *absolute minimum* of $f(x)$ over an interval $I$ can be defined, when it exists, by a value $f(c)$ where $c$ is in $I$ *and* $f(c) \leq f(x)$ for any $x$ in $I$.
The absolute maximum of $f(x)$ over an interval $I$, when it exists, is the value $f(c)$, $c$ in $I$, where $f(x) \leq f(c)$ for any $x$ in $I$.
Similarly, an *absolute minimum* of $f(x)$ over an interval $I$ can be defined, when it exists, by a value $f(c)$ where $c$ is in $I$ *and* $f(c) \leq f(x)$ for any $x$ in $I$.
:::
Related but different is the concept of relative or *local* extrema:
::: {.callout-note icon=false}
## Local maximum, local minimum
> A local maxima for $f$ is a value $f(c)$ where $c$ is in **some** *open* interval $I=(a,b)$, $I$ in the domain of $f$, and $f(c)$ is an absolute maxima for $f$ over $I$. Similarly, an local minima for $f$ is a value $f(c)$ where $c$ is in **some** *open* interval $I=(a,b)$, $I$ in the domain of $f$, and $f(x)$ is an absolute minima for $f$ over $I$.
A local maximum for $f$ is a value $f(c)$ where $c$ is in **some** *open* interval $I=(a,b)$, $I$ in the domain of $f$, and $f(c)$ is an absolute maximum for $f$ over $I$. Similarly, a local minimum for $f$ is a value $f(c)$ where $c$ is in **some** *open* interval $I=(a,b)$, $I$ in the domain of $f$, and $f(c)$ is an absolute minimum for $f$ over $I$.
The term *local extrema* is used to describe either a local maximum or local minimum.
:::
The key point is that the extrema are values in the *range* that are realized by some value in the *domain* (possibly more than one).
@@ -742,14 +775,84 @@ nothing
```
[![Elevation profile of the Hardrock 100 ultramarathon. Treating the elevation profile as a function, the absolute maximum is just about 14,000 feet and the absolute minimum about 7600 feet. These are of interest to the runner for different reasons. Also of interest would be each local maximum and local minimum, the peaks and valleys of the graph, and the total elevation climbed, the latter so important/unforgettable that its value makes it into the chart's title.
](limits/figures/hardrock-100.jpeg)](https://hardrock100.com)
](figures/hardrock-100.jpeg)](https://hardrock100.com){width=50%}
This figure shows the two concepts as well.
::: {#fig-absolute-relative}
```{julia}
#| echo: false
plt = let
gr()
empty_style = (xaxis=([], false),
yaxis=([], false),
framestyle=:origin,
legend=false)
axis_style = (arrow=true, side=:head, line=(:gray, 1))
p(x) = (x-1)*(x-2)*(x-3)*(x-4) + x/2 + 2
a, b = 0.25, 4.5
z₁, z₂, z₃ = zs = find_zeros(x -> ForwardDiff.derivative(p,x), (a, b))
a = -0.0
plot(; empty_style...)
plot!(p, a, b; line=(:black, 2))
plot!([a,b+0.25], [0,0]; axis_style...)
plot!([a,a] .+ .1, [-1, p(0)]; axis_style...)
δ = .5
ts = range(0, 2pi, 100)
for z in zs
plot!([z-δ,z+δ],[p(z),p(z)]; line=(:black, 1))
C = Shape(z .+ 0.03 * sin.(ts), p(z) .+ 0.3 * cos.(ts))
plot!(C; fill=(:periwinkle, 1), line=(:black, 1))
end
for z in (a,b)
C = Shape(z .+ 0.03 * sin.(ts), p(z) .+ 0.3 * cos.(ts))
plot!(C; fill=(:black, 1), line=(:black, 1))
end
κ = 0.33
annotate!([
(a, 0, text(L"a", :top)),
(b,0, text(L"b", :top)),
(a + κ/5, p(a), text(raw"absolute max", 10, :left)),
(z₁, p(z₁)-κ, text(raw"absolute min", 10, :top)),
(z₂, p(z₂) + κ, text(raw"relative max", 10, :bottom)),
(z₃, p(z₃) - κ, text(raw"relative min", 10, :top)),
(b, p(b) + κ, text(raw"endpoint", 10, :bottom))
])
current()
end
plt
```
```{julia}
#| echo: false
plotly()
nothing
```
Figure illustrating absolute and relative extrema for a function $f(x)$ over $I=[a,b]$. The leftmost point has a $y$ value, $f(a)$, which is an absolute maximum of $f(x)$ over $I$. The three points highlighted between $a$ and $b$ are all relative extrema. The first one is *also* the absolute minimum over $I$. The endpoint is not considered a relative maximum for technical reasons---there is no interval around $b$, it being on the boundary of $I$.
:::
The extreme value theorem discusses an assumption that ensures absolute maximum and absolute minimum values exist.
::: {.callout-note icon=false}
## The extreme value theorem
> The *extreme value theorem*: If $f(x)$ is continuous over a closed interval $[a,b]$ then $f$ has an absolute maximum and an absolute minimum over $[a,b]$.
If $f(x)$ is continuous over a closed interval $[a,b]$ then $f$ has an absolute maximum and an absolute minimum over $[a,b]$.
:::
(By continuous over $[a,b]$ we mean continuous on $(a,b)$ and right continuous at $a$ and left continuous at $b$.)
@@ -769,7 +872,7 @@ The function $f(x) = \sqrt{1-x^2}$ is continuous on the interval $[-1,1]$ (in th
##### Example
The function $f(x) = x \cdot e^{-x}$ on the closed interval $[0, 5]$ is continuous. Hence it has an absolute maximum, which a graph shows to be $0.4$. It has an absolute minimum, clearly the value $0$ occurring at the endpoint.
The function $f(x) = x \cdot e^{-x}$ on the closed interval $[0, 5]$ is continuous. Hence it has an absolute maximum, which a graph shows to be about $0.4$ and occurring near $x=1$. It has an absolute minimum, clearly the value $0$ occurring at the endpoint.
```{julia}
@@ -806,7 +909,7 @@ A New York Times [article](https://www.nytimes.com/2016/07/30/world/europe/norwa
## Continuity and closed and open sets
We comment on two implications of continuity that can be generalized to more general settings.
We comment on two implications of continuity that can be generalized.
The two intervals $(a,b)$ and $[a,b]$ differ as the latter includes the endpoints. The extreme value theorem shows this distinction can make a big difference in what can be said regarding *images* of such intervals.
@@ -1013,9 +1116,9 @@ nothing
```
![Trajectories of potential cannonball fires with air-resistance included. (http://ej.iop.org/images/0143-0807/33/1/149/Full/ejp405251f1_online.jpg)
](figures/cannonball.jpg)
](figures/cannonball.jpg){width=50%}
In 1638, according to Amir D. [Aczel](http://books.google.com/books?id=kvGt2OlUnQ4C&pg=PA28&lpg=PA28&dq=mersenne+cannon+ball+tests&source=bl&ots=wEUd7e0jFk&sig=LpFuPoUvODzJdaoug4CJsIGZZHw&hl=en&sa=X&ei=KUGcU6OAKJCfyASnioCoBA&ved=0CCEQ6AEwAA#v=onepage&q=mersenne%20cannon%20ball%20tests&f=false), an experiment was performed in the French Countryside. A monk, Marin Mersenne, launched a cannonball straight up into the air in an attempt to help Descartes prove facts about the rotation of the earth. Though the experiment was not successful, Mersenne later observed that the time for the cannonball to go up was greater than the time to come down. ["Vertical Projection in a Resisting Medium: Reflections on Observations of Mersenne".](http://www.maa.org/publications/periodicals/american-mathematical-monthly/american-mathematical-monthly-contents-junejuly-2014)
In 1638, according to Amir D. [Aczel](http://books.google.com/books?id=kvGt2OlUnQ4C&pg=PA28&lpg=PA28&dq=mersenne+cannon+ball+tests&source=bl&ots=wEUd7e0jFk&sig=LpFuPoUvODzJdaoug4CJsIGZZHw&hl=en&sa=X&ei=KUGcU6OAKJCfyASnioCoBA&ved=0CCEQ6AEwAA#v=onepage&q=mersenne%20cannon%20ball%20tests&f=false), an experiment was performed in the French Countryside. A monk, Marin Mersenne, launched a cannonball straight up into the air in an attempt to help Descartes prove facts about the rotation of the earth. Though the experiment was not successful, Mersenne later observed that the time for the cannonball to go up was less than the time to come down. ["Vertical Projection in a Resisting Medium: Reflections on Observations of Mersenne".](http://www.maa.org/publications/periodicals/american-mathematical-monthly/american-mathematical-monthly-contents-junejuly-2014)
This isn't the case for simple ballistic motion where the time to go up is equal to the time to come down. We can "prove" this numerically. For simple ballistic motion:
@@ -1132,7 +1235,7 @@ radioq(choices, answ, keep_order=true)
###### Question
The extreme value theorem has two assumptions: a continuous function and a *closed* interval. Which of the following examples fails to satisfy the consequence of the extreme value theorem because the function is not continuous?
The extreme value theorem has two assumptions: a continuous function and a *closed* interval. Which of the following examples fails to satisfy the consequence of the extreme value theorem because the function is defined on $I$ but is not continuous on $I$?
```{julia}
@@ -1147,6 +1250,170 @@ answ = 4
radioq(choices, answ, keep_order=true)
```
###### Question
The extreme value theorem is true when $f$ is a continuous function on an interval $I$ *and* $I=[a,b]$ is a *closed* interval. Which of these illustrates why it need not apply when $f$ is defined on $I$ but is not continuous on $I$?
```{julia}
#| hold: true
#| echo: false
let
gr()
empty_style = (xaxis=([], false),
yaxis=([], false),
framestyle=:origin,
legend=false)
axis_style = (arrow=true, side=:head, line=(:gray, 1))
ts = range(0, 2pi, 100)
# defined on I; not continuous on I
p1 = plot(;empty_style..., aspect_ratio=:equal)
title!(p1, "(a)")
plot!(p1, x -> 1 - abs(2x), -1, 1, color=:black)
plot!(p1, zero; line=(:black, 1), arrow=true, side=:head)
C = Shape(0.03 .* sin.(ts), 1 .+ 0.03 .* cos.(ts))
plot!(p1, C, fill=(:white, 1), line=(:black,1))
C = Shape(0.03 .* sin.(ts), - 0.25 .+ 0.03 .* cos.(ts))
plot!(p1, C, fill=(:black,1))
annotate!(p1, [
(-1,0,text(L"a", :top)),
(1,0,text(L"b", :top))
])
# not defined on I
p2 = plot(;empty_style...)
title!(p2, "(b)")
plot!(p2, x -> 1/(1-x), 0, .95, color=:black)
plot!(p2, x-> -1/(1-x), 1.05, 2, color=:black)
plot!(p2, zero; axis_style...)
annotate!(p2,[
(0,0,text(L"a", :top)),
(2, 0, text(L"b", :top))
])
# not continuous on I
p3 = plot(;empty_style...)
title!(p3, "(c)")
plot!(p3, x -> 1/(1-x), 0, .95, color=:black)
ylims!((-0.25, 1/(1 - 0.96)))
plot!(p3, [0,1.05],[0,0]; axis_style...)
vline!(p3, [1]; line=(:black, 1, :dash))
annotate!(p3,[
(0,0,text(L"a", :top)),
(1, 0, text(L"b", :top))
])
# continuous
p4 = plot(;empty_style...)
title!(p4, "(d)")
f(x) = x^x
a, b = 0, 2
plot!(p4, f, a, b; line=(:black,1))
ylims!(p4, (-.25, f(b)))
plot!(p4, [a-.1, b+.1], [0,0]; axis_style...)
scatter!([0,2],[ f(0),f(2)]; marker=(:circle,:black))
annotate!([
(a, 0, text(L"a", :top)),
(b, 0, text(L"b", :top))
])
l = @layout[a b; c d]
p = plot(p1, p2, p3, p4, layout=l)
imgfile = tempname() * ".png"
savefig(p, imgfile)
hotspotq(imgfile, (0,1/2), (1/2,1))
end
```
The extreme value theorem is true when $f$ is a continuous function on an interval $I$ and $I=[a,b]$ is a *closed* interval. Which of these illustrates when the theorem's assumptions are true?
```{julia}
#| hold: true
#| echo: false
## come on; save this figure...
let
gr()
empty_style = (xaxis=([], false),
yaxis=([], false),
framestyle=:origin,
legend=false)
axis_style = (arrow=true, side=:head, line=(:gray, 1))
ts = range(0, 2pi, 100)
# defined on I; not continuous on I
p1 = plot(;empty_style..., aspect_ratio=:equal)
title!(p1, "(a)")
plot!(p1, x -> 1 - abs(2x), -1, 1, color=:black)
plot!(p1, zero; line=(:black, 1), arrow=true, side=:head)
C = Shape(0.03 .* sin.(ts), 1 .+ 0.03 .* cos.(ts))
plot!(p1, C, fill=(:white, 1), line=(:black,1))
C = Shape(0.03 .* sin.(ts), - 0.25 .+ 0.03 .* cos.(ts))
plot!(p1, C, fill=(:black,1))
annotate!(p1, [
(-1,0,text(L"a", :top)),
(1,0,text(L"b", :top))
])
# not defined on I
p2 = plot(;empty_style...)
title!(p2, "(b)")
plot!(p2, x -> 1/(1-x), 0, .95, color=:black)
plot!(p2, x-> -1/(1-x), 1.05, 2, color=:black)
plot!(p2, zero; axis_style...)
annotate!(p2,[
(0,0,text(L"a", :top)),
(2, 0, text(L"b", :top))
])
# not continuous on I
p3 = plot(;empty_style...)
title!(p3, "(c)")
plot!(p3, x -> 1/(1-x), 0, .95, color=:black)
ylims!((-0.1, 1/(1 - 0.96)))
plot!(p3, [0,1.05],[0,0]; axis_style...)
vline!(p3, [1]; line=(:black, 1, :dash))
annotate!(p3,[
(0,0,text(L"a", :top)),
(1, 0, text(L"b", :top))
])
# continuous
p4 = plot(;empty_style...)
title!(p4, "(d)")
f(x) = x^x
a, b = 0, 2
ylims!(p4, (-.25, f(b)))
plot!(p4, f, a, b; line=(:black,1))
plot!(p4, [a-.1, b+.1], [0,0]; axis_style...)
scatter!([0,2],[ f(0),f(2)]; marker=(:circle,:black))
annotate!([
(a, 0, text(L"a", :top)),
(b, 0, text(L"b", :top))
])
l = @layout[a b; c d]
p = plot(p1, p2, p3, p4, layout=l)
imgfile = tempname() * ".png"
savefig(p, imgfile)
hotspotq(imgfile, (1/2,1), (0,1/2))
end
```
```{julia}
#| echo: false
plotly();
```
###### Question
@@ -1238,28 +1505,3 @@ The zeros of the equation $\cos(x) \cdot \cosh(x) = 1$ are related to vibrations
val = maximum(find_zeros(x -> cos(x) * cosh(x) - 1, (0, 6pi)))
numericq(val)
```
###### Question
A parametric equation is specified by a parameterization $(f(t), g(t)), a \leq t \leq b$. The parameterization will be continuous if and only if each function is continuous.
Suppose $k_x$ and $k_y$ are positive integers and $a, b$ are positive numbers. Will the [Lissajous](https://en.wikipedia.org/wiki/Parametric_equation#Lissajous_Curve) curve given by $(a\cos(k_x t), b\sin(k_y t))$ be continuous?
```{julia}
#| hold: true
#| echo: false
yesnoq(true)
```
Here is a sample graph for $a=1, b=2, k_x=3, k_y=4$:
```{julia}
#| hold: true
a,b = 1, 2
k_x, k_y = 3, 4
plot(t -> a * cos(k_x *t), t-> b * sin(k_y * t), 0, 4pi)
```


@@ -18,7 +18,7 @@ using SymPy # for symbolic limits
---
An historic problem in the history of math was to find the area under the graph of $f(x)=x^2$ between $[0,1]$.
A historic problem in the history of math was to find the area under the graph of $f(x)=x^2$ between $[0,1]$.
There wasn't a ready-made formula for the area of this shape, as was known for a triangle or a square. However, [Archimedes](http://en.wikipedia.org/wiki/The_Quadrature_of_the_Parabola) found a method to compute areas enclosed by a parabola and line segments that cross the parabola.
@@ -36,27 +36,38 @@ colors = [:black, :blue, :orange, :red, :green, :orange, :purple]
## Area of parabola
function make_triangle_graph(n)
title = "Area of parabolic cup ..."
n==1 && (title = "\${Area = }1/2\$")
n==2 && (title = "\${Area = previous }+ 1/8\$")
n==3 && (title = "\${Area = previous }+ 2\\cdot(1/8)^2\$")
n==4 && (title = "\${Area = previous }+ 4\\cdot(1/8)^3\$")
n==5 && (title = "\${Area = previous }+ 8\\cdot(1/8)^4\$")
n==6 && (title = "\${Area = previous }+ 16\\cdot(1/8)^5\$")
n==7 && (title = "\${Area = previous }+ 32\\cdot(1/8)^6\$")
n==1 && (title = L"Area $= 1/2$")
n==2 && (title = L"Area $=$ previous $+\; \frac{1}{8}$")
n==3 && (title = L"Area $=$ previous $+\; 2\cdot\frac{1}{8^2}$")
n==4 && (title = L"Area $=$ previous $+\; 4\cdot\frac{1}{8^3}$")
n==5 && (title = L"Area $=$ previous $+\; 8\cdot\frac{1}{8^4}$")
n==6 && (title = L"Area $=$ previous $+\; 16\cdot\frac{1}{8^5}$")
n==7 && (title = L"Area $=$ previous $+\; 32\cdot\frac{1}{8^6}$")
plt = plot(f, 0, 1, legend=false, size = fig_size, linewidth=2)
annotate!(plt, [(0.05, 0.9, text(title,:left))]) # if in title, it grows funny with gr
n >= 1 && plot!(plt, [1,0,0,1, 0], [1,1,0,1,1], color=colors[1], linetype=:polygon, fill=colors[1], alpha=.2)
n == 1 && plot!(plt, [1,0,0,1, 0], [1,1,0,1,1], color=colors[1], linewidth=2)
plt = plot(f, 0, 1;
legend=false,
size = fig_size,
linewidth=2)
annotate!(plt, [
(0.05, 0.9, text(title,:left))
]) # if in title, it grows funny with gr
n >= 1 && plot!(plt, [1,0,0,1, 0], [1,1,0,1,1];
color=colors[1], linetype=:polygon,
fill=colors[1], alpha=.2)
n == 1 && plot!(plt, [1,0,0,1, 0], [1,1,0,1,1];
color=colors[1], linewidth=2)
for k in 2:n
xs = range(0,stop=1, length=1+2^(k-1))
ys = map(f, xs)
k < n && plot!(plt, xs, ys, linetype=:polygon, fill=:black, alpha=.2)
xs = range(0, stop=1, length=1+2^(k-1))
ys = f.(xs)
k < n && plot!(plt, xs, ys;
linetype=:polygon, fill=:black, alpha=.2)
if k == n
plot!(plt, xs, ys, color=colors[k], linetype=:polygon, fill=:black, alpha=.2)
plot!(plt, xs, ys, color=:black, linewidth=2)
plot!(plt, xs, ys;
color=colors[k], linetype=:polygon, fill=:black, alpha=.2)
plot!(plt, xs, ys;
color=:black, linewidth=2)
end
end
plt
@@ -161,7 +172,7 @@ for (x, y, n, col) ∈ zip(xs, ys, ns, (blue, green, purple, red))
end
caption = L"""
The ratio of the circumference of a circle to its diameter, $\pi$, can be approximated from above and below by computing the perimeters of the inscribed $n$-gons. Archimedes computed the perimeters for $n$ being $12$, $24$, $48$, and $96$ to determine that $3~1/7 \leq \pi \leq 3~10/71$.
The ratio of the circumference of a circle to its diameter, $\pi$, can be approximated from below and above by computing the perimeters of inscribed and circumscribed $n$-gons. Archimedes computed the perimeters for $n$ being $12$, $24$, $48$, and $96$ to determine that $3~10/71 \leq \pi \leq 3~1/7$.
"""
plotly()
ImageFile(p, caption)
@@ -181,9 +192,12 @@ Informally, if a limit exists it is the value that $f(x)$ gets close to as $x$ g
The modern formulation is due to Weierstrass:
::: {.callout-note icon=false}
## The $\epsilon-\delta$ Definition of a limit of $f(x)$
> The limit of $f(x)$ as $x$ approaches $c$ is $L$ if for every real $\epsilon > 0$, there exists a real $\delta > 0$ such that for all real $x$, $0 < \lvert x - c \rvert < \delta$ implies $\lvert f(x) - L \rvert < \epsilon$. The notation used is $\lim_{x \rightarrow c}f(x) = L$.
The limit of $f(x)$ as $x$ approaches $c$ is $L$ if for every real $\epsilon > 0$, there exists a real $\delta > 0$ such that for all real $x$, $0 < \lvert x - c \rvert < \delta$ implies $\lvert f(x) - L \rvert < \epsilon$. The notation used is $\lim_{x \rightarrow c}f(x) = L$.
:::
We comment on this later.
@@ -215,35 +229,64 @@ This bounds the expression $\sin(x)/x$ between $1$ and $\cos(x)$ and as $x$ gets
The above bound comes from this figure, for small $x > 0$:
::: {#fig-sin-cos-bound}
```{julia}
#| hold: true
#| echo: false
gr()
p = plot(x -> sqrt(1 - x^2), 0, 1, legend=false, aspect_ratio=:equal,
linewidth=3, color=:black)
θ = π/6
y,x = sincos(θ)
col=RGBA(0.0,0.0,1.0, 0.25)
plot!(range(0,x, length=2), zero, fillrange=u->y/x*u, color=col)
plot!(range(x, 1, length=50), zero, fillrange = u -> sqrt(1 - u^2), color=col)
plot!([x,x],[0,y], linestyle=:dash, linewidth=3, color=:black)
plot!([x,1],[y,0], linestyle=:dot, linewidth=3, color=:black)
plot!([1,1], [0,y/x], linewidth=3, color=:black)
plot!([0,1], [0,y/x], linewidth=3, color=:black)
plot!([0,1], [0,0], linewidth=3, color=:black)
Δ = 0.05
annotate!([(0,0+Δ,"A"), (x-Δ,y+Δ/4, "B"), (1+Δ/2,y/x, "C"),
(1+Δ/2,0+Δ/2,"D")])
annotate!([(.2*cos(θ/2), 0.2*sin(θ/2), "θ")])
imgfile = tempname() * ".png"
savefig(p, imgfile)
caption = "Triangle ``ABD`` has less area than the shaded wedge, which has less area than triangle ``ACD``. Their respective areas are ``(1/2)\\sin(\\theta)``, ``(1/2)\\theta``, and ``(1/2)\\tan(\\theta)``. The inequality used to show ``\\sin(x)/x`` is bounded below by ``\\cos(x)`` and above by ``1`` comes from a division by ``(1/2) \\sin(x)`` and taking reciprocals.
"
plotly()
ImageFile(imgfile, caption)
plt = let
gr()
empty_style = (xaxis=([], false),
yaxis=([], false),
framestyle=:origin,
legend=false)
axis_style = (arrow=true, side=:head, line=(:gray, 1))
text_style = (10,)
fn_style = (;line=(:black, 3))
fn2_style = (;line=(:red, 4))
mark_style = (;line=(:gray, 1, :dot))
domain_style = (;fill=(:orange, 0.35), line=nothing)
range_style = (; fill=(:blue, 0.35), line=nothing)
plot(; empty_style..., aspect_ratio=:equal)
plot!(x -> sqrt(1 - x^2), 0, 1; line=(:black, 2))
θ = π/6
y,x = sincos(θ)
col=RGBA(0.0,0.0,1.0, 0.25)
plot!(range(0,x, length=2), zero, fillrange=u->y/x*u, color=col)
plot!(range(x, 1, length=50), zero, fillrange = u -> sqrt(1 - u^2), color=col)
plot!([x,x],[0,y], line=(:dash, 2, :black))
plot!([x,1],[y,0], line=(:dot, 2, :black))
plot!([1,1], [0,y/x], line=(2, :black))
plot!([0,1], [0,y/x], line=(2, :black))
plot!([0,1], [0,0], line=(2, :black))
Δ = 0.05
annotate!([(0,0+Δ, text(L"A", 10)),
(x-Δ,y+Δ/4, text(L"B",10)),
(1+Δ/2,y/x, text(L"C", 10)),
(1+Δ/2,0+Δ/2, text(L"D", 10)),
(0.2*cos(θ/2), 0.2*sin(θ/2), text(L"\theta", 12))
])
current()
end
plt
```
```{julia}
#| echo: false
plotly()
nothing
```
Triangle $\triangle ABD$ has less area than the shaded wedge, which has less area than triangle $\triangle ACD$. Their respective areas are $(1/2)\sin(\theta)$, $(1/2)\theta$, and $(1/2)\tan(\theta)$. The inequality used to show $\sin(x)/x$ is bounded below by $\cos(x)$ and above by $1$ comes from a division by $(1/2) \sin(x)$ and taking reciprocals.
:::
To discuss the case of $(1+x)^{1/x}$ it proved convenient to assume $x = 1/m$ for integer values of $m$. At the time of Cauchy, log tables were available to identify the approximate value of the limit. Cauchy computed the following value from logarithm tables:
@@ -266,14 +309,14 @@ xs = [1/10^i for i in 1:5]
This progression can be seen to be increasing. Cauchy, in his treatise, can see this through:
$$
\begin{align*}
(1 + \frac{1}{m})^n &= 1 + \frac{1}{1} + \frac{1}{1\cdot 2}(1 - \frac{1}{m}) + \\
(1 + \frac{1}{m})^m &= 1 + \frac{1}{1} + \frac{1}{1\cdot 2}(1 - \frac{1}{m}) + \\
& \frac{1}{1\cdot 2\cdot 3}(1 - \frac{1}{m})(1 - \frac{2}{m}) + \cdots \\
&+
\frac{1}{1 \cdot 2 \cdot \cdots \cdot m}(1 - \frac{1}{m}) \cdot \cdots \cdot (1 - \frac{m-1}{m}).
\end{align*}
$$
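The progression can be checked numerically; a quick sketch, not Cauchy's table:

```{julia}
ms = [10, 100, 1_000, 10_000]
[(1 + 1/m)^m for m in ms]    # 2.5937…, 2.7048…, 2.7169…, 2.7181…
```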
These values are clearly increasing as $m$ increases. Cauchy showed the value was bounded between $2$ and $3$ and had the approximate value above. Then he showed the restriction to integers was not necessary. Later we will use this definition for the exponential function:
@@ -597,6 +640,7 @@ Hmm, the values in `ys` appear to be going to $0.5$, but then end up at $0$. Is
```{julia}
xs = [1/10^i for i in 1:8]
y1s = [1 - cos(x) for x in xs]
y2s = [x^2 for x in xs]
[xs y1s y2s]
@@ -645,7 +689,7 @@ c = 15/11
lim(h, c; n = 16)
```
(Though the graph and table do hint at something a bit odd -- the graph shows a blip, the table doesn't show values in the second column going towards a specific value.)
(Though the graph and table do hint at something a bit odd---the graph shows a blip, the table doesn't show values in the second column going towards a specific value.)
However the limit in this case is $-\infty$ (or DNE), as there is an asymptote at $c=15/11$. The problem is the asymptote due to the logarithm is extremely narrow and happens between floating point values to the left and right of $15/11$.
@@ -722,7 +766,7 @@ For example, the limit at $0$ of $(1-\cos(x))/x^2$ is easily handled:
limit((1 - cos(x)) / x^2, x => 0)
```
The pair notation (`x => 0`) is used to indicate the variable and the value it is going to. A `dir` argument is used to indicate ``x \rightarrow c+`` (the default), ``x \rightarrow c-``, and ``x \rightarrow c``.
The pair notation (`x => 0`) is used to indicate the variable and the value it is going to. A `dir` argument is used to indicate $x \rightarrow c+$ (the default, or `dir="+"`), $x \rightarrow c-$ (`dir="-"`), and $x \rightarrow c$ (`dir="+-"`).
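For instance, a small sketch, not from the text, using $1/x$, where the two one-sided limits at $0$ differ:

```{julia}
limit(1/x, x => 0, dir="+"), limit(1/x, x => 0, dir="-")    # (oo, -oo)
```

Since the sides disagree, a request with `dir="+-"` raises an error, signaling that the two-sided limit does not exist.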
##### Example
@@ -856,7 +900,7 @@ This accurately shows the limit does not exist mathematically, but `limit(ceil(x
The `limit` function doesn't compute limits from the definition, rather it applies some known facts about functions within a set of rules. Some of these rules are the following. Suppose the individual limits of $f$ and $g$ always exist (and are finite) below.
$$
\begin{align*}
\lim_{x \rightarrow c} (a \cdot f(x) + b \cdot g(x)) &= a \cdot
\lim_{x \rightarrow c} f(x) + b \cdot \lim_{x \rightarrow c} g(x)
@@ -870,7 +914,7 @@ The `limit` function doesn't compute limits from the definition, rather it appli
\frac{\lim_{x \rightarrow c} f(x)}{\lim_{x \rightarrow c} g(x)}
&(\text{provided }\lim_{x \rightarrow c} g(x) \neq 0)\\
\end{align*}
$$
These are verbally described as follows, assuming the individual limits exist and are finite:
@@ -920,7 +964,7 @@ $$
This is clearly related to the function $f(x) = \sin(x)/x$, which has a limit of $1$ as $x \rightarrow 0$. We see $g(x) = k f(kx)$ is the function whose limit is in question. As $kx \rightarrow 0$, though not taking a value of $0$ except when $x=0$, the limit above is $k \lim_{x \rightarrow 0} f(kx) = k \lim_{u \rightarrow 0} f(u) = k$.
Basically when taking a limit as $x$ goes to $0$ we can multiply $x$ by any constant and figure out the limit for that. (It is as though we "go to" $0$ faster or slower. but are still going to $0$.
Basically when taking a limit as $x$ goes to $0$ we can multiply $x$ by any constant and figure out the limit for that. (It is as though we "go to" $0$ faster or slower, but are still going to $0$.)
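A symbolic check of this observation; a sketch, with `k` introduced here as a positive symbol:

```{julia}
@syms k::positive
limit(sin(k*x)/x, x => 0)    # k
```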
Similarly,
@@ -1008,18 +1052,34 @@ $$
Why? We can express the function $e^{\csc(x)}/e^{\cot(x)}$ as the above function plus the polynomial $1 + x/2 + x^2/8$. The above is then the sum of two functions whose limits exist and are finite, hence, we can conclude that $M = 0 + 1$.
### The [squeeze](http://en.wikipedia.org/wiki/Squeeze_theorem) theorem
### The squeeze theorem
Sometimes limits can be found by bounding more complicated functions by easier functions.
::: {.callout-note icon=false}
## The [squeeze theorem](http://en.wikipedia.org/wiki/Squeeze_theorem)
Fix $c$ in $I=(a,b)$. Suppose for all $x$ in $I$, except possibly $c$, there are two functions $l$ and $u$, satisfying:
We note one more limit law. Suppose we wish to compute $\lim_{x \rightarrow c}f(x)$ and we have two other functions, $l$ and $u$, satisfying:
* $l(x) \leq f(x) \leq u(x)$.
* These limits exist and are equal:
$$
L = \lim_{x \rightarrow c} l(x) = \lim_{x \rightarrow c} u(x).
$$
* for all $x$ near $c$ (possibly not including $c$) $l(x) \leq f(x) \leq u(x)$.
* These limits exist and are equal: $L = \lim_{x \rightarrow c} l(x) = \lim_{x \rightarrow c} u(x)$.
Then
$$
\lim_{x\rightarrow c} f(x) = L.
$$
Then the limit of $f$ must also be $L$.
:::
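A quick numeric sketch, not part of the text, of the bound used here shows the pinch on $\sin(x)/x$:

```{julia}
xs = [0.1, 0.01, 0.001]
[cos.(xs)  sin.(xs) ./ xs  one.(xs)]    # lower bound, the function, upper bound
```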
The figure shows a usage of the squeeze theorem to show $\sin(x)/x \rightarrow 1$, as $\cos(x) \leq \sin(x)/x \leq 1$ for $x$ close to $0$.
```{julia}
#| hold: true
@@ -1055,8 +1115,71 @@ ImageFile(imgfile, caption)
The formal definition of a limit involves clarifying what it means for $f(x)$ to be "close to $L$" when $x$ is "close to $c$". These are quantified by the inequalities $0 < \lvert x-c\rvert < \delta$ and $\lvert f(x) - L\rvert < \epsilon$. The second does not have the restriction that the difference be greater than $0$, as indeed $f(x)$ can equal $L$. The order is important: it says for any idea of close for $f(x)$ to $L$, an idea of close must be found for $x$ to $c$.
The key is identifying a value for $\delta$ for a given value of $\epsilon$.
::: {#fig-limit-e-d}
```{julia}
#| echo: false
plt = let
gr()
f(x) = (x+1)^2 -1
a, b = -1/4, 2
c = 1
L = f(c)
δ = 0.2
ϵ = 3sqrt(δ)
plot(; empty_style...)#, aspect_ratio=:equal)
plot!(f, a, b; line=(:black, 2))
plot!([a,b],[0,0]; axis_style...)
plot!([0,0], [f(a), f(2)]; axis_style...)
plot!([c, c, 0], [0, f(c), f(c)]; line=(:black, 1, :dash))
plot!([c-δ, c-δ, 0], [0, f(c-δ), f(c-δ)]; line=(:black, 1, :dashdot))
plot!([c+δ, c+δ, 0], [0, f(c+δ), f(c+δ)]; line=(:black, 1, :dashdot))
S = Shape([0,b,b,0],[L-ϵ,L-ϵ,L+ϵ,L+ϵ])
plot!(S; fill=(:lightblue, 0.25), line=(nothing,))
domain_color=:red
range_color=:blue
S = Shape([c-δ, c+δ, c+δ, c-δ], 0.05*[-1,-1,1,1])
plot!(S, fill=(domain_color,0.4), line=nothing)
m,M = f(c-δ), f(c+δ)
T = Shape(0.015 * [-1,1,1,-1], [m,m,M,M])
plot!(T, fill=(range_color, 0.4), line=nothing)
C = Plots.scale(Shape(:circle), 0.02, 0.1)
plot!(Plots.translate(C, c, L), fill=(:white,1,0), line=(:black, 1))
plot!(Plots.translate(C, c, 0), fill=(:white,1,0), line=(domain_color, 1))
plot!(Plots.translate(C, 0, L), fill=(:white,1,0), line=(range_color, 1))
annotate!([
(c, 0, text(L"c", :top)),
(c-δ, 0, text(L"c - \delta", 10, :top)),
(c+δ, 0, text(L"c + \delta", 10, :top)),
(0, L, text(L"L", :right)),
(0, L+ϵ, text(L"L + \epsilon", 10, :right)),
(0, L-ϵ, text(L"L - \epsilon", 10, :right)),
])
end
plt
```
```{julia}
#| echo: false
plotly()
nothing
```
Figure illustrating the requirements of the $\epsilon-\delta$ definition of the limit. The image (shaded blue on the $y$ axis) of the $x$ values within $\delta$ of $c$ (shaded red on the $x$ axis, excepting $c$ itself) must stay within the bounds $L-\epsilon$ and $L+\epsilon$, where $\delta$ may be chosen based on $\epsilon$ but needs to be found for every positive $\epsilon$, not just a fixed one as in this figure.
:::
A simple case is the linear case. Consider the function $f(x) = 3x + 2$. Verify that the limit at $c=1$ is $5$.
@@ -1084,64 +1207,78 @@ These lines produce a random $\epsilon$, the resulting $\delta$, and then verify
(The random numbers are technically in $[0,1)$, so in theory `epsilon` could be `0`. So the above approach would be more solid if some guard, such as `epsilon = max(eps(), rand())`, was used. As the formal definition is the domain of paper-and-pencil, we don't fuss.)
In this case, $\delta$ is easy to guess, as the function is linear and has slope $3$. This basically says the $y$ scale is 3 times the $x$ scale. For non-linear functions, finding $\delta$ for a given $\epsilon$ can be a challenge. For the function $f(x) = x^3$, illustrated below, a value of $\delta=\epsilon^{1/3}$ is used for $c=0$:
In this case, $\delta$ is easy to guess, as the function is linear and has slope $3$. This basically says the $y$ scale is 3 times the $x$ scale. For non-linear functions, finding $\delta$ for a given $\epsilon$ can be more of a challenge.
```{julia}
#| hold: true
#| echo: false
#| cache: true
## {{{ limit_e_d }}}
gr()
function make_limit_e_d(n)
f(x) = x^3
##### Example
xs = range(-.9, stop=.9, length=50)
ys = map(f, xs)
We show using the definition that for any fixed $a$ and $n$:
$$
\lim_{x \rightarrow a} x^n = a^n.
$$
This proof uses a bound based on properties of the absolute value.
plt = plot(f, -.9, .9, legend=false, size=fig_size)
if n == 0
nothing
else
k = div(n+1,2)
epsilon = 1/2^k
delta = cbrt(epsilon)
if isodd(n)
plot!(plt, xs, 0*xs .+ epsilon, color=:orange)
plot!(plt, xs, 0*xs .- epsilon, color=:orange)
else
plot!(delta * [-1, 1], epsilon * [ 1, 1], color=:orange)
plot!(delta * [ 1, -1], epsilon * [-1,-1], color=:orange)
plot!(delta * [-1, -1], epsilon * [-1, 1], color=:red)
plot!(delta * [ 1, 1], epsilon * [-1, 1], color=:red)
end
end
plt
end
We look at $f(x) - L = x^n - a^n = (x-a)(x^{n-1} + x^{n-2}a + \cdots + xa^{n-2} + a^{n-1})$.
Taking absolute values gives an inequality by the triangle inequality:
n = 11
anim = @animate for i=1:n
make_limit_e_d(i-1)
end
$$
\lvert x^n - a^n\rvert \leq \lvert x-a\rvert\cdot
\left(
\lvert x\rvert^{n-1} +
\lvert x\rvert^{n-2}\lvert a\rvert +
\cdots +
\lvert x\rvert\,\lvert a\rvert^{n-2} +
\lvert a\rvert^{n-1}
\right).
$$
imgfile = tempname() * ".gif"
gif(anim, imgfile, fps = 1)
Now, for a given $\epsilon>0$ we seek a $\delta>0$ satisfying the properties of the limit definition for $f(x) = x^n$ and $L=a^n$. For now, assume $\delta < 1$. Then we can assume $\lvert x-a\rvert < \delta$ and
$$
\lvert x\rvert = \lvert x - a + a\rvert \leq \lvert x-a\rvert + \lvert a\rvert < 1 + \lvert a\rvert
$$
caption = L"""
This then says
Demonstration of $\epsilon$-$\delta$ proof of $\lim_{x \rightarrow 0}
x^3 = 0$. For any $\epsilon>0$ (the orange lines) there exists a
$\delta>0$ (the red lines of the box) for which the function $f(x)$
does not leave the top or bottom of the box (except possibly at the
edges). In this example $\delta^3=\epsilon$.
$$
\begin{align*}
\lvert x^n - a^n\rvert
&\leq
\lvert x-a\rvert\cdot \left(
\lvert x\rvert^{n-1} +
\lvert x\rvert^{n-2}\lvert a\rvert +
\cdots +
\lvert x\rvert\,\lvert a\rvert^{n-2} +
\lvert a\rvert^{n-1}
\right)\\
%%
&\leq \lvert x - a\rvert
\cdot \left(
(\lvert a\rvert+1)^{n-1} +
(\lvert a\rvert+1)^{n-2}\lvert a\rvert
+ \cdots +
(\lvert a\rvert+1)\,\lvert a\rvert^{n-2} +
\lvert a\rvert^{n-1}
\right)\\
&\leq \lvert x-a\rvert \cdot C,
\end{align*}
$$
where $C$ is just some constant not depending on $x$, just $a$ and $n$.
Now if $\delta < 1$ and $\delta < \epsilon/C$ and if
$0 < \lvert x - a \rvert < \delta$ then
$$
\lvert f(x) - L \rvert =
\lvert x^n - a^n\rvert \leq \lvert x-a\rvert \cdot C \leq \delta\cdot C < \frac{\epsilon}{C} \cdot C = \epsilon.
$$
With this result, the rules of limits immediately extend to any polynomial $p(x)$: it follows that $\lim_{x \rightarrow a} p(x) = p(a)$. (Because $c_n x^n \rightarrow c_n a^n$ and the sum of two functions with limits has the sum of the limits as its limit.) Based on this, we will say later that any polynomial is *continuous* for all $x$.
"""
plotly()
ImageFile(imgfile, caption)
```
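A symbolic spot check, a sketch only, of this conclusion for one polynomial; the symbol `a` is introduced for the check:

```{julia}
@syms a
p(x) = 2x^3 - x + 5
limit(p(x), x => a)    # 2a^3 - a + 5, which is p(a)
```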
## Questions
@@ -1234,6 +1371,12 @@ $$
The limit exists; what is the value?
```{julia}
#| hold: true
#| echo: false
f(x) = (cos(x) - 1)/x
p = plot(f, -1, 1)
```
```{julia}
#| hold: true
@@ -1257,26 +1400,26 @@ let
title!(p1, "(a)")
plot!(p1, x -> x^2, 0, 2, color=:black)
plot!(p1, zero, linestyle=:dash)
annotate!(p1,[(1,0,"a")])
annotate!(p1,[(1,0,text(L"a",:top))])
p2 = plot(;axis=nothing, legend=false)
title!(p2, "(b)")
plot!(p2, x -> 1/(1-x), 0, .95, color=:black)
plot!(p2, x-> -1/(1-x), 1.05, 2, color=:black)
plot!(p2, zero, linestyle=:dash)
annotate!(p2,[(1,0,"a")])
annotate!(p2,[(1,0,text(L"a",:top))])
p3 = plot(;axis=nothing, legend=false)
title!(p3, "(c)")
plot!(p3, sinpi, 0, 2, color=:black)
plot!(p3, zero, linestyle=:dash)
annotate!(p3,[(1,0,"a")])
annotate!(p3,[(1,0,text(L"a",:top))])
p4 = plot(;axis=nothing, legend=false)
title!(p4, "(d)")
plot!(p4, x -> x^x, 0, 2, color=:black)
plot!(p4, zero, linestyle=:dash)
annotate!(p4,[(1,0,"a")])
annotate!(p4,[(1,0,text(L"a",:top))])
l = @layout[a b; c d]
p = plot(p1, p2, p3, p4, layout=l)
@@ -1487,8 +1630,8 @@ Take
$$
f(x) = \begin{cases}
0 & x \neq 0\\
1 & x = 0
0 &~ x \neq 0\\
1 &~ x = 0
\end{cases}
$$


@@ -22,6 +22,8 @@ nothing
---
![To infinity and beyond](figures/buzz-infinity.jpg){width=40%}
The limit of a function at $c$ need not exist for one of many different reasons. Some of these reasons can be handled with extensions to the concept of the limit, others are just problematic in terms of limits. This section covers examples of each.
@@ -30,22 +32,24 @@ Let's begin with a function that is just problematic. Consider
$$
f(x) = \sin(1/x)
f(x) = \sin(\frac{1}{x})
$$
As this is a composition of nice functions it will have a limit everywhere except possibly when $x=0$, as then $1/x$ may not have a limit. So rather than talk about where it is nice, let's consider the question of whether a limit exists at $c=0$.
@fig-sin-1-over-x shows the issue:
A graph shows the issue:
:::{#fig-sin-1-over-x}
```{julia}
#| hold: true
#| echo: false
f(x) = sin(1/x)
plot(f, range(-1, stop=1, length=1000))
```
Graph of the function $f(x) = \sin(1/x)$ near $0$. It oscillates infinitely many times around $0$.
:::
The graph oscillates between $-1$ and $1$ infinitely many times on this interval, so many times that no matter how close one zooms in, the graph on the screen will fail to capture them all. Graphically, there is no single value of $L$ that the function gets close to, as it varies between all the values in $[-1,1]$ as $x$ gets close to $0$. A simple proof that there is no limit is to take any $\epsilon$ less than $1$; then for any $\delta > 0$ there are infinitely many $x$ values where $f(x)=1$ and infinitely many where $f(x) = -1$. That is, when $\epsilon$ is less than $1$ there is no $L$ with $|f(x) - L| < \epsilon$ for all $x$ near $0$.
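This can be seen in code; a sketch, not part of the text, with two sequences approaching $0$ along which the function is constantly $1$ and constantly $-1$:

```{julia}
f(x) = sin(1/x)
ns = 1:5
# 1/x is π/2 + 2πn along the first sequence; 3π/2 + 2πn along the second:
f.(2 ./ ((4ns .+ 1) .* pi)), f.(2 ./ ((4ns .+ 3) .* pi))
```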
@@ -63,11 +67,10 @@ The following figure illustrates:
```{julia}
#| hold: true
f(x) = x * sin(1/x)
plot(f, -1, 1)
plot!(abs)
plot!(x -> -abs(x))
plot(f, -1, 1; label="f")
plot!(abs; label="|.|")
plot!(x -> -abs(x); label="-|.|")
```
The [squeeze](http://en.wikipedia.org/wiki/Squeeze_theorem) theorem of calculus is the formal reason $f$ has a limit at $0$, as both the upper function, $|x|$, and the lower function, $-|x|$, have a limit of $0$ at $0$.
@@ -97,8 +100,11 @@ But unlike the previous example, this function *would* have a limit if the defin
Let's loosen up the language in the definition of a limit to read:
> The limit of $f(x)$ as $x$ approaches $c$ is $L$ if for every neighborhood, $V$, of $L$ there is a neighborhood, $U$, of $c$ for which $f(x)$ is in $V$ for every $x$ in $U$, except possibly $x=c$.
::: {.callout-note icon=false}
The limit of $f(x)$ as $x$ approaches $c$ is $L$ if for every neighborhood, $V$, of $L$ there is a neighborhood, $U$, of $c$ for which $f(x)$ is in $V$ for every $x$ in $U$, except possibly $x=c$.
:::
The $\epsilon-\delta$ definition has $V = (L-\epsilon, L + \epsilon)$ and $U=(c-\delta, c+\delta)$. This is a rewriting of $L-\epsilon < f(x) < L + \epsilon$ as $|f(x) - L| < \epsilon$.
@@ -106,12 +112,16 @@ The $\epsilon-\delta$ definition has $V = (L-\epsilon, L + \epsilon)$ and $U=(c-
Now for the definition:
+::: {.callout-note icon=false}
+## The $\epsilon-\delta$ Definition of a right limit
-> A function $f(x)$ has a limit on the right of $c$, written $\lim_{x \rightarrow c+}f(x) = L$ if for every $\epsilon > 0$, there exists a $\delta > 0$ such that whenever $0 < x - c < \delta$ it holds that $|f(x) - L| < \epsilon$. That is, $U$ is $(c, c+\delta)$
+A function $f(x)$ has a limit on the right of $c$, written $\lim_{x \rightarrow c+}f(x) = L$, if for every $\epsilon > 0$, there exists a $\delta > 0$ such that whenever $0 < x - c < \delta$ it holds that $|f(x) - L| < \epsilon$. That is, $U$ is $(c, c+\delta)$.
Similarly, a limit on the left is defined where $U=(c-\delta, c)$.
+:::
The `SymPy` function `limit` has a keyword argument `dir="+"` or `dir="-"` to request a one-sided limit; the default is `dir="+"`. Passing `dir="+-"` computes both one-sided limits and throws an error if the two are not equal, in agreement with there being no limit in that case.
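For instance, a minimal sketch (the jump function used here is our own illustration):
```{julia}
using SymPy
@syms x
u = abs(x)/x                  # jumps from -1 to 1 at 0
limit(u, x => 0, dir="+")     # 1
limit(u, x => 0, dir="-")     # -1
# limit(u, x => 0, dir="+-")  # errors, as the one-sided limits disagree
```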
@@ -172,21 +182,38 @@ Consider this funny graph:
```{julia}
#| hold: true
#| echo: false
-xs = range(0,stop=1, length=50)
-plot(x->x^2, -2, -1, legend=false)
+let
+    xs = range(0, stop=1, length=50)
+    plot(; legend=false, aspect_ratio=true,
+         xticks = -4:4)
+    plot!([(-4, -1.5), (-2, 4)]; line=(:black, 1))
+    plot!(x -> x^2, -2, -1; line=(:black, 1))
+    plot!(exp, -1, 0)
+    plot!(x -> 1 - 2x, 0, 1)
+    plot!(sqrt, 1, 2)
+    plot!(x -> 1 - x, 2, 3)
+    S = Plots.scale(Shape(:circle), 0.05)
+    plot!(Plots.translate(S, -4, -1.5); fill=(:black,))
+    plot!(Plots.translate(S, -1, (-1)^2); fill=(:white,))
+    plot!(Plots.translate(S, -1, exp(-1)); fill=(:black,))
+    plot!(Plots.translate(S, 1, 1 - 2(1)); fill=(:black,))
+    plot!(Plots.translate(S, 1, sqrt(1)); fill=(:white,))
+    plot!(Plots.translate(S, 2, sqrt(2)); fill=(:white,))
+    plot!(Plots.translate(S, 2, 1 - (2)); fill=(:black,))
+    plot!(Plots.translate(S, 3, 1 - (3)); fill=(:black,))
+end
```
Describe the limits at $-1$, $0$, and $1$.
* At $-1$ we see a jump: there is no limit, but instead a left limit of $1$ and a right limit appearing to be $1/2$.
* At $0$ we see a limit of $1$.
* Finally, at $1$ there is again a jump, so no limit; instead, the left limit is about $-1$ and the right limit is $1$.
## Limits at infinity
@@ -354,7 +381,7 @@ limit(g(x), x=>0, dir="+")
## Limits of sequences
-After all this, we still can't formalize the basic question asked in the introduction to limits: what is the area contained in a parabola. For that we developed a sequence of sums: $s_n = 1/2 \cdot((1/4)^0 + (1/4)^1 + (1/4)^2 + \cdots + (1/4)^n)$. This isn't a function of $x$, but rather depends only on non-negative integer values of $n$. However, the same idea as a limit at infinity can be used to define a limit.
+After all this, we still can't formalize the basic question asked in the introduction to limits: what is the area contained in a parabola. For that we developed a sequence of sums: $s_n = 1/2 \cdot((1/4)^0 + (1/4)^1 + (1/4)^2 + \cdots + (1/4)^n)$. This isn't a function of real $x$, but rather depends only on non-negative integer values of $n$. However, the same idea as a limit at infinity can be used to define a limit.
> Let $a_0,a_1, a_2, \dots, a_n, \dots$ be a sequence of values indexed by $n$. We have $\lim_{n \rightarrow \infty} a_n = L$ if for every $\epsilon > 0$ there exists an $M>0$ where if $n > M$ then $|a_n - L| < \epsilon$.
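As a quick numeric sketch (the target value $2/3$ comes from halving the geometric series $\sum_{k \geq 0} (1/4)^k = 4/3$, a fact we assume from the earlier discussion):
```{julia}
# Partial sums s(n) = 1/2 * Σ (1/4)^k for k = 0…n approach 2/3
s(n) = 1/2 * sum((1/4)^k for k in 0:n)
[s(n) for n in (1, 5, 10, 20)], 2/3
```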
@@ -434,16 +461,6 @@ $$
That ${n \choose k} \leq n^k$ can be seen by counting: the left side counts the combinations of $k$ choices from $n$ distinct items, which is no more than the number of permutations of $k$ of the items, which in turn is no more than the number of ways to choose $k$ items from $n$ *with* replacement, which is what $n^k$ counts.
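A spot check of this bound over small values (our own addition):
```{julia}
# binomial(n, k) ≤ n^k for all 1 ≤ n ≤ 10 and 0 ≤ k ≤ n
all(binomial(n, k) <= n^k for n in 1:10 for k in 0:n)  # true
```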
### Some limit theorems for sequences
The limit discussion first defined limits of scalar univariate functions at a point $c$ and then added generalizations. This pedagogical approach can be reversed by starting the discussion with limits of sequences and then generalizing from there. That approach relies on a few theorems gathered along the way, mentioned here for the curious reader, with a small numeric illustration following the list:
* Convergent sequences are bounded.
* All *bounded* monotone sequences converge.
* Every bounded sequence has a convergent subsequence. (Bolzano-Weierstrass)
* The limit of $f$ at $c$ exists and equals $L$ if and only if for *every* sequence $x_n$ in the domain of $f$ converging to $c$ the sequence $s_n = f(x_n)$ converges to $L$.
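For instance, a minimal numeric illustration of the bounded-monotone fact (the example sequence is our own choice): $a_n = (1 + 1/n)^n$ is increasing and bounded above by $3$, so the theorem guarantees it converges; its limit is $e$.
```{julia}
# aₙ = (1 + 1/n)^n is increasing and bounded above, hence convergent (to e ≈ 2.718…)
a(n) = (1 + 1/n)^n
[a(n) for n in (1, 10, 100, 10_000)]
```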
## Summary
@@ -867,7 +884,7 @@ numericq(-1)
###### Question
-As mentioned, for limits that depend on specific values of parameters `SymPy` may have issues. As an example, `SymPy` has an issue with the following limit, whose answer depends on the value of $k$"
+As mentioned, for limits that depend on specific values of parameters `SymPy` may have issues. As an example, `SymPy` has an issue with the following limit, whose answer depends on the value of $k$:
$$
@@ -1008,7 +1025,7 @@ radioq(choices, answ, keep_order=true)
Suppose a sequence of points $x_n$ converges to $a$ in the limiting sense. For a function $f(x)$, the sequence of points $f(x_n)$ may or may not converge. One alternative definition of a [limit](https://en.wikipedia.org/wiki/Limit_of_a_function#In_terms_of_sequences) due to Heine is that $\lim_{x \rightarrow a}f(x) = L$ if *and* only if **all** sequences $x_n \rightarrow a$ have $f(x_n) \rightarrow L$.
-Consider the function $f(x) = \sin(1/x)$, $a=0$, and the two sequences implicitly defined by $1/x_n = \pi/2 + n \cdot (2\pi)$ and $y_n = 3\pi/2 + n \cdot(2\pi)$, $n = 0, 1, 2, \dots$.
+Consider the function $f(x) = \sin(1/x)$, $a=0$, and the two sequences implicitly defined by $1/x_n = \pi/2 + n \cdot (2\pi)$ and $1/y_n = 3\pi/2 + n \cdot(2\pi)$, $n = 0, 1, 2, \dots$.
What is $\lim_{x_n \rightarrow 0} f(x_n)$?

File diff suppressed because it is too large.

quarto/misc/Project.toml (new file)

@@ -0,0 +1,12 @@
[deps]
CalculusWithJulia = "a2e0e22d-7d4c-5312-9169-8b992201a882"
ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
HCubature = "19dc6840-f33b-545b-b366-655c7e3ffd49"
LaTeXStrings = "b964fa9f-0449-5b57-a5c2-d3ea65f4040f"
Mustache = "ffc61752-8dc7-55ee-8c37-f3e9cdd09e70"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
QuadGK = "1fd47b50-473d-5c70-9696-f719f8f3bcdc"
QuizQuestions = "612c44de-1021-4a21-84fb-7261cf5eb2d4"
SymPy = "24249f21-da20-56a4-8eb1-6a02cf4ae2e6"
Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
TextWrap = "b718987f-49a8-5099-9789-dcd902bef87d"


@@ -68,7 +68,7 @@ For more control, the command line and `IJulia` provide access to the function i
] status
```
-External packages are *typically* installed from GitHub and if they are regisered, installation is as easy as calling `add`:
+External packages are *typically* installed from GitHub and if they are registered, installation is as easy as calling `add`:
```{julia}


@@ -80,7 +80,7 @@ Pluto has a built-in package management system that manages the installation of
"Project [Jupyter](https://jupyter.org/) exists to develop open-source software, open-standards, and services for interactive computing across dozens of programming languages." The `IJulia` package allows `Julia` to be one of these programming languages. This package must be installed prior to use.
-The Jupyter Project provides two web-based interfaces to `Julia`: the Jupyter notebook and the newer JupyterLab. The [binder](https://mybinder.org/) project use Juptyer notebooks for their primary interface to `Julia`. To use a binder notebook, follow this link:
+The Jupyter Project provides two web-based interfaces to `Julia`: the Jupyter notebook and the newer JupyterLab. The [binder](https://mybinder.org/) project uses Jupyter notebooks for its primary interface to `Julia`. To use a binder notebook, follow this link:
[launch binder](https://mybinder.org/v2/gh/CalculusWithJulia/CwJScratchPad.git/master)

Some files were not shown because too many files have changed in this diff.