Merge pull request #151 from jverzani/v0.24

V0.24
This commit is contained in:
john verzani
2025-07-30 07:26:51 -04:00
committed by GitHub
98 changed files with 8519 additions and 2068 deletions

6
.gitignore vendored

@@ -5,6 +5,7 @@ docs/site
test/benchmarks.json
Manifest.toml
TODO.md
Changelog.md
/*/_pdf_index.pdf
/*/*/_pdf_index.pdf
/*/_pdf_index.typ
@@ -12,4 +13,7 @@ TODO.md
/*/CalculusWithJulia.pdf
default.profraw
/quarto/default.profraw
/*/*/default.profraw
/*/*/default.profraw
/*/bonepile.qmd
/*/*/bonepile.qmd
/*/*_files

43
adjust_plotly.jl Normal file

@@ -0,0 +1,43 @@
# The issue with `PlotlyLight` appears to be that
# the `str` below is called *after* the inclusion of `require.min.js`
# (That str is included in the `.qmd` file to be included in the header
# but the order of inclusion appears not to be adjustable)
# This little script just adds a line *before* the require call
# which seems to make it all work. The line number 83 might change.
#alternatives/plotly_plotting.html
function _add_plotly(f)
lineno = 117
str = """
<script src="https://cdn.plot.ly/plotly-2.11.0.min.js"></script>
"""
r = readlines(f)
open(f, "w") do io
for (i,l) ∈ enumerate(r)
i == lineno && println(io, str)
println(io, l)
end
end
end
function (@main)(args...)
for (root, dirs, files) in walkdir("_book")
for fᵢ ∈ files
f = joinpath(root, fᵢ)
if endswith(f, ".html")
_add_plotly(f)
end
end
end
#f = "_book/integrals/center_of_mass.html"
#_add_plotly(f)
return 1
end
["ODEs", "alternatives", "derivatives", "differentiable_vector_calculus", "integral_vector_calculus", "integrals", "limits", "misc", "precalc", "site_libs"]

1
quarto/.gitignore vendored

@@ -3,5 +3,6 @@
/_freeze/
/*/*_files/
/*/*.ipynb/
/*/bonepile.qmd
/*/references.bib
weave_support.jl


@@ -79,7 +79,7 @@ R(0) &= 0\\
\end{align*}
$$
In `Julia` we define these, `N` to model the total population, and `u0` to be the proportions.
In `Julia` we define these parameter values, with `N` to model the total population and `u0` to represent the proportions.
```{julia}
@@ -94,7 +94,7 @@ An *estimated* set of values for $k$ and $b$ are $k=1/3$, coming from the averag
Okay, the mathematical modeling is done; now we try to solve for the unknown functions using `DifferentialEquations`.
To warm up, if $b=0$ then $i'(t) = -k \cdot i(t)$ describes the infected. (There is no circulation of people in this case.) The solution would be achieved through:
To warm up, if $b=0$ then $i'(t) = -k \cdot i(t)$ describes the infected. (There is no circulation of people in this case.) This is a single ODE. The solution would be achieved through:
```{julia}
@@ -102,10 +102,12 @@ To warm up, if $b=0$ then $i'(t) = -k \cdot i(t)$ describes the infected. (The
k = 1/3
f(u,p,t) = -k * u # solving u(t) = - k u(t)
uᵢ0 = I0/N
time_span = (0.0, 20.0)
prob = ODEProblem(f, I0/N, time_span)
sol = solve(prob, Tsit5(), reltol=1e-8, abstol=1e-8)
prob = ODEProblem(f, uᵢ0, time_span)
sol = solve(prob, Tsit5(); reltol=1e-8, abstol=1e-8)
plot(sol)
```
@@ -120,7 +122,7 @@ $$
\frac{di}{dt} = -k \cdot i(t) = F(i(t), k, t)
$$
where $F$ depends on the current value ($i$), a parameter ($k$), and the time ($t$). We did not utilize $p$ above for the parameter, as it was easy not to, but could have, and will in the following. The time variable $t$ does not appear by itself in our equation, so only `f(u, p, t) = -k * u` was used, `u` the generic name for a solution which in this case is $i$.
where $F$ depends on the current value ($i$), a parameter ($k$), and the time ($t$). We did not utilize $p$ above for the parameter, as it was easy not to, but could have, and will in the following. The time variable $t$ does not appear by itself in our equation, so only `f(u, p, t) = -k * u` was used, with `u` the generic name for a solution, which in this case was labeled $i$.
The problem we set up needs an initial value (the $u0$) and a time span to solve over. Here we want time to model real time, so use floating point values.
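As a hedged sketch of what using the parameter slot could look like (the name `F` and the named tuple are our choices here, not the book's), the same right-hand side can take its rate through `p`:

```julia
# Hypothetical variant of the warm-up model: the decay rate k is passed
# through the parameter slot `p` instead of being captured from global scope.
F(u, p, t) = -p.k * u    # same model as f, but parameterized
p = (k = 1/3,)           # parameters collected in a named tuple
F(0.9, p, 0.0)           # the rate of change when u = 0.9
```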
@@ -167,7 +169,7 @@ The `sir!` function has the trailing `!` indicating by convention it *mu
:::
With the update function defined, the problem is setup and a solution found with in the same manner:
With the update function defined, the problem is set up and a solution is found in the same manner as before:
```{julia}
@@ -193,7 +195,7 @@ p = (k=1/2, b=2) # change b from 1/2 to 2 -- more daily contact
prob = ODEProblem(sir!, u0, time_span, p)
sol = solve(prob, Tsit5())
plot(sol)
plot(sol; legend=:right)
```
The graphs are somewhat similar, but the steady state is reached much more quickly and nearly everyone became infected.
@@ -252,7 +254,7 @@ end
p
```
The 3-dimensional graph with `plotly` can have its viewing angle adjusted with the mouse. When looking down on the $x-y$ plane, which code `b` and `k`, we can see the rapid growth along a line related to $b/k$.
(A 3-dimensional graph with `plotly` or `Makie` can have its viewing angle adjusted with the mouse. When looking down on the $x-y$ plane, whose axes correspond to `b` and `k`, we can see the rapid growth along a line related to $b/k$.)
Smith and Moore point out that $k$ is roughly the reciprocal of the number of days an individual is sick enough to infect others. This can be estimated during a breakout. However, they go on to note that there is no direct way to observe $b$, but there is an indirect way.
@@ -382,7 +384,7 @@ SOL = solve(trajectory_problem, Tsit5(); p = ps, callback=cb)
plot(t -> SOL(t)[1], t -> SOL(t)[2], TSPAN...; legend=false)
```
Finally, we note that the `ModelingToolkit` package provides symbolic-numeric computing. This allows the equations to be set up symbolically, as in `SymPy` before being passed off to `DifferentialEquations` to solve numerically. The above example with no wind resistance could be translated into the following:
Finally, we note that the `ModelingToolkit` package provides symbolic-numeric computing. This allows the equations to be set up symbolically, as has been illustrated with `SymPy`, before being passed off to `DifferentialEquations` to solve numerically. The above example with no wind resistance could be translated into the following:
```{julia}


@@ -184,7 +184,7 @@ plot(exp(-1/2)*exp(x^2/2), x0, 2)
plot!(xs, ys)
```
Not bad. We wouldn't expect this to be exact - due to the concavity of the solution, each step is an underestimate. However, we see it is an okay approximation and would likely be better with a smaller $h$. A topic we pursue in just a bit.
Not bad. We wouldn't expect this to be exact---due to the concavity of the solution, each step is an underestimate. However, we see it is an okay approximation and would likely be better with a smaller $h$. A topic we pursue in just a bit.
Rather than type in the above command each time, we wrap it all up in a function. The inputs are $n$, $a=x_0$, $b=x_n$, $y_0$, and, most importantly, $F$. The output is massaged into a function through a call to `linterp`, rather than two vectors. The `linterp` function[^Interpolations] we define below just finds a function that linearly interpolates between the points and is `NaN` outside of the range of the $x$ values:
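As a rough sketch of the idea (this is not the book's `linterp`; the name and details below are illustrative), a linear interpolant over sorted nodes that returns `NaN` outside the range can be built as:

```julia
# A hypothetical stand-in for a `linterp`-style helper: given sorted xs and
# matching ys, return a function that linearly interpolates between
# consecutive points and is NaN outside [first(xs), last(xs)].
function linterp_sketch(xs, ys)
    function (x)
        (x < first(xs) || x > last(xs)) && return NaN
        i = searchsortedlast(xs, x)
        i == length(xs) && return float(ys[end])
        x0, x1 = xs[i], xs[i+1]
        y0, y1 = ys[i], ys[i+1]
        y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    end
end
```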
@@ -263,7 +263,7 @@ Each step introduces an error. The error in one step is known as the *local trun
The total error, or more commonly, *global truncation error*, is the error between the actual answer and the approximate answer at the end of the process. It reflects an accumulation of these local errors. This error is *bounded* by a constant times $h$. Since it gets smaller as $h$ gets smaller in direct proportion, the Euler method is called *first order*.
Other, somewhat more complicated, methods have global truncation errors that involve higher powers of $h$ - that is for the same size $h$, the error is smaller. In analogy is the fact that Riemann sums have error that depends on $h$, whereas other methods of approximating the integral have smaller errors. For example, Simpson's rule had error related to $h^4$. So, the Euler method may not be employed if there is concern about total resources (time, computer, ...), it is important for theoretical purposes in a manner similar to the role of the Riemann integral.
Other, somewhat more complicated, methods have global truncation errors that involve higher powers of $h$---that is, for the same size $h$, the error is smaller. In analogy is the fact that Riemann sums have error that depends on $h$, whereas other methods of approximating the integral have smaller errors. For example, Simpson's rule had error related to $h^4$. So, while the Euler method may not be employed if there is concern about total resources (time, computer, ...), it is important for theoretical purposes in a manner similar to the role of the Riemann integral.
In the examples, we will see that for many problems the simple Euler method is satisfactory, but not always so. The task of numerically solving differential equations is not a one-size-fits-all one. In the following, a few different modifications are presented to the basic Euler method, but this just scratches the surface of the topic.
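The first-order behavior can be checked numerically with a toy of our own (not the book's code): for $y'=y$, $y(0)=1$ on $[0,1]$ the exact endpoint value is $e$, and halving the step size $h$ roughly halves the endpoint error.

```julia
# Euler's method, tracking only the endpoint value y(b).
function euler_endpoint(F, a, b, y0, n)
    h = (b - a) / n
    x, y = a, y0
    for _ in 1:n
        y += h * F(x, y)
        x += h
    end
    y
end
err(n) = abs(euler_endpoint((x, y) -> y, 0.0, 1.0, 1.0, n) - exp(1))
err(100) / err(200)    # close to 2, as expected for a first-order method
```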
@@ -648,7 +648,7 @@ plot(euler2(x0, xn, y0, yp0, 360), 0, 4T)
plot!(x -> pi/4*cos(sqrt(g/l)*x), 0, 4T)
```
Even now, we still see that something seems amiss, though the issue is not as dramatic as before. The oscillatory nature of the pendulum is seen, but in the Euler solution, the amplitude grows, which would necessarily mean energy is being put into the system. A familiar instance of a pendulum would be a child on a swing. Without pumping the legs - putting energy in the system - the height of the swing's arc will not grow. Though we now have oscillatory motion, this growth indicates the solution is still not quite right. The issue is likely due to each step mildly overcorrecting and resulting in an overall growth. One of the questions pursues this a bit further.
Even now, we still see that something seems amiss, though the issue is not as dramatic as before. The oscillatory nature of the pendulum is seen, but in the Euler solution, the amplitude grows, which would necessarily mean energy is being put into the system. A familiar instance of a pendulum would be a child on a swing. Without pumping the legs---putting energy in the system---the height of the swing's arc will not grow. Though we now have oscillatory motion, this growth indicates the solution is still not quite right. The issue is likely due to each step mildly overcorrecting and resulting in an overall growth. One of the questions pursues this a bit further.
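The overcorrection can be made concrete with a toy oscillator (a hedged sketch; the model $x''=-x$ and the step counts are illustrative): plain Euler multiplies the energy-like quantity $E = (x^2+v^2)/2$ by $(1+h^2)$ every step, so the amplitude grows, whereas the Euler-Cromer variant, which updates the position with the *new* velocity, keeps $E$ bounded.

```julia
# Energy after n steps of plain Euler (cromer=false) or Euler-Cromer
# (cromer=true) applied to x'' = -x with x(0)=1, v(0)=0.
function step_energy(h, n; cromer = false)
    x, v = 1.0, 0.0
    for _ in 1:n
        vnew = v - h * x
        x += h * (cromer ? vnew : v)   # Euler-Cromer uses the updated velocity
        v = vnew
    end
    (x^2 + v^2) / 2
end
```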
## Questions
@@ -794,7 +794,7 @@ Modify the `euler2` function to implement the Euler-Cromer method. What do you s
#| hold: true
#| echo: false
choices = [
"The same as before - the amplitude grows",
"The same as before---the amplitude grows",
"The solution is identical to that of the approximation found by linearization of the sine term",
"The solution has a constant amplitude, but its period is slightly *shorter* than that of the approximate solution found by linearization",
"The solution has a constant amplitude, but its period is slightly *longer* than that of the approximate solution found by linearization"]


@@ -149,7 +149,7 @@ $$
U'(t) = -r U(t), \quad U(0) = U_0.
$$
This shows that the rate of change of $U$ depends on $U$. Large positive values indicate a negative rate of change - a push back towards the origin, and large negative values of $U$ indicate a positive rate of change - again, a push back towards the origin. We shouldn't be surprised to either see a steady decay towards the origin, or oscillations about the origin.
This shows that the rate of change of $U$ depends on $U$. Large positive values indicate a negative rate of change---a push back towards the origin, and large negative values of $U$ indicate a positive rate of change---again, a push back towards the origin. We shouldn't be surprised to either see a steady decay towards the origin, or oscillations about the origin.
What will we find? This equation is different from the previous two equations, as the function $U$ appears on both sides. However, we can rearrange to get:
@@ -177,7 +177,7 @@ $$
In words, the initial difference in temperature of the object and the environment exponentially decays to $0$.
That is, as $t > 0$ goes to $\infty$, the right hand will go to $0$ for $r > 0$, so $T(t) \rightarrow T_a$ - the temperature of the object will reach the ambient temperature. The rate of this is largest when the difference between $T(t)$ and $T_a$ is largest, so when objects are cooling the statement "hotter things cool faster" is appropriate.
That is, as $t > 0$ goes to $\infty$, the right hand will go to $0$ for $r > 0$, so $T(t) \rightarrow T_a$---the temperature of the object will reach the ambient temperature. The rate of this is largest when the difference between $T(t)$ and $T_a$ is largest, so when objects are cooling the statement "hotter things cool faster" is appropriate.
A graph of the solution for $T_0=200$ and $T_a=72$ and $r=1/2$ is made as follows. We've added a few line segments from the defining formula, and see that they are indeed tangent to the solution found for the differential equation.
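The closed-form solution just described can be written out directly (a minimal sketch using the stated values $T_0=200$, $T_a=72$, $r=1/2$):

```julia
# Newton's law of cooling, closed form: the difference from the ambient
# temperature decays exponentially at rate r.
T0, Ta, r = 200.0, 72.0, 1/2
T(t) = Ta + (T0 - Ta) * exp(-r * t)
```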
@@ -403,7 +403,7 @@ To finish, we call `dsolve` to find a solution (if possible):
out = dsolve(eqn)
```
This answer - to a first-order equation - has one free constant, `C₁`, which can be solved for from an initial condition. We can see that when $a > 0$, as $x$ goes to positive infinity the solution goes to $1$, and when $x$ goes to negative infinity, the solution goes to $0$ and otherwise is trapped in between, as expected.
This answer---to a first-order equation---has one free constant, `C₁`, which can be solved for from an initial condition. We can see that when $a > 0$, as $x$ goes to positive infinity the solution goes to $1$, and when $x$ goes to negative infinity, the solution goes to $0$ and otherwise is trapped in between, as expected.
The limits are confirmed by investigating the limits of the right-hand:
@@ -618,6 +618,7 @@ nothing
```
![The cables of an unloaded suspension bridge have a different shape than a loaded suspension bridge. As seen, the cables in this [figure](https://www.brownstoner.com/brooklyn-life/verrazano-narrows-bridge-anniversary-historic-photos/) would be modeled by a catenary.](./figures/verrazano-narrows-bridge-anniversary-historic-photos-2.jpeg)
---
@@ -641,7 +642,7 @@ $$
x''(t) = 0, \quad y''(t) = -g.
$$
That is, the $x$ position - where no forces act - has $0$ acceleration, and the $y$ position - where the force of gravity acts - has constant acceleration, $-g$, where $g=9.8m/s^2$ is the gravitational constant. These equations can be solved to give:
That is, the $x$ position---where no forces act---has $0$ acceleration, and the $y$ position---where the force of gravity acts---has constant acceleration, $-g$, where $g=9.8m/s^2$ is the gravitational constant. These equations can be solved to give:
$$
@@ -957,7 +958,7 @@ radioq(choices, answ)
##### Question
The example with projectile motion in a medium has a parameter $\gamma$ modeling the effect of air resistance. If `y` is the answer - as would be the case if the example were copy-and-pasted in - what can be said about `limit(y, gamma=>0)`?
The example with projectile motion in a medium has a parameter $\gamma$ modeling the effect of air resistance. If `y` is the answer---as would be the case if the example were copy-and-pasted in---what can be said about `limit(y, gamma=>0)`?
```{julia}
@@ -966,7 +967,7 @@ The example with projectile motion in a medium has a parameter $\gamma$ modeling
choices = [
"The limit is a quadratic polynomial in `x`, mirroring the first part of that example.",
"The limit does not exist, but the limit to `oo` gives a quadratic polynomial in `x`, mirroring the first part of that example.",
"The limit does not exist -- there is a singularity -- as seen by setting `gamma=0`."
"The limit does not exist---there is a singularity---as seen by setting `gamma=0`."
]
answ = 1
radioq(choices, answ)


@@ -118,10 +118,10 @@ function solve(prob::Problem, alg::EulerMethod)
end
```
The post has a more elegant means to unpack the parameters from the structures, but for each of the above, the parameters are unpacked, and then the corresponding algorithm employed. As of version `v1.7` of `Julia`, the syntax `(;g,y0,v0,tspan) = prob` could also be employed.
The post has a more elegant means to unpack the parameters from the structures, but for each of the above, the parameters are unpacked using the dot notation for `getproperty`, and then the corresponding algorithm employed. As of version `v1.7` of `Julia`, the syntax `(;g,y0,v0,tspan) = prob` could also have been employed.
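The destructuring syntax can be illustrated with a made-up stand-in (the struct below is ours, not the post's `Problem` type):

```julia
# The `(; fieldnames...) = x` syntax (Julia >= v1.7) binds each listed name
# via `getproperty(x, name)`.
struct FallSpec
    g::Float64
    y0::Float64
    v0::Float64
end
spec = FallSpec(9.8, 0.0, 10.0)
(; g, v0) = spec    # equivalent to g = spec.g; v0 = spec.v0
```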
The exact formulas, `y(t) = y0 + v0*(t - t0) - g*(t - t0)^2/2` and `v(t) = v0 - g*(t - t0)`, follow from well-known physics formulas. Each answer is wrapped in a `Solution` type so that the answers found can be easily extracted in a uniform manner.
The exact answers, `y(t) = y0 + v0*(t - t0) - g*(t - t0)^2/2` and `v(t) = v0 - g*(t - t0)`, follow from well-known physics formulas for constant-acceleration motion. Each answer is wrapped in a `Solution` type so that the answers found can be easily extracted in a uniform manner.
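These formulas can be written as plain functions (a sketch with $t_0=0$ and illustrative values for the constants):

```julia
# Constant-acceleration motion: position and velocity under gravity.
g, y0, v0 = 9.8, 0.0, 10.0
y(t) = y0 + v0*t - g*t^2/2
v(t) = v0 - g*t            # v is the derivative of y
```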
For example, plots of each can be obtained through:
@@ -138,7 +138,9 @@ plot!(sol_exact.t, sol_exact.y; label="exact solution", ls=:auto)
title!("On the Earth"; xlabel="t", legend=:bottomleft)
```
Following the post, since the time step `dt = 0.1` is not small enough, the error of the Euler method is rather large. Next we change the algorithm parameter, `dt`, to be smaller:
Following the post, since the time step `dt = 0.1` is not small enough, the error of the Euler method is readily identified.
Next we change the algorithm parameter, `dt`, to be smaller:
```{julia}
@@ -155,7 +157,7 @@ title!("On the Earth"; xlabel="t", legend=:bottomleft)
It is worth noting that only the first line is modified, and only the method requires modification.
Were the moon to be considered, the gravitational constant would need adjustment. This parameter is part of the problem, not the solution algorithm.
Were the moon to be considered, the gravitational constant would need adjustment. This parameter is a property of the problem, not the solution algorithm, as `dt` is.
Such adjustments are made by passing different values to the `Problem` constructor:
@@ -175,7 +177,9 @@ title!("On the Moon"; xlabel="t", legend=:bottomleft)
The code above also adjusts the time span in addition to the gravitational constant. The algorithm for the exact formula is set to use the `dt` value used in the `euler` method, for easier comparison. Otherwise, outside of the labels, the patterns are the same. Only those things that need changing are changed; the rest comes from defaults.
The above shows the benefits of using a common interface. Next, the post illustrates how *other* authors could extend this code, simply by adding a *new* `solve` method. For example,
The above shows the benefits of using a common interface.
Next, the post illustrates how *other* authors could extend this code, simply by adding a *new* `solve` method. For example, a symplectic method conserves a quantity and so can track long-term evolution without drift.
```{julia}


@@ -4,12 +4,22 @@ Short cut. Run first command until happy, then run second to publish
```
quarto render
#julia adjust_plotly.jl # <-- no longer needed
# maybe git config --global http.postBuffer 157286400
quarto publish gh-pages --no-render
```
But better to
```
quarto render
# commit changes and push
# fix typos
quarto render
quarto publish gh-pages --no-render
```
To compile the pages through quarto


@@ -2,6 +2,28 @@
#| output: false
#| echo: false
# Some style choices for `Plots.jl`
empty_style = (xaxis=([], false),
yaxis=([], false),
framestyle=:origin,
legend=false)
axis_style = (arrow=true, side=:head, line=(:gray, 1))
text_style = (10,)
fn_style = (;line=(:black, 3))
fn2_style = (;line=(:red, 4))
mark_style = (;line=(:gray, 1, :dot))
domain_style = (;fill=(:orange, 0.35))
range_style = (; fill=(:blue, 0.35))
nothing
```
```{julia}
#| output: false
#| echo: false
## Formatting options are included here; not in CalculusWithJulia.WeaveSupport
using QuizQuestions
nothing


@@ -11,7 +11,7 @@ typst_tpl = mt"""
---
title: {{:title}}
date: today
jupyter: julia-1.11
engine: julia
execute:
daemon: false
format:
@@ -25,6 +25,11 @@ format:
#set figure(placement: auto)
bibliography: references.bib
---
```{julia}
#| echo: false
import Plots; Plots.plotly() = Plots.gr();
nothing
```
"""
index = "_pdf_index"


@@ -1,4 +1,4 @@
version: "0.23"
version: "0.24"
engines: ['julia']
project:
@@ -25,14 +25,16 @@ book:
page-footer: "Copyright 2022-25, John Verzani"
chapters:
- index.qmd
- part: basics.qmd
chapters:
- basics/calculator.qmd
- basics/variables.qmd
- basics/numbers_types.qmd
- basics/logical_expressions.qmd
- basics/vectors.qmd
- basics/ranges.qmd
- part: precalc.qmd
chapters:
- precalc/calculator.qmd
- precalc/variables.qmd
- precalc/numbers_types.qmd
- precalc/logical_expressions.qmd
- precalc/vectors.qmd
- precalc/ranges.qmd
- precalc/functions.qmd
- precalc/plotting.qmd
- precalc/transformations.qmd
@@ -83,6 +85,7 @@ book:
- integrals/volumes_slice.qmd
- integrals/arc_length.qmd
- integrals/surface_area.qmd
- integrals/orthogonal_polynomials.qmd
- integrals/twelve-qs.qmd
- part: ODEs.qmd
@@ -100,6 +103,7 @@ book:
- differentiable_vector_calculus/scalar_functions.qmd
- differentiable_vector_calculus/scalar_functions_applications.qmd
- differentiable_vector_calculus/vector_fields.qmd
- differentiable_vector_calculus/matrix_calculus_notes.qmd
- differentiable_vector_calculus/plots_plotting.qmd
- part: integral_vector_calculus.qmd
@@ -114,7 +118,7 @@ book:
chapters:
- alternatives/symbolics.qmd
- alternatives/SciML.qmd
# - alternatives/interval_arithmetic.qmd
#- alternatives/interval_arithmetic.qmd
- alternatives/plotly_plotting.qmd
- alternatives/makie_plotting.qmd
@@ -158,3 +162,5 @@ execute:
error: false
# freeze: false
freeze: auto
# cache: false
# enabled: true


@@ -5,19 +5,42 @@
# This little script just adds a line *before* the require call
# which seems to make it all work. The line number 83 might change.
f = "_book/alternatives/plotly_plotting.html"
lineno = 88
#alternatives/plotly_plotting.html
function _add_plotly(f)
#lineno = 117
str = """
<script src="https://cdn.plot.ly/plotly-2.11.0.min.js"></script>
"""
r = readlines(f)
open(f, "w") do io
for (i,l) ∈ enumerate(r)
i == lineno && println(io, str)
println(io, l)
r = readlines(f)
inserted = false
open(f, "w") do io
for (i,l) ∈ enumerate(r)
if contains(l, "require.min.js")
!inserted && println(io, """
<script src="https://cdn.plot.ly/plotly-2.6.3.min.js"></script>
""")
inserted = true
end
println(io, l)
end
end
end
function (@main)(args...)
for (root, dirs, files) in walkdir("_book")
for fᵢ ∈ files
f = joinpath(root, fᵢ)
if endswith(f, ".html")
dirname(f) == "_book" && continue
_add_plotly(f)
end
end
end
#f = "_book/integrals/center_of_mass.html"
#_add_plotly(f)
return 1
end
["ODEs", "alternatives", "derivatives", "differentiable_vector_calculus", "integral_vector_calculus", "integrals", "limits", "misc", "precalc", "site_libs"]


@@ -6,8 +6,8 @@ These notes use a particular selection of packages. This selection could have be
* The finding of zeros of scalar-valued, univariate functions is done with `Roots`. The [NonlinearSolve](./alternatives/SciML.html#nonlinearsolve) package provides an alternative for univariate and multi-variate functions.
* The finding of minima and maxima was done mirroring the framework of a typical calculus class; the [Optimization](./alternatives/SciML.html#optimization-optimization.jl) provides an alternative.
* The finding of minima and maxima was done mirroring the framework of a typical calculus class; the [Optimization](./alternatives/SciML.html#optimization-optimization.jl) package provides an alternative.
* The computation of numeric approximations for definite integrals is computed with the `QuadGK` and `HCubature` packages. The [Integrals](./alternatives/SciML.html#integration-integrals.jl) package provides a unified interface for numeric to these two packages, among others.
* Numeric approximations for definite integrals are computed with the `QuadGK` and `HCubature` packages. The [Integrals](./alternatives/SciML.html#integration-integrals.jl) package provides a unified interface for numeric integration, including these two packages, among others.
* Plotting was done using the popular `Plots` package. The [Makie](./alternatives/makie_plotting.html) package provides a very powerful alternative, whereas the [PlotlyLight](./alternatives/plotly_plotting.html) package provides a light-weight alternative using an open-source JavaScript library.


@@ -5,11 +5,13 @@ ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
GLMakie = "e9467ef8-e4e7-5192-8a1a-b1aee30e663a"
GeometryBasics = "5c1252a2-5f33-56bf-86c9-59e7332b4326"
IJulia = "7073ff75-c697-5162-941a-fcdaad2a7d2a"
Implicit3DPlotting = "d997a800-832a-4a4c-b340-7dddf3c1ad50"
Integrals = "de52edbc-65ea-441a-8357-d3a637375a31"
LaTeXStrings = "b964fa9f-0449-5b57-a5c2-d3ea65f4040f"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
Makie = "ee78f7c6-11fb-53f2-987a-cfe4a2b5a57a"
Meshing = "e6723b4c-ebff-59f1-b4b7-d97aa5274f73"
ModelingToolkit = "961ee093-0014-501f-94e3-6117800e7a78"
Mustache = "ffc61752-8dc7-55ee-8c37-f3e9cdd09e70"
NonlinearSolve = "8913a72c-1f9b-4ce2-8d82-65094dcecaec"
Optimization = "7f7a1694-90dd-40f0-9382-eb1efda571ba"
OptimizationOptimJL = "36348300-93cb-4f02-beb5-3c3902f8871e"
@@ -19,6 +21,7 @@ PlotlyKaleido = "f2990250-8cf9-495f-b13a-cce12b45703c"
PlotlyLight = "ca7969ec-10b3-423e-8d99-40f33abb42bf"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
QuadGK = "1fd47b50-473d-5c70-9696-f719f8f3bcdc"
QuizQuestions = "612c44de-1021-4a21-84fb-7261cf5eb2d4"
Roots = "f2b01f46-fcfa-551c-844a-d8ac1e96c665"
SplitApplyCombine = "03a91e81-4c3e-53e1-a0a4-9c0c8f19dd66"
StaticArrays = "90137ffa-7385-5640-81b9-e52037218182"
@@ -26,3 +29,5 @@ SymPy = "24249f21-da20-56a4-8eb1-6a02cf4ae2e6"
SymbolicLimits = "19f23fe9-fdab-4a78-91af-e7b7767979c3"
SymbolicNumericIntegration = "78aadeae-fbc0-11eb-17b6-c7ec0477ba9e"
Symbolics = "0c5d862f-8b57-4792-8d23-62f2024744c7"
Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
TextWrap = "b718987f-49a8-5099-9789-dcd902bef87d"


@@ -250,7 +250,7 @@ With the system defined, we can pass this to `NonlinearProblem`, as was done wit
```{julia}
prob = NonlinearProblem(ns, [1.0], [α => 1.0])
prob = NonlinearProblem(mtkcompile(ns), [1.0], Dict(α => 1.0))
```
The problem is solved as before:


@@ -1,6 +1,5 @@
# Calculus plots with Makie
{{< include ../_common_code.qmd >}}
The [Makie.jl webpage](https://github.com/JuliaPlots/Makie.jl) says
@@ -36,8 +35,7 @@ using GLMakie
import LinearAlgebra: norm
```
The `Makie` developers have workarounds for the delayed time to first plot, but without utilizing these the time to load the package is lengthy.
The package load time as of recent versions of `Makie` is quite reasonable for a complicated project. (The time to first plot is under 3 seconds on a typical machine.)
## Points (`scatter`)
@@ -158,7 +156,8 @@ A point is drawn with a "marker" with a certain size and color. These attributes
```{julia}
scatter(xs, ys;
marker=[:x,:cross, :circle], markersize=25,
marker=[:x,:cross, :circle],
markersize=25,
color=:blue)
```
@@ -176,7 +175,7 @@ A single value will be repeated. A vector of values of a matching size will spec
## Curves
The curves of calculus are lines. The `lines` command of `Makie` will render a curve by connecting a series of points with straight-line segments. By taking a sufficient number of points the connect-the-dot figure can appear curved.
A visualization of a curve in calculus is composed of line segments. The `lines` command of `Makie` will render a curve by connecting a series of points with straight-line segments. By taking a sufficient number of points, the connect-the-dot figure can appear curved.
### Plots of univariate functions
@@ -304,7 +303,6 @@ current_figure()
### Text (`annotations`)
Text can be placed at a point, as a marker is. To place text, the desired text and a position need to be specified along with any adjustments to the default attributes.
@@ -315,25 +313,43 @@ For example:
xs = 1:5
pts = Point2.(xs, xs)
scatter(pts)
annotations!("Point " .* string.(xs), pts;
fontsize = 50 .- 2*xs,
rotation = 2pi ./ xs)
annotation!(pts;
text = "Point " .* string.(xs),
fontsize = 30 .- 5*xs)
current_figure()
```
The graphic shows that `fontsize` adjusts the displayed size and `rotation` adjusts the orientation. (The graphic also shows a need to manually override the limits of the `y` axis, as the `Point 5` is chopped off; the `ylims!` function to do so will be shown later.)
The graphic shows that `fontsize` adjusts the displayed size.
Attributes for `text`, among many others, include:
* `align` Specify the text alignment through `(:pos, :pos)`, where `:pos` can be `:left`, `:center`, or `:right`.
* `rotation` to indicate how the text is to be rotated
* `fontsize` the font point size for the text
* `font` to indicate the desired font
Annotations with an arrow can be useful to highlight a feature of a graph. This example is modified from the documentation and utilizes some interval functions to draw an arrow with an arc:
```{julia}
g(x) = cos(6x) * exp(x)
xs = 0:0.01:4
_, ax, _ = lines(xs, g.(xs); axis = (; xgridvisible = false, ygridvisible = false))
annotation!(ax, 1, 20, 2.1, g(2.1),
text = "A relative maximum",
path = Ann.Paths.Arc(0.3),
style = Ann.Styles.LineArrow(),
labelspace = :data
)
current_figure()
```
#### Line attributes
@@ -666,6 +682,7 @@ A surface of revolution for $g(u)$ revolved about the $z$ axis can be visualized
```{julia}
g(u) = u^2 * exp(-u)
r(u,v) = (g(u)*sin(v), g(u)*cos(v), u)
us = range(0, 3, length=10)
vs = range(0, 2pi, length=10)
xs, ys, zs = parametric_grid(us, vs, r)
@@ -681,6 +698,7 @@ A torus with big radius $2$ and inner radius $1/2$ can be visualized as follows
```{julia}
r1, r2 = 2, 1/2
r(u,v) = ((r1 + r2*cos(v))*cos(u), (r1 + r2*cos(v))*sin(u), r2*sin(v))
us = vs = range(0, 2pi, length=25)
xs, ys, zs = parametric_grid(us, vs, r)
@@ -696,6 +714,7 @@ A Möbius strip can be produced with:
ws = range(-1/4, 1/4, length=8)
thetas = range(0, 2pi, length=30)
r(w, θ) = ((1+w*cos(θ/2))*cos(θ), (1+w*cos(θ/2))*sin(θ), w*sin(θ/2))
xs, ys, zs = parametric_grid(ws, thetas, r)
surface(xs, ys, zs)
@@ -865,20 +884,19 @@ end
#### Implicitly defined surfaces, $F(x,y,z)=0$
To plot the equation $F(x,y,z)=0$, for $F$ a scalar-valued function, again the implicit function theorem says that, under conditions, near any solution $(x,y,z)$, $z$ can be represented as a function of $x$ and $y$, so the graph will look likes surfaces stitched together. The `Implicit3DPlotting` package takes an approach like `ImplicitPlots` to represent these surfaces. It replaces the `Contour` package computation with a $3$-dimensional alternative provided through the `Meshing` and `GeometryBasics` packages.
To plot the equation $F(x,y,z)=0$, for $F$ a scalar-valued function, again the implicit function theorem says that, under conditions, near any solution $(x,y,z)$, $z$ can be represented as a function of $x$ and $y$, so the graph will look like surfaces stitched together.
```{julia}
using Implicit3DPlotting
```
With `Makie`, many implicitly defined surfaces can be adequately represented using `contour` with the attribute `levels=[0]`. We will illustrate this technique.
The `Implicit3DPlotting` package takes an approach like `ImplicitPlots` to represent these surfaces. It replaces the `Contour` package computation with a $3$-dimensional alternative provided through the `Meshing` and `GeometryBasics` packages. This package has a `plot_implicit_surface` function that does something similar to what is shown below. We don't illustrate it, as it *currently* doesn't work with the latest version of `Makie`.
This example, plotting an implicitly defined sphere, comes from the documentation of `Implicit3DPlotting`. The `f` to be plotted is a scalar-valued function of a vector:
To begin, we plot a sphere implicitly as a solution to $F(x,y,z) = x^2 + y^2 + z^2 - 1 = 0$:
```{julia}
f(x) = sum(x.^2) - 1
xlims = ylims = zlims = (-5, 5)
plot_implicit_surface(f; xlims, ylims, zlims)
f(x,y,z) = x^2 + y^2 + z^2 - 1
xs = ys = zs = range(-3/2, 3/2, 100)
contour(xs, ys, zs, f; levels=[0])
```
@@ -887,11 +905,13 @@ Here we visualize an intersection of a sphere with another figure:
```{julia}
r₂(x) = sum(x.^2) - 5/4 # a sphere
r₂(x) = sum(x.^2) - 2 # a sphere
r₄(x) = sum(x.^4) - 1
xlims = ylims = zlims = (-2, 2)
p = plot_implicit_surface(r₂; xlims, ylims, zlims, color=:yellow)
plot_implicit_surface!(p, r₄; xlims, ylims, zlims, color=:red)
ϕ(x,y,z) = (x,y,z)
xs = ys = zs = range(-2, 2, 100)
contour(xs, ys, zs, r₂∘ϕ; levels = [0], colormap=:RdBu)
contour!(xs, ys, zs, r₄∘ϕ; levels = [0], colormap=:viridis)
current_figure()
```
@@ -900,11 +920,12 @@ This example comes from [Wikipedia](https://en.wikipedia.org/wiki/Implicit_surfa
```{julia}
f(x,y,z) = 2y*(y^2 -3x^2)*(1-z^2) + (x^2 +y^2)^2 - (9z^2-1)*(1-z^2)
xlims = ylims = zlims = (-5/2, 5/2)
plot_implicit_surface(x -> f(x...); xlims, ylims, zlims)
xs = ys = zs = range(-5/2, 5/2, 100)
contour(xs, ys, zs, f; levels=[0], colormap=:RdBu)
```
(This figure does not render well through `contour(xs, ys, zs, f, levels=[0])`, as the hole is not shown.)
(This figure does not render well though, as the hole is not shown.)
For one last example from Wikipedia, we have the Cassini oval which "can be defined as the point set for which the *product* of the distances to $n$ given points is constant." That is:
@@ -915,8 +936,8 @@ function cassini(λ, ps = ((1,0,0), (-1, 0, 0)))
n = length(ps)
x -> prod(norm(x .- p) for p ∈ ps) - λ^n
end
xlims = ylims = zlims = (-2, 2)
plot_implicit_surface(cassini(1.05); xlims, ylims, zlims)
xs = ys = zs = range(-2, 2, 100)
contour(xs, ys, zs, cassini(0.80) ∘ ϕ; levels=[0], colormap=:RdBu)
```
## Vector fields. Visualizations of $f:R^2 \rightarrow R^2$
@@ -1064,7 +1085,7 @@ F
### Observables
The basic components of a plot in `Makie` can be updated [interactively](https://makie.juliaplots.org/stable/documentation/nodes/index.html#observables_interaction). `Makie` uses the `Observables` package which allows complicated interactions to be modeled quite naturally. In the following we give a simple example.
The basic components of a plot in `Makie` can be updated [interactively](https://makie.juliaplots.org/stable/documentation/nodes/index.html#observables_interaction). Historically `Makie` used the `Observables` package which allows complicated interactions to be modeled quite naturally. In the following we give a simple example, though newer versions of `Makie` rely on a different mechanism.
In Makie, an `Observable` is a structure that allows its value to be updated, similar to an array. When changed, observables can trigger an event. Observables can rely on other observables, so events can be cascaded.
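The pattern behind this can be sketched in plain `Julia`. The `MiniObservable` type below is a hypothetical illustration of the idea (a value plus listeners that fire on update), not the actual `Observables` implementation:

```{julia}
# A minimal sketch of the observable pattern: a value plus a list of
# listener functions that are called whenever the value is updated.
# (Hypothetical type for illustration; not the real Observables.jl type.)
mutable struct MiniObservable{T}
    value::T
    listeners::Vector{Function}
end
MiniObservable(v) = MiniObservable(v, Function[])

# register a callback to run on each update
observe!(f, o::MiniObservable) = push!(o.listeners, f)

# set a new value and notify every listener
function update!(o::MiniObservable, v)
    o.value = v
    foreach(f -> f(v), o.listeners)
    v
end

x = MiniObservable(1.0)
squares = Float64[]
observe!(v -> push!(squares, v^2), x)  # a cascaded computation
update!(x, 3.0)
update!(x, 4.0)
squares                                # [9.0, 16.0]
```

Updating `x` triggers the listener, so `squares` accumulates the squared values; real observables chain this way to redraw plot elements.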
@@ -1123,6 +1144,7 @@ end
lines!(ax, xs, f)
lines!(ax, points)
scatter!(ax, points; markersize=10)
current_figure()
```

View File

@@ -73,7 +73,6 @@ The `Config` constructor (from the `EasyConfig` package loaded with `PlotlyLight
```{julia}
#| hold: true
cfg = Config()
cfg.key1.key2.key3 = "value"
cfg
@@ -89,7 +88,6 @@ A basic scatter plot of points $(x,y)$ is created as follows:
```{julia}
#| hold: true
xs = 1:5
ys = rand(5)
data = Config(x = xs,
@@ -113,7 +111,6 @@ A line plot is very similar, save for a different `mode` specification:
```{julia}
#| hold: true
xs = 1:5
ys = rand(5)
data = Config(x = xs,
@@ -134,7 +131,6 @@ The line graph plays connect-the-dots with the points specified by paired `x` an
```{julia}
#| hold: true
data = Config(
x=[0,1,nothing,3,4,5],
y = [0,1,2,3,4,5],
@@ -149,7 +145,6 @@ More than one graph or layer can appear on a plot. The `data` argument can be a
```{julia}
#| hold: true
data = [Config(x = 1:5,
y = rand(5),
type = "scatter",
@@ -177,7 +172,6 @@ For example, here we plot the graphs of both the $\sin(x)$ and $\cos(x)$ over $[
```{julia}
#| hold: true
a, b = 0, 2pi
xs, ys = PlotUtils.adapted_grid(sin, (a,b))
@@ -193,7 +187,6 @@ The values for `a` and `b` are used to generate the $x$- and $y$-values. These c
```{julia}
#| hold: true
xs, ys = PlotUtils.adapted_grid(x -> x^5 - x - 1, (0, 2)) # answer is (0,2)
p = Plot([Config(x=xs, y=ys, name="Polynomial"),
Config(x=xs, y=0 .* ys, name="x-axis", mode="lines", line=Config(width=5))]
@@ -232,7 +225,6 @@ A marker's attributes can be adjusted by values passed to the `marker` key. Labe
```{julia}
#| hold: true
data = Config(x = 1:5,
y = rand(5),
mode="markers+text",
@@ -251,40 +243,7 @@ The `text` mode specification is necessary to have text be displayed on the char
#### RGB Colors
The `ColorTypes` package is the standard `Julia` package providing an `RGB` type (among others) for specifying red-green-blue colors. To make this work with `Config` and `JSON3` requires some type-piracy (modifying `Base.string` for the `RGB` type) to get, say, `RGB(0.5, 0.5, 0.5)` to output as `"rgb(0.5, 0.5, 0.5)"`. (RGB values in JavaScript are integers between $0$ and $255$ or floating point values between $0$ and $1$.) A string with this content can be specified. Otherwise, something like the following can be used to avoid the type piracy:
```{julia}
struct rgb
r
g
b
end
PlotlyLight.JSON3.StructTypes.StructType(::Type{rgb}) = PlotlyLight.JSON3.StructTypes.StringType()
Base.string(x::rgb) = "rgb($(x.r), $(x.g), $(x.b))"
```
With these defined, red-green-blue values can be used for colors. For example to give a range of colors, we might have:
```{julia}
#| hold: true
cols = [rgb(i,i,i) for i in range(10, 245, length=5)]
sizes = [12, 16, 20, 24, 28]
data = Config(x = 1:5,
y = rand(5),
mode="markers+text",
type="scatter",
name="scatter plot",
text = ["marker $i" for i in 1:5],
textposition = "top center",
marker = Config(size=sizes, color=cols)
)
Plot(data)
```
The `opacity` key can be used to control the transparency, with a value between $0$ and $1$.
The `ColorTypes` package is the standard `Julia` package providing an `RGB` type (among others) for specifying red-green-blue colors. To make this work with `Config` and `JSON3` requires some type-piracy (modifying `Base.string` for the `RGB` type) to get, say, `RGB(0.5, 0.5, 0.5)` to output as `"rgb(0.5, 0.5, 0.5)"`. (RGB values in JavaScript are integers between $0$ and $255$ or floating point values between $0$ and $1$.) A string with this content can be specified.
#### Marker symbols
@@ -293,7 +252,6 @@ The `marker_symbol` key can be used to set a marker shape, with the basic values
```{julia}
#| hold: true
markers = ["circle", "square", "diamond", "cross", "x", "triangle", "pentagon",
"hexagram", "star", "diamond", "hourglass", "bowtie", "asterisk",
"hash", "y", "line"]
@@ -327,7 +285,6 @@ The `shape` attribute determine how the points are connected. The default is `li
```{julia}
#| hold: true
shapes = ["linear", "hv", "vh", "hvh", "vhv", "spline"]
data = [Config(x = 1:5, y = 5*(i-1) .+ [1,3,2,3,1], mode="lines+markers", type="scatter",
name=shape,
@@ -358,7 +315,6 @@ In the following, to highlight the difference between $f(x) = \cos(x)$ and $p(x)
```{julia}
#| hold: true
xs = range(-1, 1, 100)
data = [
Config(
@@ -381,7 +337,6 @@ The `toself` declaration is used below to fill in a polygon:
```{julia}
#| hold: true
data = Config(
x=[-1,1,1,-1,-1], y = [-1,1,-1,1,-1],
fill="toself",
@@ -399,7 +354,6 @@ The legend is shown when $2$ or more charts are specified, by default. This can b
```{julia}
#| hold: true
data = Config(x=1:5, y=rand(5), type="scatter", mode="markers", name="legend label")
lyt = Config(title = "Main chart title",
xaxis = Config(title="x-axis label"),
@@ -416,7 +370,6 @@ The aspect ratio of the chart can be set to be equal through the `scaleanchor` k
```{julia}
#| hold: true
ts = range(0, 2pi, length=100)
data = Config(x = sin.(ts), y = cos.(ts), mode="lines", type="scatter")
lyt = Config(title = "A circle",
@@ -434,7 +387,6 @@ Text annotations may be specified as part of the layout object. Annotations may
```{julia}
#| hold: true
data = Config(x = [0, 1], y = [0, 1], mode="markers", type="scatter")
layout = Config(title = "Annotations",
xaxis = Config(title="x",
@@ -452,7 +404,7 @@ Plot(data, layout)
The following example is a more complicated use of the elements previously described. It mimics an image from [Wikipedia](https://en.wikipedia.org/wiki/List_of_trigonometric_identities) for trigonometric identities. The use of `LaTeX` does not seem to be supported through the `JavaScript` interface; unicode symbols are used instead. The `xanchor` and `yanchor` keys are used to position annotations away from the default. The `textangle` key is used to rotate text, as desired.
```{julia, hold=true}
```{julia}
alpha = pi/6
beta = pi/5
xₘ = cos(alpha)*cos(beta)
@@ -569,7 +521,6 @@ Earlier, we plotted a two dimensional circle, here we plot the related helix.
```{julia}
#| hold: true
helix(t) = [cos(t), sin(t), t]
ts = range(0, 4pi, length=200)
@@ -596,7 +547,6 @@ There is no `quiver` plot for `plotly` using JavaScript. In $2$-dimensions a tex
```{julia}
#| hold: true
helix(t) = [cos(t), sin(t), t]
helix(t) = [-sin(t), cos(t), 1]
ts = range(0, 4pi, length=200)
@@ -642,7 +592,6 @@ A contour plot is created by the "contour" trace type. The data is prepared as a
```{julia}
#| hold: true
f(x,y) = x^2 - 2y^2
xs = range(0,2,length=25)
@@ -661,7 +610,6 @@ The same `zs` data can be achieved by broadcasting and then collecting as follow
```{julia}
#| hold: true
f(x,y) = x^2 - 2y^2
xs = range(0,2,length=25)
@@ -692,7 +640,6 @@ Surfaces defined through a scalar-valued function are drawn quite naturally, sav
```{julia}
#| hold: true
peaks(x,y) = 3 * (1-x)^2 * exp(-(x^2) - (y+1)^2) -
10*(x/5 - x^3 - y^5) * exp(-x^2-y^2) - 1/3 * exp(-(x+1)^2 - y^2)
@@ -713,7 +660,6 @@ For parametrically defined surfaces, the $x$ and $y$ values also correspond to m
```{julia}
#| hold: true
r, R = 1, 5
X(theta,phi) = [(r*cos(theta)+R)*cos(phi),
(r*cos(theta)+R)*sin(phi),

View File

@@ -601,14 +601,7 @@ det(N)
det(collect(N))
```
Similarly, with `norm`:
```{julia}
norm(v)
```
and
Similarly with `norm`, which requires that the generator be collected first:
```{julia}

3
quarto/basics.qmd Normal file
View File

@@ -0,0 +1,3 @@
# Mathematical basics
This chapter introduces some mathematical basics and their counterparts within the `Julia` programming language.

View File

@@ -0,0 +1,15 @@
[deps]
CalculusWithJulia = "a2e0e22d-7d4c-5312-9169-8b992201a882"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
LaTeXStrings = "b964fa9f-0449-5b57-a5c2-d3ea65f4040f"
Logging = "56ddb016-857b-54e1-b83d-db4d58db5568"
Measures = "442fdcdd-2543-5da2-b0f3-8c86c306513e"
Mustache = "ffc61752-8dc7-55ee-8c37-f3e9cdd09e70"
PlotlyBase = "a03496cd-edff-5a9b-9e67-9cda94a718b5"
PlotlyKaleido = "f2990250-8cf9-495f-b13a-cce12b45703c"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
Primes = "27ebfcd6-29c5-5fa9-bf4b-fb8fc14df3ae"
QuizQuestions = "612c44de-1021-4a21-84fb-7261cf5eb2d4"
SymPy = "24249f21-da20-56a4-8eb1-6a02cf4ae2e6"
Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
TextWrap = "b718987f-49a8-5099-9789-dcd902bef87d"

View File

@@ -0,0 +1 @@
../basics.qmd

View File

@@ -244,7 +244,7 @@ With the Google Calculator, typing `1 + 2 x 3 =` will give the value $7$, but *i
In `Julia`, the entire expression is typed in before being evaluated, so the usual conventions of mathematics related to the order of operations may be used. These are colloquially summarized by the acronym [PEMDAS](http://en.wikipedia.org/wiki/Order_of_operations).
> **PEMDAS**. This acronym stands for Parentheses, Exponents, Multiplication, Division, Addition, Subtraction. The order indicates which operation has higher precedence, or should happen first. This isn't exactly the case, as "M" and "D" have the same precedence, as do "A" and "S". In the case of two operations with equal precedence, *associativity* is used to decide which to do. For the operations `-`, `/` the associativity is left to right, as in the left one is done first, then the right. However, `^` has right associativity, so `4^3^2` is `4^(3^2)` and not `(4^3)^2` (Be warned that some calculators - and spread sheets, such as Excel - will treat this expression with left associativity). But, `+` and `*` don't have associativity, so `1+2+3` can be `(1+2)+3` or `1+(2+3)`.
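These precedence and associativity rules can be checked directly at the `Julia` prompt:

```{julia}
4^3^2 == 4^(3^2)          # true: `^` associates right, so this is 4^9
(4^3)^2 == 4^3^2          # false: 4096 versus 262144
2 - 3 - 4 == (2 - 3) - 4  # true: `-` associates left
1 + 2 * 3                 # 7: `*` binds more tightly than `+`
```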

View File

Before

Width:  |  Height:  |  Size: 50 KiB

After

Width:  |  Height:  |  Size: 50 KiB

View File

Before

Width:  |  Height:  |  Size: 114 KiB

After

Width:  |  Height:  |  Size: 114 KiB

View File

Before

Width:  |  Height:  |  Size: 10 KiB

After

Width:  |  Height:  |  Size: 10 KiB

16
quarto/basics/make_pdf.jl Normal file
View File

@@ -0,0 +1,16 @@
module Make
# makefile for generating typst pdfs
# per directory usage
dir = "basics"
files = ("calculator",
"variables",
"numbers_types",
"logical_expressions",
"vectors",
"ranges",
)
include("../_make_pdf.jl")
main()
end

View File

@@ -26,7 +26,7 @@ On top of these, we have special subsets, such as the natural numbers $\{1, 2, \
Mathematically, these number systems are naturally nested within each other as integers are rational numbers which are real numbers, which can be viewed as part of the complex numbers.
Calculators typically have just one type of number - floating point values. These model the real numbers. `Julia`, on the other hand, has a rich type system, and within that has serveral different number types. There are types that model each of the four main systems above, and within each type, specializations for how these values are stored.
Calculators typically have just one type of number - floating point values. These model the real numbers. `Julia`, on the other hand, has a rich type system, and within that has several different number types. There are types that model each of the four main systems above, and within each type, specializations for how these values are stored.
Most of the details will not be of interest to all, and will be described later.
@@ -165,7 +165,7 @@ Integers are often used casually, as they come about from parsing. As with a cal
As per IEEE Standard 754, the `Float64` type gives 52 bits to the precision (with an additional implied one), 11 bits to the exponent and the other bit is used to represent the sign. Positive, finite, floating point numbers have a range approximately between $10^{-308}$ and $10^{308}$, as 308 is about $\log_{10} 2^{1023}$. The numbers are not evenly spread out over this range, but, rather, are much more concentrated closer to $0$.
The use of 32-bit floating point values is common, as some widley used computer chips expect this. These values have a narrower range of possible values.
The use of 32-bit floating point values is common, as some widely used computer chips expect this. These values have a narrower range of possible values.
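The narrower precision and range of `Float32` can be seen with the base functions `eps` and `floatmax`:

```{julia}
eps(Float64), eps(Float32)            # machine epsilon: about 2.2e-16 versus 1.2e-7
floatmax(Float64), floatmax(Float32)  # about 1.8e308 versus 3.4e38
Float32(1/3)                          # stores fewer significant digits than 1/3
```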
:::{.callout-warning}
## More on floating point numbers
@@ -404,6 +404,42 @@ Though complex numbers are stored as pairs of numbers, the imaginary unit, `im`,
:::
### Strings and symbols
For text, `Julia` has a `String` type. When double quotes are used to specify a string, the parser creates this type:
```{julia}
x = "The quick brown fox jumped over the lazy dog"
typeof(x)
```
Values can be inserted into a string through *interpolation* using a dollar sign.
```{julia}
animal = "lion"
x = "The quick brown $(animal) jumped over the lazy dog"
```
The use of parentheses allows more complicated expressions; it isn't always necessary.
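For instance, the interpolated expression may itself be a computation (the variable name `quantity` below is just for illustration):

```{julia}
quantity = 2 + 2
"2 + 2 is $(quantity)"   # interpolate a variable
"2 + 2 is $(2 + 2)"      # or interpolate an expression inline
```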
Longer strings can be produced using *triple* quotes:
```{julia}
lincoln = """
Four score and seven years ago our fathers brought forth, upon this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.
"""
```
Strings are comprised of *characters* which can be produced directly using *single* quotes:
```{julia}
'c'
```
We won't use these.
Finally, `Julia` has *symbols* which are *interned* strings which are used as identifiers. Symbols are used for advanced programming techniques; we will only see them as shortcuts to specify plotting arguments.
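Symbols are written with a leading colon; since they are interned, two symbols with the same name are identical:

```{julia}
s = :red                # a symbol literal
typeof(s)               # Symbol
Symbol("red") === :red  # true: same interned object
```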
## Type stability

View File

@@ -1,4 +1,4 @@
# Ranges and Sets
# Ranges and sets
{{< include ../_common_code.qmd >}}
@@ -103,13 +103,13 @@ h = (b-a)/(n-1)
collect(a:h:b)
```
Pretty neat. If we were doing this many times - such as once per plot - we'd want to encapsulate this into a function, for example:
Pretty neat. If we were doing this many times - such as once per plot - we'd want to encapsulate this into a function, for example using a comprehension:
```{julia}
function evenly_spaced(a, b, n)
h = (b-a)/(n-1)
collect(a:h:b)
[a + i*h for i in 0:n-1]
end
```
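A quick usage sketch (the definition is repeated so this block is self-contained):

```{julia}
function evenly_spaced(a, b, n)
    h = (b - a)/(n - 1)
    [a + i*h for i in 0:n-1]
end

evenly_spaced(0, 1, 5)      # [0.0, 0.25, 0.5, 0.75, 1.0]
evenly_spaced(1/5, 3/5, 3)  # starts at exactly 1/5; endpoint may round slightly
```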
@@ -131,10 +131,10 @@ It seems to work as expected. But looking just at the algorithm it isn't quite s
```{julia}
1/5 + 2*1/5 # last value
1/5 + 2*1/5 # last value if h is exactly 1/5 or 0.2
```
Floating point roundoff leads to the last value *exceeding* `0.6`, so should it be included? Well, here it is pretty clear it *should* be, but better to have something programmed that hits both `a` and `b` and adjusts `h` accordingly.
Floating point roundoff leads to the last value *exceeding* `0.6`, so should it be included? Well, here it is pretty clear it *should* be, but better to have something programmed that hits both `a` and `b` and adjusts `h` accordingly. Something which isn't subject to the vagaries of `(3/5 - 1/5)/2` not being `0.2`.
Enter the base function `range` which solves this seemingly simple - but not really - task. It can use `a`, `b`, and `n`. Like the range operation, this function returns a generator which can be collected to realize the values.
@@ -144,14 +144,7 @@ The number of points is specified as a third argument (though keyword arguments
```{julia}
xs = range(-1, 1,9)
```
and
```{julia}
collect(xs)
xs = range(1/5, 3/5, 3) |> collect
```
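Unlike the naive `a:h:b` construction, `range` is designed so that both endpoints are hit:

```{julia}
r = range(1/5, 3/5, 3)
first(r) == 1/5, last(r) == 3/5  # (true, true)
step(r)                          # the internally adjusted step
collect(r)
```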
:::{.callout-note}
@@ -266,7 +259,7 @@ Here are decreasing powers of $2$:
[1/2^i for i in 1:10]
```
Sometimes, the comprehension does not produce the type of output that may be expected. This is related to `Julia`'s more limited abilities to infer types at the command line. If the output type is important, the extra prefix of `T[]` can be used, where `T` is the desired type. We will see that this will be needed at times with symbolic math.
Sometimes, the comprehension does not produce the type of output that may be expected. This is related to `Julia`'s more limited abilities to infer types at the command line. If the output type is important, the extra prefix of `T[]` can be used, where `T` is the desired type.
### Generators

View File

@@ -231,7 +231,7 @@ The distinction between ``=`` versus `=` is important and one area where common
## Context
The binding of a value to a variable name happens within some context. When a variable is assigned or referenced, the scope of the variable -- the region of code where it is accessible -- is taken into consideration.
The binding of a value to a variable name happens within some context. When a variable is assigned or referenced, the scope of the variable---the region of code where it is accessible---is taken into consideration.
For our simple illustrations, we are assigning values, as though they were typed at the command line. This stores the binding in the `Main` module. `Julia` looks for variables in this module when it encounters an expression and the value is substituted. Other uses, such as when variables are defined within a function, involve different contexts which may not be visible within the `Main` module.

View File

@@ -1,5 +1,4 @@
# Vectors
# Vectors and containers
{{< include ../_common_code.qmd >}}
@@ -143,8 +142,9 @@ We call the values $x$ and $y$ of the vector $\vec{v} = \langle x,~ y \rangle$ t
Two operations on vectors are fundamental.
* Vectors can be multiplied by a scalar (a real number): $c\vec{v} = \langle cx,~ cy \rangle$. Geometrically this scales the vector by a factor of $\lvert c \rvert$ and switches the direction of the vector by $180$ degrees (in the $2$-dimensional case) when $c < 0$. A *unit vector* is one with magnitude $1$, and, except for the $\vec{0}$ vector, can be formed from $\vec{v}$ by dividing $\vec{v}$ by its magnitude. A vector's two parts are summarized by its direction given by a unit vector **and** its magnitude given by the norm.
* Vectors can be added: $\vec{v} + \vec{w} = \langle v_x + w_x,~ v_y + w_y \rangle$. That is, each corresponding component adds to form a new vector. Similarly for subtraction. The $\vec{0}$ vector then would be just $\langle 0,~ 0 \rangle$ and would satisfy $\vec{0} + \vec{v} = \vec{v}$ for any vector $\vec{v}$. Vector addition, $\vec{v} + \vec{w}$, is visualized by placing the tail of $\vec{w}$ at the tip of $\vec{v}$ and then considering the new vector with tail coming from $\vec{v}$ and tip coming from the position of the tip of $\vec{w}$. Subtraction is different, place both the tails of $\vec{v}$ and $\vec{w}$ at the same place and the new vector has tail at the tip of $\vec{w}$ and tip at the tip of $\vec{v}$.
* *Scalar multiplication*: Vectors can be multiplied by a scalar (a real number): $c\vec{v} = \langle cx,~ cy \rangle$. Geometrically this scales the vector by a factor of $\lvert c \rvert$ and switches the direction of the vector by $180$ degrees (in the $2$-dimensional case) when $c < 0$. A *unit vector* is one with magnitude $1$, and, except for the $\vec{0}$ vector, can be formed from $\vec{v}$ by dividing $\vec{v}$ by its magnitude. A vector's two parts are summarized by its direction given by a unit vector **and** its magnitude given by the norm.
* *Vector addition*: Vectors can be added: $\vec{v} + \vec{w} = \langle v_x + w_x,~ v_y + w_y \rangle$. That is, each corresponding component adds to form a new vector. Similarly for subtraction. The $\vec{0}$ vector then would be just $\langle 0,~ 0 \rangle$ and would satisfy $\vec{0} + \vec{v} = \vec{v}$ for any vector $\vec{v}$. Vector addition, $\vec{v} + \vec{w}$, is visualized by placing the tail of $\vec{w}$ at the tip of $\vec{v}$ and then considering the new vector with tail coming from $\vec{v}$ and tip coming from the position of the tip of $\vec{w}$. Subtraction is different, place both the tails of $\vec{v}$ and $\vec{w}$ at the same place and the new vector has tail at the tip of $\vec{w}$ and tip at the tip of $\vec{v}$.
```{julia}
@@ -334,7 +334,7 @@ Finally, to find an angle $\theta$ from a vector $\langle x,~ y\rangle$, we can
norm(v), atan(y, x) # v = [x, y]
```
## Higher dimensional vectors
### Higher dimensional vectors
Mathematically, vectors can be generalized to more than $2$ dimensions. For example, using $3$-dimensional vectors are common when modeling events happening in space, and $4$-dimensional vectors are common when modeling space and time.
@@ -395,334 +395,6 @@ Whereas, in this example where there is no common type to promote the values to,
["one", 2, 3.0, 4//1]
```
## Indexing
Getting the components out of a vector can be done in a manner similar to multiple assignment:
```{julia}
vs = [1, 2]
v₁, v₂ = vs
```
When the same number of variable names are on the left hand side of the assignment as in the container on the right, each is assigned in order.
Though this is convenient for small vectors, it is far from being so if the vector has a large number of components. However, the vector is stored in order with a first, second, third, $\dots$ component. `Julia` allows these values to be referred to by *index*. This too uses the `[]` notation, though differently. Here is how we get the second component of `vs`:
```{julia}
vs[2]
```
The last value of a vector is usually denoted by $v_n$. In `Julia`, the `length` function will return $n$, the number of items in the container. So `v[length(v)]` will refer to the last component. However, the special keyword `end` will do so as well, when put into the context of indexing. So `v[end]` is more idiomatic. (Similarly, there is a `begin` keyword that is useful when the vector is not $1$-based, as is typical but not mandatory.)
The functions `first` and `last` refer to the first and last components of a collection. An additional argument can be specified to take the first (or last) $n$ components. The function `only` will return the only component of a vector, if it has length $1$ and error otherwise.
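A few of these in action:

```{julia}
v = [10, 20, 30, 40]
v[2], v[end], v[end-1]  # (20, 40, 30)
first(v), last(v)       # (10, 40)
first(v, 2), last(v, 2) # the first two and last two components
only([7])               # 7; `only` errors unless the length is 1
```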
:::{.callout-note}
## More on indexing
There is [much more](https://docs.julialang.org/en/v1/manual/arrays/#man-array-indexing) to indexing than just indexing by a single integer value. For example, the following can be used for indexing:
* a scalar integer (as seen)
* a range
* a vector of integers
* a boolean vector
:::
Some add-on packages extend this further.
### Assignment and indexing
Indexing notation can also be used with assignment, meaning it can appear on the left hand side of an equals sign. The following expression replaces the second component with a new value:
```{julia}
vs[2] = 10
```
The value of the right hand side is returned, not the value for `vs`. We can check that `vs` is then $\langle 1,~ 10 \rangle$ by showing it:
```{julia}
#| hold: true
vs = [1,2]
vs[2] = 10
vs
```
The assignment `vs[2]` is different than the initial assignment `vs=[1,2]` in that, `vs[2]=10` **modifies** the container that `vs` points to, whereas `vs=[1,2]` **replaces** any binding for `vs`. The indexed assignment is more memory efficient when vectors are large. This point is also of interest when passing vectors to functions, as a function may modify components of the vector passed to it, though can't replace the container itself.
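The distinction can be seen by binding two names to one container:

```{julia}
vs = [1, 2]
ws = vs        # ws and vs are bound to the same container
ws[2] = 10     # mutates the shared container
vs             # [1, 10]; the change is visible through vs
ws = [5, 6]    # rebinding ws replaces its binding only
vs             # still [1, 10]
```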
## Some useful functions for working with vectors
As mentioned, the `length` function returns the number of components in a vector. It is one of several useful functions for vectors.
The `sum` and `prod` function will add and multiply the elements in a vector:
```{julia}
v1 = [1,1,2,3,5,8]
sum(v1), prod(v1)
```
The `unique` function will throw out any duplicates:
```{julia}
unique(v1) # drop a `1`
```
The functions `maximum` and `minimum` will return the largest and smallest values of an appropriate vector.
```{julia}
maximum(v1)
```
(These should not be confused with `max` and `min` which give the largest or smallest value over all their arguments.)
The `extrema` function returns both the smallest and largest value of a collection:
```{julia}
extrema(v1)
```
Consider now
```{julia}
𝒗 = [1,4,2,3]
```
The `sort` function will rearrange the values in `𝒗`:
```{julia}
sort(𝒗)
```
The keyword argument, `rev=true` can be given to get values in decreasing order:
```{julia}
sort(𝒗, rev=true)
```
For adding a new element to a vector the `push!` method can be used, as in
```{julia}
push!(𝒗, 5)
```
To append more than one value, the `append!` function can be used:
```{julia}
append!(v1, [6,8,7])
```
These two functions modify or mutate the values stored within the vector `𝒗` that passed as an argument. In the `push!` example above, the value `5` is added to the vector of $4$ elements. In `Julia`, a convention is to name mutating functions with a trailing exclamation mark. (Again, these do not mutate the binding of `𝒗` to the container, but do mutate the contents of the container.) There are functions with mutating and non-mutating definitions, an example is `sort` and `sort!`.
If only a mutating function is available, like `push!`, and mutation is not desired, a copy of the vector can be made first. It is not enough to copy by assignment, as with `w = 𝒗`, as then both `w` and `𝒗` will be bound to the same container. Rather, call `copy` (or sometimes `deepcopy`) to make a new container with copied contents, as in `w = copy(𝒗)`.
Creating new vectors of a given size is common for programming, though not much use will be made here. There are many different functions to do so: `ones` to make a vector of ones, `zeros` to make a vector of zeros, `trues` and `falses` to make Boolean vectors of a given size, and `similar` to make a similar-sized vector (with no particular values assigned).
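These constructors look like:

```{julia}
zeros(3)              # [0.0, 0.0, 0.0]
ones(Int, 2)          # [1, 1]; an element type may be given first
trues(2)              # a Boolean vector of trues
w = copy([1, 2, 3])   # an independent copy; mutating w leaves the original alone
similar([1.0, 2.0])   # same size and element type, but undefined values
```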
## Applying functions element by element to values in a vector
Functions such as `sum` or `length` are known as *reductions* as they reduce the "dimensionality" of the data: a vector is in some sense $1$-dimensional, the sum or length are $0$-dimensional numbers. Applying a reduction is straightforward: it is just a regular function call.
```{julia}
#| hold: true
v = [1, 2, 3, 4]
sum(v), length(v)
```
Other desired operations with vectors act differently. Rather than reduce a collection of values using some formula, the goal is to apply some formula to *each* of the values, returning a modified vector. A simple example might be to square each element, or subtract the average value from each element. An example comes from statistics. When computing a variance, we start with data $x_1, x_2, \dots, x_n$ and along the way form the values $(x_1-\bar{x})^2, (x_2-\bar{x})^2, \dots, (x_n-\bar{x})^2$.
Such things can be done in *many* different ways. Here we describe two, but will primarily utilize the first.
### Broadcasting a function call
If we have a vector, `xs`, and a function, `f`, to apply to each value, there is a simple means to achieve this task. Adding a "dot" between the function name and the parentheses that enclose the arguments instructs `Julia` to "broadcast" the function call. The details allow for more flexibility, but, for this purpose, broadcasting will take each value in `xs` and apply `f` to it, returning a vector of the same size as `xs`. When more than one argument is involved, broadcasting will try to fill out different sized objects.
For example, the following will find, using `sqrt`, the square root of each value in a vector:
```{julia}
xs = [1, 1, 3, 4, 7]
sqrt.(xs)
```
This would find the sine of each number in `xs`:
```{julia}
sin.(xs)
```
For each function, the `.(` (and not `(`) after the name is the surface syntax for broadcasting.
The `^` operator is an *infix* operator. Infix operators can be broadcast, as well, by using the form `.` prior to the operator, as in:
```{julia}
xs .^ 2
```
Here is an example involving the logarithm of a set of numbers. In astronomy, a logarithm with base $100^{1/5}$ is used for star [brightness](http://tinyurl.com/ycp7k8ay). We can use broadcasting to find this value for several values at once through:
```{julia}
ys = [1/5000, 1/500, 1/50, 1/5, 5, 50]
base = (100)^(1/5)
log.(base, ys)
```
Broadcasting with multiple arguments allows for mixing of vectors and scalar values, as above, making it convenient when parameters are used.
As a final example, the task from statistics of centering and then squaring can be done with broadcasting. We go a bit further, showing how to compute the [sample variance](http://tinyurl.com/p6wa4r8) of a data set. This has the formula
$$
\frac{1}{n-1}\cdot ((x_1-\bar{x})^2 + \cdots + (x_n - \bar{x})^2).
$$
This can be computed, with broadcasting, through:
```{julia}
#| hold: true
import Statistics: mean
xs = [1, 1, 2, 3, 5, 8, 13]
n = length(xs)
(1/(n-1)) * sum(abs2.(xs .- mean(xs)))
```
This shows many of the manipulations that can be made with vectors. Rather than write `.^2`, we follow the definition of `var` and choose the possibly more performant `abs2` function which, in general, efficiently finds $|x|^2$ for various number types. The `.-` uses broadcasting to subtract a scalar (`mean(xs)`) from a vector (`xs`). Without the `.`, this would error.
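As a check, the hand computation agrees with `var` from the standard `Statistics` library:

```{julia}
import Statistics: mean, var

xs = [1, 1, 2, 3, 5, 8, 13]
n = length(xs)
s² = (1/(n-1)) * sum(abs2.(xs .- mean(xs)))  # the formula above
s² ≈ var(xs)                                  # true
```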
:::{.callout-note}
## Note
The `map` function is very much related to broadcasting and similarly named functions are found in many different programming languages. (The "dot" broadcast is mostly limited to `Julia` and mirrors a similar usage of a dot in `MATLAB`.) For those familiar with other programming languages, using `map` may seem more natural. Its syntax is `map(f, xs)`.
:::
### Comprehensions
In mathematics, set notation is often used to describe elements in a set.
For example, the first $5$ cubed numbers can be described by:
$$
\{x^3: x \text{ in } 1, 2,\dots, 5\}
$$
Comprehension notation is similar. The above could be created in `Julia` with:
```{julia}
xs = [1,2,3,4,5]
[x^3 for x in xs]
```
Something similar can be done more succinctly:
```{julia}
xs .^ 3
```
However, comprehensions have value when more complicated expressions are desired, as they work with an expression in the values of `xs`, and not a pre-defined or user-defined function.
Another typical example of set notation might include a condition, such as, the numbers divisible by $7$ between $1$ and $100$. Set notation might be:
$$
\{x: \text{rem}(x, 7) = 0 \text{ for } x \text{ in } 1, 2, \dots, 100\}.
$$
This would be read: "the set of $x$ such that the remainder on division by $7$ is $0$ for all $x$ in $1, 2, \dots, 100$."
In `Julia`, a comprehension can include an `if` clause to mirror, somewhat, the math notation. For example, the above would become (using `1:100` as a means to create the numbers $1,2,\dots, 100$, as will be described in an upcoming section):
```{julia}
[x for x in 1:100 if rem(x,7) == 0]
```
Comprehensions can be a convenient means to describe a collection of numbers, especially when no function is defined, but the simplicity of the broadcast notation (just adding a judicious ".") leads to its more common use in these notes.
##### Example: creating a "T" table for creating a graph
The process of plotting a function is usually first taught by generating a "T" table: values of $x$ and corresponding values of $y$. These pairs are then plotted on a Cartesian grid and the points are connected with lines to form the graph. Generating a "T" table in `Julia` is easy: create the $x$ values, then create the $y$ values for each $x$.
To be concrete, let's generate $7$ points to plot $f(x) = x^2$ over $[-1,1]$.
The first task is to create the data. We will soon see more convenient ways to generate patterned data, but for now, we do this by hand:
```{julia}
a, b, n = -1, 1, 7
d = (b-a) // (n-1)
xs = [a, a+d, a+2d, a+3d, a+4d, a+5d, a+6d] # 7 points
```
To get the corresponding $y$ values, we can use a comprehension (or define a function and broadcast):
```{julia}
ys = [x^2 for x in xs]
```
Vectors can be viewed side by side by combining them into a single container, as follows:
```{julia}
[xs ys]
```
(If there is a space between objects they are horizontally combined. In our construction of vectors using `[]` we used a comma for vertical combination. More generally we should use a `;` for vertical concatenation.)
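For example, with two small illustrative vectors, a space combines horizontally into a matrix, whereas a semicolon stacks vertically into one longer vector:

```{julia}
us = [1, 2, 3]
vs = [4, 5, 6]
[us vs]   # a 3×2 matrix: horizontal concatenation
```

```{julia}
[us; vs]  # a 6-element vector: vertical concatenation
```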
In the sequel, we will typically use broadcasting for this task using two steps: one to define a function, the second to broadcast it.
:::{.callout-note}
## Note
The style generally employed here is to use plural variable names for a collection of values, such as the vector of $y$ values and singular names when a single value is being referred to, leading to expressions like "`x in xs`".
:::
## Other container types
We end this section with some general comments that are for those interested in a bit more, but in general aren't needed to understand most all of what follows later.
@@ -761,12 +433,16 @@ Tuples are fixed-length containers where there is no expectation or enforcement
While a vector is formed by placing comma-separated values within a `[]` pair (e.g., `[1,2,3]`), a tuple is formed by placing comma-separated values within a `()` pair. A tuple of length $1$ uses a convention of a trailing comma to distinguish it from a parenthesized expression (e.g. `(1,)` is a tuple, `(1)` is just the value `1`).
Vectors and tuples can appear at the same time: a vector of tuples---each of length $n$---can be used in plotting to specify points.
:::{.callout-note}
## Well, actually...
Technically, the tuple is formed just by the use of commas, which separate different expressions. The parentheses are typically used, as they clarify the intent and disambiguate some usage. In a notebook interface, it is useful to just use commas to separate values to output, as typically only the last command is displayed. This usage just forms a tuple of the values and displays that.
:::
#### Named tuples
There are *named tuples* where each component has an associated name. Like a tuple these can be indexed by number and unlike regular tuples also by name.
@@ -795,12 +471,14 @@ The values in a named tuple can be accessed using the "dot" notation:
nt.x1
```
Alternatively, the index notation -- using a *symbol* for the name -- can be used:
Alternatively, the index notation---using a *symbol* for the name---can be used:
```{julia}
nt[:x1]
```
(Indexing is described a bit later, but it is a way to pull elements out of a collection.)
Named tuples are employed to pass parameters to functions. To find the slope, we could do:
```{julia}
@@ -820,11 +498,15 @@ x1 - x0
:::
### Associative arrays
### Pairs, associative arrays
Named tuples associate a name (in this case a symbol) to a value. More generally an associative array associates to each key a value, where the keys and values may be of different types.
The `pair` notation, `key => value`, is used to make one association. A *dictionary* is used to have a container of associations. For example, this constructs a simple dictionary associating a spelled out name with a numeric value:
The `pair` notation, `key => value`, is used to make one association between the first and second value.
A *dictionary* is used to have a container of associations.
This example constructs a simple dictionary associating a spelled out name with a numeric value:
```{julia}
d = Dict("one" => 1, "two" => 2, "three" => 3)
@@ -842,7 +524,449 @@ d["two"]
Named tuples are associative arrays where the keys are restricted to symbols. There are other types of associative arrays, specialized cases of the `AbstractDict` type with performance benefits for specific use cases. In these notes, dictionaries appear as output in some function calls.
Unlike vectors and tuples, dictionaries are not currently supported by broadcasting. This causes no loss in usefulness, as the values can easily be iterated over, but the convenience of the dot notation is lost.
Unlike vectors and tuples, dictionaries are not currently supported by broadcasting (broadcasting is described in the next section). This causes no loss in usefulness, as the values can easily be iterated over, but the convenience of the dot notation is lost.
## The container interface in Julia
There are numerous generic functions for working across the many different types of containers. Some are specific to containers which can be modified, some to associative arrays. But it is expected for different container types to implement as many as possible. We list several here for completeness; only a few will be used in these notes.
### Indexing
Vectors have an implied order: first element, second, last, etc. Tuples do as well. Matrices have two orders: by a row-column pair or by linear order where the first column precedes the second etc. Arrays are similar in that they have a linear order and can be accessed by their individual dimensions.
To access an element in a vector, say the second, the underlying `getindex` function is used. This is rarely typed, as the `[]` notation is used instead: in a style similar to a function call, the indices go between a matching pair of square brackets.
For example, we create a vector, tuple, and matrix:
```{julia}
v = [1,2,3,4]
t = (1,2,3,4)
m = [1 2; 3 4]
```
The second element of each is accessed similarly:
```{julia}
v[2], t[2], m[2]
```
(All of `v`, `t`, and `m` have $1$-based indexing.)
There is special syntax to reference the last index when used within the square braces:
```{julia}
v[end], t[end], m[end]
```
The last element is also returned by `last`:
```{julia}
last(v), last(t), last(m)
```
These use `lastindex` behind the scenes. There is also a `firstindex` which is associated with the `first` method:
```{julia}
first(v), first(t), first(m)
```
For indexing by a numeric index, a container of numbers may be used. Containers can be generated in different ways; here we just use a vector to get the second and third elements:
```{julia}
I = [2,3]
v[I], t[I], m[I]
```
When indexing by a vector, the value will not be a scalar, even if there is only one element indicated.
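For example, indexing by the one-element vector `[2]` returns a one-element vector, whereas indexing by the scalar `2` returns the value itself:

```{julia}
v = [1, 2, 3, 4]
v[[2]], v[2]
```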
Indexing can also be done by a mask of Boolean values with a matching length. This following mask should do the same as indexing by `I` above:
```{julia}
J = [false, true, true, false]
v[J], t[J], m[J]
```
For the matrix, values can be referenced by row/column values. The following will extract the second row, first column:
```{julia}
m[2, 1]
```
*If* a container has *only* one entry, then the `only` method will return that element (not within the container). Here we use a tuple to emphasize the trailing comma in its construction:
```{julia}
s = ("one", )
```
```{julia}
only(s)
```
The `only` function will error should the container not hold exactly one element.
### Mutating values
Vectors and matrices can have their elements changed or mutated; tuples can not. The process is similar to assignment---using an equals sign---but the left hand side has indexing notation to reference which values within the container are to be updated.
To change the last element of `v` to `0` we have:
```{julia}
v[end] = 0
v
```
We might read this as assignment, but what happens is the underlying container has an element indicated by the index mutated. The `setindex!` function is called behind the scenes.
The `setindex!` function will try to promote the value (`0` above) to the element type of the container. This can throw an error if the promotion isn't possible. For example, to specify an element as `missing` with `v[end] = missing` will error, as missing can't be promoted to an integer.
If more than one value is referenced in the assignment, then more than one value can be specified on the right-hand side.
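For example, on a throwaway vector, a range of indices can be assigned a vector of values; the broadcasted form `.=` assigns a single value to each indicated position:

```{julia}
w = [1, 2, 3, 4]
w[1:2] = [10, 20]   # replace the first two values
w[3:4] .= 0         # broadcast a scalar into positions 3 and 4
w
```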
Mutation is different from reassignment. A command like `v=[1,2,3,0]` would have had the same effect as `v[end] = 0`, but would be quite different. The first *replaces* the binding to `v` with a new container, the latter reaches into the container and replaces just a value it holds.
### Size and type
The `length` of a container is the number of elements in linear order:
```{julia}
length(v), length(t), length(m)
```
The `isempty` method will indicate if the length is 0, perhaps in a performant way:
```{julia}
isempty(v), isempty([]), isempty(t), isempty(())
```
The `size` of a container, when defined, takes into account its shape or the dimensions:
```{julia}
size(v), size(m) # no size defined for tuples
```
Arrays, and hence vectors and matrices, have an element type given by `eltype` (the `typeof` method returns the container type):
```{julia}
eltype(v), eltype(t), eltype(m)
```
(The element type of the tuple is `Int64`, but this is only because of this particular tuple. Tuples are typically heterogeneous containers---not homogeneous like vectors---and do not expect to have a common type. The `NTuple` type is for tuples with elements of the same type.)
### Modifying the length of a container
Vectors and some other containers allow elements to be added on or elements to be taken off. In computer science a queue is a collection that is ordered and has addition at one or the other end. Vectors can be used as a queue, though for just that task, there are more performant structures available.
Two key methods for queues are `push!` and `pop!`. We `push!` elements onto the end of the queue:
```{julia}
push!(v, 5)
```
The output is expected---`5` was added to the end of `v`. What might not be expected is that the underlying `v` is changed without assignment. (More precisely, `v` is *mutated*: the container bound to the symbol `v` is extended, not replaced.)
:::{.callout-note}
## Trailing exclamation point convention
The function `push!` has a trailing exclamation point which is a `Julia` convention to indicate one of the underlying arguments (traditionally the first) will be *mutated* by the function call.
:::
The `pop!` function is somewhat of a reverse: it takes the last element and "pops" it off the queue, leaving the queue one element shorter and returning the last element.
```{julia}
pop!(v)
```
```{julia}
v
```
There are also `pushfirst!`, `popfirst!`, `insert!` and `deleteat!` methods.
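A quick sketch of these, using a throwaway vector `w`:

```{julia}
w = [2, 3]
pushfirst!(w, 1)    # add to the front: w is [1, 2, 3]
insert!(w, 2, 10)   # insert 10 at index 2: w is [1, 10, 2, 3]
deleteat!(w, 2)     # remove the element at index 2: w is [1, 2, 3]
popfirst!(w)        # remove and return the first element
w
```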
### Iteration
A very fundamental operation is to iterate over the elements of a collection one by one.
In computer science the `for` loop is the basic construct to iterate over values. This example will iterate over `v` and add each value to `tot` which is initialized to be `0`:
```{julia}
tot = 0
for e in v
tot = tot + e
end
tot
```
The `for` loop construct is central in many programming languages; in `Julia`, for loops are very performant and very flexible. However, they can be more verbose than needed. (In the above example we had to initialize an accumulator and then write three lines for the loop, whereas `sum(v)` would do the same---and in this case more flexibly---with just a single call.) Alternatives are usually leveraged---we mention a few.
Iterating over a vector can be done by *value*, as above, or by *index*. For the latter the `eachindex` method creates an iterable for the indices of the container. For rectangular objects, like matrices, there are also many uses for `eachrow` and `eachcol`, though not in these notes.
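For example, `eachindex` makes it easy to combine each value with its position; here we weight each value of an illustrative vector by its index:

```{julia}
w = [10, 20, 30]
sum(i * w[i] for i in eachindex(w))   # 1⋅10 + 2⋅20 + 3⋅30
```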
There are a few basic patterns where alternatives to a `for` loop exist. We discuss two:
* mapping a function or expression over each element in the collection
* a reduction where a larger dimensional object is summarized by a lower dimensional one. In the example above, the $1$-dimensional vector is reduced to a $0$-dimensional scalar by summing the elements.
#### Comprehensions
In mathematics, set notation is often used to describe elements in a set.
For example, the first $5$ cubed numbers can be described by:
$$
\{x^3: x \text{ in } 1, 2,\dots, 5\}
$$
Comprehension notation is similar. The above could be created in `Julia` with:
```{julia}
xs = [1, 2, 3, 4, 5]
[x^3 for x in xs]
```
Comprehensions are one way of iterating over a collection and evaluating an expression on each element.
In the above, the value `x` takes on each value in `xs`. The variables may be tuples, as well.
The `enumerate` method wraps a container (or iterable) and iterates both the index and the value. This is useful, say for polynomials:
```{julia}
x = 3
as = [1, 2, 3] # evaluate a₀⋅x⁰, a₁⋅x¹, a₂⋅x²
[a*x^(i-1) for (i, a) in enumerate(as)]
```
(These values can then be easily summed to evaluate the polynomial.)
When iterating over `enumerate` a tuple is returned. The use of `(i, a)` to iterate over these tuples destructures the tuple into parts to be used in the expression.
The `zip` function also is useful to *pair* off iterators. Redoing the above to have the powers iterated over:
```{julia}
as = [1, 2, 3]
inds = [0, 1, 2]
[a*x^i for (i, a) in zip(inds, as)]
```
Like `enumerate`, the `zip` iterator has elements which are tuples.
:::{.callout-note}
## Note
The style generally employed herein is to use plural variable names for a collection of values, such as the vector of $y$ values and singular names when a single value is being referred to, leading to expressions like "`x in xs`".
:::
#### Broadcasting a function call
If we have a vector, `xs`, and a function, `f`, to apply to each value, there is a simple means to achieve this task that is shorter than a `for` loop or the comprehension `[f(x) for x in xs]`. Adding a "dot" between the function name and the parentheses that enclose the arguments instructs `Julia` to "broadcast" the function call. The details allow for much more flexibility, but, for this purpose, broadcasting will take each value in `xs` and apply `f` to it, returning a vector of the same size as `xs`. When more than one argument is involved, broadcasting will try to pad out different sized objects to the same shape. Broadcasting can also *fuse* combined function calls.
For example, the following will find, using `sqrt`, the square root of each value in a vector:
```{julia}
xs = [1, 1, 3, 4, 7]
sqrt.(xs)
```
This call finds the sine of each number in `xs`:
```{julia}
sin.(xs)
```
For each function call, the `.(` (and not `(`) after the name is the surface syntax for broadcasting.
The `^` operator is an *infix* operator. Infix operators can be broadcast, as well, by using the form `.` prior to the operator, as in:
```{julia}
xs .^ 2
```
Here is an example involving the logarithm of a set of numbers. In astronomy, a logarithm with base $100^{1/5}$ is used for star [brightness](http://tinyurl.com/ycp7k8ay). We can use broadcasting to find this value for several values at once through:
```{julia}
ys = [1/5000, 1/500, 1/50, 1/5, 5, 50]
base = (100)^(1/5)
log.(base, ys)
```
Broadcasting with multiple arguments allows for mixing of vectors and scalar values, as above, making it convenient when parameters are used. In broadcasting, there are times when it is desirable to treat a container as a scalar-like argument; a common idiom is to wrap that container in a 1-element tuple.
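For example, to check several values for membership in one container (here an illustrative vector of small primes), the container is wrapped in a 1-element tuple so broadcasting treats it as a single value rather than iterating over it:

```{julia}
primes = [2, 3, 5, 7]
in.([4, 5, 6], (primes,))   # 4 and 6 are not in primes; 5 is
```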
As a final example, the task from statistics of centering and then squaring can be done with broadcasting. We go a bit further, showing how to compute the [sample variance](http://tinyurl.com/p6wa4r8) of a data set. This has the formula
$$
\frac{1}{n-1}\cdot ((x_1-\bar{x})^2 + \cdots + (x_n - \bar{x})^2).
$$
This can be computed, with broadcasting, through:
```{julia}
#| hold: true
import Statistics: mean
xs = [1, 1, 2, 3, 5, 8, 13]
n = length(xs)
(1/(n-1)) * sum(abs2.(xs .- mean(xs)))
```
This shows many of the manipulations that can be made with vectors. Rather than write `.^2`, we follow the definition of `var` and choose the possibly more performant `abs2` function which, in general, efficiently finds $|x|^2$ for various number types. The `.-` uses broadcasting to subtract a scalar (`mean(xs)`) from a vector (`xs`). Without the `.`, this would error.
Broadcasting is a widely used and powerful surface syntax which we will employ occasionally in the sequel.
#### Mapping a function over a collection
The `map` function is very much related to broadcasting. Similarly named functions are found in many different programming languages. (The "dot" broadcast is mostly limited to `Julia` and mirrors a similar usage of a dot in `MATLAB`.) For those familiar with other programming languages, using `map` may seem more natural. Its syntax is `map(f, xs)`. One or more iterables may be passed to `map`.
For example, this will map `sin` over each value in `xs`, computing the same things as `sin.(xs)`:
```{julia}
map(sin, xs)
```
The `map` function can be used with one or more iterators.
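With more than one iterator, `map` applies the function to values taken from each in parallel; for example, pairwise addition:

```{julia}
map(+, [1, 2, 3], [10, 20, 30])
```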
The `map` function can also be used in combination with `reduce`, a reduction. Reductions take a container with one or more dimensions and reduce the number of dimensions. An example might be:
```{julia}
sum(map(sin, xs))
```
This has a performance drawback---there are two passes through the container, one to apply `sin` another to add.
The `mapreduce` function combines the map and reduce operations in one pass. The reducing function, a *binary* operator, is passed as the second argument. So this combination will map `sin` over `xs` and then add the results up:
```{julia}
mapreduce(sin, +, xs)
```
There are specialized functions for common reductions. For example, we have `sum` (used above) and `prod` for adding and multiplying values in a collection:
```{julia}
sum(xs), prod(xs)
```
These are reductions which fall back to a `mapreduce` call. They use a starting value (`init`) of `0` and `1`, respectively (which in this case can be determined from `xs`). The `sum` and `prod` functions also allow as a first argument a function to map over the collection:
```{julia}
sum(sin, xs)
```
#### Other reductions
There are other reductions, which summarize a container. We mention those related to the maximum or minimum of a collection. For these examples, we have
```{julia}
v = [1, 2, 3, 4]
```
The largest value in a numeric collection is returned by `maximum`:
```{julia}
maximum(v)
```
Where this maximum occurred is returned by `argmax`:
```{julia}
argmax(v)
```
For `v` these are the same. But if we were to apply `sin` to `v` say, then the result may not be in order. This can be done with, say, a call to `map` and then `maximum`, but the functions allow an initial function to be specified:
```{julia}
maximum(sin, v), argmax(sin, v)
```
This combination is also the duty of `findmax`:
```{julia}
findmax(sin, v)
```
There are also `minimum`, `argmin`, and `findmin`.
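These mirror their maximum counterparts; for example:

```{julia}
v = [1, 2, 3, 4]
minimum(v), argmin(sin, v), findmin(v)
```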
The `extrema` function returns the maximum and minimum of the collection:
```{julia}
extrema(v)
```
:::{.callout-note}
## `maximum` and `max`
In `Julia` there are two related functions: `maximum` and `max`. The `maximum` function generically returns the largest element in a collection. The `max` function returns the maximum of its *arguments*.
That is, these return identical values:
```{julia}
xs = [1, 3, 2]
maximum(xs), max(1, 3, 2), max(xs...)
```
The latter uses *splatting* to iterate over each value in `xs` and pass it to `max` as an argument.
:::
### Predicate functions
A few reductions work with *predicate* functions---those that return `true` or `false`. Let's use `iseven` as an example, which tests if a number is even.
We can check if *all* the elements of a container are even or if *any* of the elements of a container are even with `all` and `any`:
```{julia}
xs = [1, 1, 2, 3, 5]
all(iseven, xs), any(iseven, xs)
```
Related, we can count the number of `true` responses of the predicate function:
```{julia}
count(iseven, xs)
```
#### Methods for associative arrays
For dictionaries, the collection is unordered (by default), but iteration can still be done over "key-value" pairs.
In `Julia` a `Pair` matches a key and a value into one entity. Pairs are made with the `=>` notation with the `key` on the left and the value on the right.
Dictionaries are a collection of pairs. The `Dict` constructor can be passed pairs directly:
```{julia}
ascii = Dict("a"=>97, "b"=>98, "c"=>99) # etc.
```
To iterate over these, the `pairs` iterator is useful:
```{julia}
collect(pairs(ascii))
```
(We used `collect` to iterate over values and return them as a vector.)
The keys are returned by `keys`, the values by `values`:
```{julia}
keys(ascii)
```
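Iterating over a dictionary yields its pairs, which can be destructured into a key and a value. For example, to collect the keys whose value is even (repeating the definition of `ascii` for completeness):

```{julia}
ascii = Dict("a" => 97, "b" => 98, "c" => 99)
[k for (k, v) in ascii if iseven(v)]
```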
@@ -993,6 +1117,7 @@ From [transum.org](http://www.transum.org/Maths/Exam/Online_Exercise.asp?Topic=V
#| hold: true
#| echo: false
let
gr()
p = plot(xlim=(0,10), ylim=(0,5), legend=false, framestyle=:none)
for j in (-3):10
plot!(p, [j, j + 5], [0, 5*sqrt(3)], color=:blue, alpha=0.5)
@@ -1058,6 +1183,12 @@ answ = 4
radioq(choices, answ)
```
```{julia}
#| echo: false
plotly()
nothing
```
###### Question

View File

@@ -1,4 +1,4 @@
# Curve Sketching
# Curve sketching
{{< include ../_common_code.qmd >}}

View File

@@ -230,13 +230,22 @@ function secant_line_tangent_line_graph(n)
xs = range(0, stop=pi, length=50)
fig_size=(800, 600)
plt = plot(f, 0, pi, legend=false, size=fig_size,
line=(2,),
axis=([],false),
plt = plot(;
xaxis=([], false),
yaxis=([], false),
framestyle=:origin,
legend=false,
ylims=(-.1,1.5)
)
plot!([0, 1.1* pi],[0,0], line=(3, :black))
plot!([0, 0], [0,2*1], line=(3, :black))
plot!(f, 0, pi/2; line=(:black, 2))
plot!(f, pi/2, pi/2 + pi/5; line=(:black, 2, 1/4))
plot!(f, pi/2 + pi/5, pi; line=(:black, 2))
plot!(0.1 .+ [0,0],[-.1, 1.5]; line=(:gray,1), arrow=true, side=:head)
plot!([-0.2, 3.4], [.1, .1]; line=(:gray, 1), arrow=true, side=:head)
plot!(plt, xs, f(c) .+ cos(c)*(xs .- c), color=:orange)
plot!(plt, xs, f(c) .+ m*(xs .- c), color=:black)
@@ -244,8 +253,10 @@ function secant_line_tangent_line_graph(n)
plot!(plt, [c, c+h, c+h], [f(c), f(c), f(c+h)], color=:gray30)
annotate!(plt, [(c+h/2, f(c), text("h", :top)),
(c + h + .05, (f(c) + f(c + h))/2, text("f(c+h) - f(c)", :left))
annotate!(plt, [(c+h/2, f(c), text(L"h", :top)),
(c + h + .05, (f(c) + f(c + h))/2, text(L"f(c+h) - f(c)", :left)),
])
plt
@@ -258,7 +269,7 @@ The slope of each secant line represents the *average* rate of change between $c
n = 5
n = 6
anim = @animate for i=0:n
secant_line_tangent_line_graph(i)
end
@@ -279,11 +290,59 @@ $$
We will define the tangent line at $(c, f(c))$ to be the line through the point with the slope from the limit above - provided that limit exists. Informally, the tangent line is the line through the point that best approximates the function.
::: {#fig-tangent_line_approx_graph}
```{julia}
#| echo: false
gr()
let
function make_plot(Δ)
f(x) = 1 + sin(x-c)
df(x) = cos(x-c)
plt = plot(;
#xaxis=([], false),
yaxis=([], false),
aspect_ratio=:equal,
legend=false,
)
c = 1
xticks!([c-Δ, c, c+Δ], [latexstring("c-$Δ"), L"c", latexstring("c+$Δ")])
y₀ = f(c) - 2/3 * Δ
tl(x) = f(c) + df(c) * (x-c)
plot!(f, c - Δ, c + Δ; line=(:black, 2))
plot!(tl, c - Δ, c + Δ; line=(:red, 2))
plot!([c,c], [tl(c-Δ), f(c)]; line=(:gray, :dash, 1))
#plot!([c-1.1*Δ, c+1.1*Δ], y₀ .+ [0,0]; line=(:gray, 1), arrow=true)
current()
end
ps = make_plot.((1.5, 1.0, 0.5, 0.1))
plot(ps...)
end
```
Illustration that the tangent line is the best linear approximation *near* $c$.
:::
```{julia}
#| echo: false
plotly()
nothing
```
```{julia}
#| hold: true
#| echo: false
#| cache: true
#| eval: false
gr()
function line_approx_fn_graph(n)
f(x) = sin(x)

Binary file not shown.

After

Width:  |  Height:  |  Size: 37 KiB

View File

@@ -50,7 +50,7 @@ A parallel definition with $a < b$ implying $f(a) > f(b)$ would be used for a *s
We can try to prove these properties for a function algebraically; we'll see both are related to the zeros of some function. However, before proceeding to that, it is usually helpful to get an idea of where the answer is using exploratory graphs.
We will use a helper function, `plotif(f, g, a, b)` that plots the function `f` over `[a,b]` highlighting the regions in the domain when `g` is non-negative. Such a function is defined for us in the accompanying `CalculusWithJulia` package, which has been previously been loaded.
We will use a helper function, `plotif(f, g, a, b)` that plots the function `f` over `[a,b]` highlighting the regions in the domain when `g` is non-negative. Such a function is defined for us in the accompanying `CalculusWithJulia` package, which has been previously loaded.
To see where a function is positive, we simply pass the function object in for *both* `f` and `g` above. For example, let's look at where $f(x) = \sin(x)$ is positive:
@@ -475,22 +475,22 @@ Let's look at the function $x^2 \cdot e^{-x}$ for positive $x$. A quick graph sh
```{julia}
h(x) = x^2 * exp(-x)
plotif(h, h'', 0, 8)
g(x) = x^2 * exp(-x)
plotif(g, g'', 0, 8)
```
From the graph, we would expect that the second derivative - which is continuous - would have two zeros on $[0,8]$:
```{julia}
ips = find_zeros(h'', 0, 8)
ips = find_zeros(g'', 0, 8)
```
As well, between the zeros we should have the sign pattern `+`, `-`, and `+`, as we verify:
```{julia}
sign_chart(h'', 0, 8)
sign_chart(g'', 0, 8)
```
### Second derivative test
@@ -744,6 +744,90 @@ choices=[
answ = 3
radioq(choices, answ)
```
###### Question
The function
$$
f(x) =
\begin{cases}
\frac{x}{2} + x^2 \sin(\frac{\pi}{x}) & x \neq 0\\
0 & x = 0
\end{cases}
$$
is graphed below over $[-1/3, 1/3]$.
```{julia}
#| echo: false
plt = let
gr()
empty_style = (xaxis=([], false),
yaxis=([], false),
framestyle=:origin,
legend=false)
axis_style = (arrow=true, side=:head, line=(:gray, 1))
## f'(0) > 0 but not increasing
f(x) = x/2 + x^2 * sinpi(1/x)
g(x) = x/2 - x^2
a, b = -1/3, 1/3
xs = range(a, b, 10_000)
ys = f.(xs)
y0,y1 = extrema(ys)
plot(; empty_style..., aspect_ratio=:equal)
plot!([a,b],[0,0]; axis_style...)
plot!([0,0], [y0,y1]; axis_style...)
plot!(xs, f.(xs); line=(:black, 1))
plot!(xs, x -> x/2 + x^2; line=(:gray, 1, :dot))
plot!(xs, x -> x/2 - x^2; line=(:gray, 1, :dot))
plot!(xs, x -> x/2; line=(:gray, 1))
a1 = (1/4 + 1/5)/2
a2 = -(1*1/3 + 4*1/4)/5
annotate!([
(a1, g(a1), text(L"\frac{x}{2} - x^2", 10, :top)),
(a1, f(a1), text(L"\frac{x}{2} + x^2", 10, :bottom)),
(-1/6, f(1/6), text(L"\frac{x}{2} + x^2\sin(\frac{\pi}{x})", 10, :bottom))
])
plot!([-1/6, -1/13.5], [f(1/6), f(-1/13.5)]; axis_style...)
end
plt
```
```{julia}
#| echo: false
plotly()
nothing
```
This function has a derivative at $0$ that is *positive*
```{julia}
f(x) = x == 0 ? 0 : x/2 + x^2 * sinpi(1/x)
@syms h
limit((f(0+h) - f(0))/h, h=>0; dir="+-")
```
Is the function increasing **around** $0$?
(The derivative away from $0$ is given by:
```{julia}
@syms x
diff(f(x), x)
```
```{julia}
#| echo: false
choices = ["Yes", "No"]
answer = 1
buttonq(choices, answer; explanation=raw"""
The slope of the tangent line away from $0$ oscillates from positive to negative at every rational number of the form $1/n$ due to the $\cos(\pi/x)$ term, so the function is not simply going up or down around $0$. (This example comes from @Angenent.)
""")
```
###### Question
@@ -779,21 +863,30 @@ Consider the following figure of a graph of $f$:
```{julia}
#| echo: false
ex(x) = x * tanh(exp(x))
a, b = -5, 1
plot(ex, a, b, legend=false,
axis=([], false),
color = :royalblue
)
plot!([a-.1, b+.1], [0,0], line=(3, :black))
let
gr()
ex(x) = x * tanh(exp(x))
a, b = -5, 1
plot(ex, a, b, legend=false,
axis=([], false),
line=(:black, 2)
)
plot!([a-.1, b+.1], [0,0], line=(:gray,1), arrow=true, side=:head)
zs = find_zeros(ex, (a, b))
cps = find_zeros(ex', (a, b))
ips = find_zeros(ex'', (a, b))
zs = find_zeros(ex, (a, b))
cps = find_zeros(ex', (a, b))
ips = find_zeros(ex'', (a, b))
scatter!(zs, ex.(zs), marker=(5, "black", :circle))
scatter!(cps, ex.(cps), marker=(5, "forestgreen", :diamond))
scatter!(ips, ex.(ips), marker=(5, :brown3, :star5))
scatter!(zs, ex.(zs), fill=(:black,), marker=(8, :circle))
scatter!(cps, ex.(cps), fill=(:green,), marker=(8, :diamond))
scatter!(ips, ex.(ips), fill=(:brown3,), marker=(8,:star5))
end
```
```{julia}
#| echo: false
plotly()
nothing
```
The black circle denotes what?

View File

@@ -1,4 +1,4 @@
# Implicit Differentiation
# Implicit differentiation
{{< include ../_common_code.qmd >}}
@@ -93,6 +93,42 @@ In general though, we may not be able to solve for $y$ in terms of $x$. What the
The idea is to *assume* that $y$ is representable by some function of $x$. This makes sense: moving on the curve from $(x,y)$ to some nearby point means changing $x$ will cause some change in $y$. This assumption is only made *locally* - basically meaning a complicated graph is reduced to just a small, well-behaved section of its graph.
::: {#fig-well-behaved-section}
```{julia}
#| echo: false
let
gr()
a = 1
k = 2
F(x,y) = a * (x + a)*(x^2 + y^2) - k*x^2
xs = range(-3/2, 3/2, 100)
ys = range(-2, 2, 100)
contour(xs, ys, F; levels=[0],
axis=([], nothing),
line=(:black, 1),
framestyle=:none, legend=false)
x₀, y₀ = 3/4, -0.2834733547569205
m = (-a^2*x₀ - 3*a*x₀^2/2 - a*y₀^2/2 + k*x₀)/(a*(a + x₀)*y₀)
plot!(x -> y₀ + m*(x - x₀), x₀-0.5, x₀ + 0.5; line=(:gray, 2))
plot!(x -> -x*sqrt(-(a^2 + a*x - k)/(a*(a + x))), -1/8,0.99;
line=(:black,4))
scatter!([x₀], [y₀]; marker=(:circle,5,:yellow))
end
```
```{julia}
#| echo: false
plotly()
nothing
```
Graph of an equation with a well-behaved section emphasized. The tangent line can be found by finding a formula for this well-behaved section and differentiating *or* by implicit differentiation, simply by assuming a form for the implicit function.
:::
With this assumption, asking what $dy/dx$ is has an obvious meaning - what is the slope of the tangent line to the graph at $(x,y)$. (The assumption eliminates the question of what a tangent line would mean when a graph self intersects.)
@@ -120,7 +156,7 @@ This says the slope of the tangent line depends on the point $(x,y)$ through the
As a check, we compare to what we would have found had we solved for $y= \sqrt{1 - x^2}$ (for $(x,y)$ with $y \geq 0$). We would have found: $dy/dx = 1/2 \cdot 1/\sqrt{1 - x^2} \cdot (-2x)$. Which can be simplified to $-x/y$. This should show that the method above - assuming $y$ is a function of $x$ and differentiating - is not only more general, but can even be easier.
The name - *implicit differentiation* - comes from the assumption that $y$ is implicitly defined in terms of $x$. According to the [Implicit Function Theorem](http://en.wikipedia.org/wiki/Implicit_function_theorem) the above method will work provided the curve has sufficient smoothness near the point $(x,y)$.
The name - *implicit differentiation* - comes from the assumption that $y$ is implicitly defined in terms of $x$. According to the [Implicit Function Theorem](http://en.wikipedia.org/wiki/Implicit_function_theorem) the above method will work provided the curve has sufficient smoothness near the point $(x,y)$. (Continuously differentiable with a non-vanishing partial derivative in $y$.)
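As a small sketch of what the theorem licenses - assuming the unit circle $F(x,y) = x^2 + y^2 - 1 = 0$, so $F_x = 2x$ and $F_y = 2y$ - the slope is $dy/dx = -F_x/F_y$ wherever $F_y \neq 0$:

```julia
# Implicit slope dy/dx = -Fx/Fy for F(x, y) = x^2 + y^2 - 1
Fx(x, y) = 2x                      # ∂F/∂x
Fy(x, y) = 2y                      # ∂F/∂y
dydx(x, y) = -Fx(x, y) / Fy(x, y)  # valid where Fy ≠ 0
dydx(0.6, 0.8)                     # ≈ -0.75, the -x/y found above
```

At the point $(0.6, 0.8)$ on the circle this reproduces the $-x/y$ found by differentiating the explicit solution.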
##### Examples
@@ -140,10 +176,16 @@ For $a = 2, b=1$ we have the graph:
#| hold: true
a, b = 2, 1
f(x,y) = x^2*y + a * b * y - a^2 * x
implicit_plot(f)
implicit_plot(f; legend=false)
x₀, y₀ = 0, 0
m = (a^2 - 2x₀*y₀) / (a*b + x₀^2)
plot!(x -> y₀ + m*(x - x₀), -1, 1)
```
We can see that at each point in the viewing window the tangent line exists due to the smoothness of the curve. Moreover, at a point $(x,y)$ the tangent will have slope $dy/dx$ satisfying:
To the plot we added a tangent line at $(0,0)$.
We can see that at each point in the viewing window the tangent line exists due to the smoothness of the curve. At a point $(x,y)$ the tangent line will have slope $dy/dx$ satisfying:
$$
@@ -177,7 +219,7 @@ A graph for $a=3$ shows why it has the name it does:
#| hold: true
a = 3
f(x,y) = x^4 - a^2*(x^2 - y^2)
implicit_plot(f)
implicit_plot(f; xticks=-5:5)
```
The tangent line at $(x,y)$ will have slope, $dy/dx$ satisfying:
@@ -341,7 +383,7 @@ The next step is solve for $dy/dx$ - the lone answer to the linear equation - wh
```{julia}
dydx = diff(u(x), x)
ex3 = solve(ex2, dydx)[1] # pull out lone answer with [1] indexing
ex3 = only(solve(ex2, dydx)) # pull out the only answer
```
As this represents an answer in terms of `u(x)`, we replace that term with the original variable:
@@ -369,9 +411,9 @@ Let $a = b = c = d = 1$, then $(1,4)$ is a point on the curve. We can draw a tan
```{julia}
H = ex(a=>1, b=>1, c=>1, d=>1)
x0, y0 = 1, 4
𝒎 = dydx₁(x=>1, y=>4, a=>1, b=>1, c=>1, d=>1)
m = dydx₁(x=>1, y=>4, a=>1, b=>1, c=>1, d=>1)
implicit_plot(lambdify(H); xlims=(-5,5), ylims=(-5,5), legend=false)
plot!(y0 + 𝒎 * (x-x0))
plot!(y0 + m * (x-x0))
```
Basically this includes all the same steps as if done "by hand." Some effort could have been saved in plotting, had values for the parameters been substituted initially, but not doing so shows their dependence in the derivative.
@@ -379,7 +421,7 @@ Basically this includes all the same steps as if done "by hand." Some effort cou
:::{.callout-warning}
## Warning
The use of `lambdify(H)` is needed to turn the symbolic expression, `H`, into a function.
The use of `lambdify(H)` is needed to turn the symbolic expression, `H`, into a function for plotting purposes.
:::
@@ -517,15 +559,9 @@ This could have been made easier, had we leveraged the result of the previous ex
#### Example: from physics
Many problems are best done with implicit derivatives. A video showing such a problem along with how to do it analytically is [here](http://ocw.mit.edu/courses/mathematics/18-01sc-single-variable-calculus-fall-2010/unit-2-applications-of-differentiation/part-b-optimization-related-rates-and-newtons-method/session-32-ring-on-a-string/).
This video starts with a simple question:
> If you have a rope and heavy ring, where will the ring position itself due to gravity?
This problem is best done with implicit derivatives. A video showing the problem along with how to do it analytically is [here](http://ocw.mit.edu/courses/mathematics/18-01sc-single-variable-calculus-fall-2010/unit-2-applications-of-differentiation/part-b-optimization-related-rates-and-newtons-method/session-32-ring-on-a-string/).
Well, suppose you hold the rope in two places, which we can take to be $(0,0)$ and $(a,b)$. Then let $(x,y)$ be all the possible positions of the ring that hold the rope taut. Then we have this picture:
@@ -534,19 +570,41 @@ Well, suppose you hold the rope in two places, which we can take to be $(0,0)$ a
```{julia}
#| hold: true
#| echo: false
let
gr()
P = (4,1)
Q = (1, -3)
scatter([0,4], [0,1], legend=false, xaxis=nothing, yaxis=nothing)
plot!([0,1,4],[0,-3,1])
𝑎, 𝑏= .05, .25
plot(;
axis=([],false),
legend=false)
scatter!([0,4], [0,1])
plot!([0,1,4],[0,-3,1]; line=(:black,2))
a, b = .05, .25
ts = range(0, 2pi, length=100)
plot!(1 .+ 𝑎*sin.(ts), -3 .+ 𝑏*cos.(ts), color=:gold)
annotate!((4-0.3,1,"(a,b)"))
plot!([0,1,1],[0,0,-3], color=:gray, alpha=0.25)
plot!([1,1,4],[0,1,1], color=:gray, alpha=0.25)
Δ = 0.15
annotate!([(1/2, 0-Δ, "x"), (5/2, 1 - Δ, "a-x"), (1-Δ, -1, "|y|"), (1+Δ, -1, "b-y")])
plot!(1 .+ a*sin.(ts), -3 .+ b*cos.(ts), line=(:gold,2))
plot!([0,1,1],[0,0,-3], color=:gray, alpha=0.75)
plot!([1,1,4],[0,1,1], color=:gray, alpha=0.75)
Δ = 0.05
annotate!([
(0,0, text(L"(0,0)",:bottom)),
(4,1, text(L"(a,b)",:bottom)),
(1/2, 0, text(L"x",:top)),
(5/2, 1, text(L"a-x", :top)),
(1, -1, text(L"|y|",:right)),
(1+Δ, -1, text(L"b-y",:left)),
(1+2a, -3, text(L"(x,y)",:left))
])
current()
end
```
```{julia}
#| echo: false
plotly()
nothing
```
Since the length of the rope does not change, we must have for any admissible $(x,y)$ that:


@@ -1,4 +1,4 @@
# L'Hospital's Rule
# L'Hospital's rule
{{< include ../_common_code.qmd >}}
@@ -28,17 +28,17 @@ We know this is $1$ using a bound from geometry, but might also guess this is on
$$
\sin(x) = x - \sin(\xi)x^2/2, \quad 0 < \xi < x.
\sin(x) = x - \sin(\xi)\frac{x^2}{2}, \quad 0 < \xi < x.
$$
This would yield:
$$
\lim_{x \rightarrow 0} \frac{\sin(x)}{x} = \lim_{x\rightarrow 0} \frac{x -\sin(\xi) x^2/2}{x} = \lim_{x\rightarrow 0} 1 - \sin(\xi) \cdot x/2 = 1.
\lim_{x \rightarrow 0} \frac{\sin(x)}{x} = \lim_{x\rightarrow 0} \frac{x -\sin(\xi) \frac{x^2}{2}}{x} = \lim_{x\rightarrow 0} 1 - \sin(\xi) \cdot \frac{x}{2} = 1.
$$
This is because we know $\sin(\xi) x/2$ has a limit of $0$, when $|\xi| \leq |x|$.
This is because we know $\sin(\xi) \frac{x}{2}$ has a limit of $0$, when $|\xi| \leq |x|$.
That doesn't look any easier, as we must worry about the error term, but if we just mentally replace $\sin(x)$ with $x$ - which it basically is near $0$ - then we can see that the limit should be the same as $x/x$, which we know is $1$ without thinking.
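A quick numeric check - a sanity check, not a proof - shows the ratio settling on $1$:

```julia
# sin(x)/x for progressively smaller x approaches 1
xs = [1e-1, 1e-3, 1e-6]
rs = sin.(xs) ./ xs   # each ratio is closer to 1 than the last
```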
@@ -384,10 +384,10 @@ the first equality by L'Hospital's rule, as the second limit exists.
Indeterminate forms of the type $0 \cdot \infty$, $0^0$, $\infty^0$, $\infty - \infty$ can be re-expressed to be in the form $0/0$ or $\infty/\infty$ and then L'Hospital's theorem can be applied.
###### Example: rewriting $0 \cdot \infty$
##### Example: rewriting $0 \cdot \infty$
What is the limit $x \log(x)$ as $x \rightarrow 0+$? The form is $0\cdot \infty$, rewriting, we see this is just:
What is the limit of $x \log(x)$ as $x \rightarrow 0+$? The form is $0\cdot \infty$, rewriting, we see this is just:
$$
@@ -401,10 +401,10 @@ $$
\lim_{x \rightarrow 0+}\frac{1/x}{-1/x^2} = \lim_{x \rightarrow 0+} -x = 0.
$$
###### Example: rewriting $0^0$
##### Example: rewriting $0^0$
What is the limit $x^x$ as $x \rightarrow 0+$? The expression is of the form $0^0$, which is indeterminate. (Even though floating point math defines the value as $1$.) We can rewrite this by taking a log:
What is the limit of $x^x$ as $x \rightarrow 0+$? The expression is of the form $0^0$, which is indeterminate. (Even though floating point math defines the value as $1$.) We can rewrite this by taking a log:
$$


@@ -186,25 +186,45 @@ In each of these cases, a more complicated non-linear function is well approxim
#| echo: false
#| label: fig-tangent-dy-dx
#| fig-cap: "Graph with tangent line layered on"
f(x) = sin(x)
a, b = -1/4, pi/2
let
gr()
f(x) = sin(x)
a, b = -1/4, pi/2
p = plot(f, a, b, legend=false,
line=(3, :royalblue),
axis=([], false)
);
p = plot(f, a, b, legend=false,
line=(3, :royalblue),
axis=([], false)
);
plot!(p, x->x, a, b);
plot!(p, [0,1,1], [0, 0, 1], color=:brown);
plot!(p, x->x, a, b);
plot!(p, [0,1,1], [0, 0, 1], color=:brown);
plot!(p, [1,1], [0, sin(1)], color=:green, linewidth=4);
plot!(p, [1,1], [0, sin(1)], color=:green, linewidth=4);
scatter!([0], [0], marker=(5, :mediumorchid3))
annotate!(p, [(0, f(0), text("(c,f(c))", :bottom,:right))])
annotate!(p, collect(zip([1/2, 1+.075, 1/2-1/8],
[.05, sin(1)/2, .75],
["Δx", "Δy", "m=dy/dx"])));
x₀ = 1.15
δ = 0.1
plot!(p, [x₀,x₀,1], [sin(1)/2-δ,0,0], line=(:black, 1, :dash), arrow=true)
plot!(p, [x₀,x₀,1], [sin(1)/2+δ,1, 1], line=(:black, 1, :dash), arrow=true)
plot!(p, [1/2 - 0.8δ, 0], [-δ, -δ]*3/4, line=(:black, 1, :dash), arrow=true)
plot!(p, [1/2 + 0.8δ, 1], [-δ, -δ]*3/4, line=(:black, 1, :dash), arrow=true)
scatter!([0], [0], marker=(5, :mediumorchid3))
annotate!(p, [
(0, f(0), text(L"(c, f(c))", :bottom, :right)),
(1/2, 0, text(L"\Delta x", :bottom)),
(1/2, 0, text(L"dx", :top)),
(1-0.02, sin(1)/2, text(L"Δ y", :right)),
(x₀, sin(1)/2, text(L"dy")),
(2/3, 2/3, text(L"m = \frac{dy}{dx} \approx \frac{\Delta y}{\Delta x}",
:bottom, rotation=33)) # why 33 and not 45?
])
p
end
```
```{julia}
#| echo: false
plotly()
nothing
```
The plot in @fig-tangent-dy-dx shows a tangent line with slope $dy/dx$ and the actual change in $y$, $\Delta y$, for some specified $\Delta x$ at a point $(c,f(c))$. The small gap above the sine curve is the error were the value of the sine approximated using the drawn tangent line. We can see that approximating the value of $\Delta y = \sin(c+\Delta x) - \sin(c)$ with the often easier to compute $(dy/dx) \cdot \Delta x = f'(c)\Delta x$ is not going to be too far off provided $\Delta x$ is not too large.
@@ -480,7 +500,8 @@ To see formally why the remainder is as it is, we recall the mean value theorem
$$
\text{error} = h(x) - h(0) = (g(x) - g(0)) \frac{h'(e)}{g'(e)} = x^2 \cdot \frac{1}{2} \cdot \frac{f'(e) - f'(0)}{e} =
\text{error} = h(x) - h(0) = (g(x) - g(0)) \frac{h'(e)}{g'(e)} =
(x^2 - 0) \cdot \frac{f'(e) - f'(0)}{2e} =
x^2 \cdot \frac{1}{2} \cdot f''(\xi).
$$
@@ -551,7 +572,8 @@ Is it a coincidence that a basic algebraic operation with tangent lines approxim
$$
\begin{align*}
f(x) \cdot g(x) &= [f(c) + f'(c)(x-c) + \mathcal{O}((x-c)^2)] \cdot [g(c) + g'(c)(x-c) + \mathcal{O}((x-c)^2)]\\
&=[f(c) + f'(c)(x-c)] \cdot [g(c) + g'(c)(x-c)] + (f(c) + f'(c)(x-c)) \cdot \mathcal{O}((x-c)^2) + (g(c) + g'(c)(x-c)) \cdot \mathcal{O}((x-c)^2) + [\mathcal{O}((x-c)^2)]^2\\
&=[f(c) + f'(c)(x-c)] \cdot [g(c) + g'(c)(x-c)] \\
&+ (f(c) + f'(c)(x-c)) \cdot \mathcal{O}((x-c)^2) + (g(c) + g'(c)(x-c)) \cdot \mathcal{O}((x-c)^2) + [\mathcal{O}((x-c)^2)]^2\\
&= [f(c) + f'(c)(x-c)] \cdot [g(c) + g'(c)(x-c)] + \mathcal{O}((x-c)^2)\\
&= f(c) \cdot g(c) + [f'(c)\cdot g(c) + f(c)\cdot g'(c)] \cdot (x-c) + [f'(c)\cdot g'(c) \cdot (x-c)^2 + \mathcal{O}((x-c)^2)] \\
&= f(c) \cdot g(c) + [f'(c)\cdot g(c) + f(c)\cdot g'(c)] \cdot (x-c) + \mathcal{O}((x-c)^2)
@@ -630,7 +652,7 @@ Automatic differentiation (forward mode) essentially uses this technique. A "dua
```{julia}
Dual(0, 1)
x = Dual(0, 1)
```
Then what is $\sin(x)$? It should reflect both $(\sin(0), \cos(0))$, the latter being the derivative of $\sin$ at $0$. We can see this is *almost* what is computed behind the scenes through:
@@ -638,11 +660,13 @@ Then what is $x$? It should reflect both $(\sin(0), \cos(0))$ the latter being t
```{julia}
#| hold: true
x = Dual(0, 1)
@code_lowered sin(x)
```
This output of `@code_lowered` can be confusing, but this simple case needn't be. Working from the end we see an assignment to a variable named `%3` of `Dual(%6, %12)`. The value of `%6` is `sin(x)` where `x` is the value `0` above. The value of `%12` is `cos(x)` *times* the value `1` above (the `xp`), which reflects the *chain* rule being used. (The derivative of `sin(u)` is `cos(u)*du`.) So this dual number encodes both the function value at `0` and the derivative of the function at `0`.
This output of `@code_lowered` can be confusing, but this simple case needn't be, as we know what to look for: we need to evaluate `sin` at `0` and carry along the derivative `cos(x)` **times** the derivative carried by `x`.
The `sin` is computed in `%6` and is passed to `Dual` in `%13` as the first argument. The `cos` is computed in `%11` and then *multiplied* in `%12` by `xp`, which holds the derivative information about `x`. This is passed as the second argument to `Dual` in `%13`.
Similarly, we can see what happens to `log(x)` at `1` (encoded by `Dual(1,1)`):
@@ -654,14 +678,15 @@ x = Dual(1, 1)
@code_lowered log(x)
```
We can see the derivative again reflects the chain rule, it being given by `1/x * xp` where `xp` acts like `dx` (from assignments `%9` and `%8`). Comparing the two outputs, we see only the assignment to `%9` differs, it reflecting the derivative of the function.
We again see `log(x)` being evaluated in line `%6`. The derivative evaluated at `x` is done in line `%11` and this is multiplied by `xp` in line `%12`.
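A minimal standalone sketch of such a dual-number type (the chapter's `Dual` is assumed to be similar; this toy version only overloads `sin` and `log`) makes the chain-rule bookkeeping explicit:

```julia
# A bare-bones forward-mode dual number: a value plus a derivative
struct D
    v::Float64   # function value
    d::Float64   # derivative value
end
Base.sin(x::D) = D(sin(x.v), cos(x.v) * x.d)  # (sin u)' = cos(u)⋅u'
Base.log(x::D) = D(log(x.v), x.d / x.v)       # (log u)' = u'/u
sin(D(0.0, 1.0)), log(D(1.0, 1.0))            # both carry derivative 1.0
```

Seeding the derivative slot with `1.0` plays the role of `xp` above; composing overloaded functions then propagates derivatives automatically.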
## Curvature
The curvature of a function will be a topic in a later section on differentiable vector calculus, but the concept of linearization can be used to give an earlier introduction.
The tangent line linearizes the function, it begin the best linear approximation to the graph of the function at the point. The slope of the tangent line is the limit of the slopes of different secant lines. Consider now, the orthogonal concept, the *normal line* at a point. This is a line perpendicular to the tangent line that goes through the point on the curve.
The tangent line linearizes the function, it being the best linear approximation to the graph of the function at the point. The slope of the tangent line is the limit of the slopes of different secant lines. Consider now, the orthogonal concept, the *normal line* at a point. This is a line perpendicular to the tangent line that goes through the point on the curve.
At a point $(c,f(c))$ the slope of the normal line is $-1/f'(c)$.
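For instance - a small sketch with $f(x)=x^2$ at $c=1$, values chosen only for illustration - the tangent and normal slopes multiply to $-1$:

```julia
# Tangent and normal lines to f(x) = x^2 at c = 1
f(x)  = x^2
fp(x) = 2x
c = 1.0
tangent(x) = f(c) + fp(c) * (x - c)
normal(x)  = f(c) - (x - c) / fp(c)   # slope is -1/f'(c)
fp(c) * (-1 / fp(c))                  # the slopes multiply to -1
```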
@@ -692,6 +717,7 @@ Call $R$ the intersection point of the two normal lines:
#| echo: false
using Roots
let
gr()
f(x) = x^4
fp(x) = 4x^3
c = 1/4
@@ -706,12 +732,17 @@ let
Rx = find_zero(x -> nlc(x) - nlch(x), (-10, 10))
scatter!([c,c+h], f.([c, c+h]))
scatter!([Rx], [nlc(Rx)])
annotate!([(c, f(c), "(c,f(c))",:top),
(c+h, f(c+h), "(c+h, f(c+h))",:bottom),
(Rx, nlc(Rx), "R",:left)])
annotate!([(c, f(c), L"(c,f(c))",:top),
(c+h, f(c+h), L"(c+h, f(c+h))",:bottom),
(Rx, nlc(Rx), L"R",:left)])
end
```
```{julia}
#| echo: false
plotly()
nothing
```
What happens to $R$ as $h \rightarrow 0$?
@@ -760,6 +791,7 @@ This formula for $r$ is known as the radius of curvature of $f$ -- the radius of
```{julia}
#| echo: false
let
gr()
f(x) = x^4
fp(x) = 4x^3
fpp(x) = 12x^2
@@ -779,8 +811,8 @@ let
scatter!([c], f.([c]))
scatter!([Rx], [nlc(Rx)])
annotate!([(c, f(c), "(c,f(c))",:top),
(Rx, nlc(Rx), "R",:left)])
annotate!([(c, f(c), L"(c,f(c))",:top),
(Rx, nlc(Rx), L"R",:left)])
Delta = pi/10
@@ -801,6 +833,12 @@ let
end
```
```{julia}
#| echo: false
plotly()
nothing
```
## Questions


@@ -1,4 +1,4 @@
# The mean value theorem for differentiable functions.
# The mean value theorem for differentiable functions
{{< include ../_common_code.qmd >}}
@@ -277,6 +277,13 @@ For $f$ differentiable on $(a,b)$ and continuous on $[a,b]$, if $f(a)=f(b)$, the
:::
::: {#fig-l-hospital-144}
![Figure from L'Hospital's calculus book](figures/lhopital-144.png)
Figure from L'Hospital's calculus book showing Rolle's theorem where $c=E$ in the labeling.
:::
This modest observation opens the door to many relationships between a function and its derivative, as it ties the two together in one statement.
@@ -323,46 +330,71 @@ The mean value theorem is a direct generalization of Rolle's theorem.
::: {.callout-note icon=false}
## Mean value theorem
Let $f(x)$ be differentiable on $(a,b)$ and continuous on $[a,b]$. Then there exists a value $c$ in $(a,b)$ where $f'(c) = (f(b) - f(a)) / (b - a)$.
Let $f(x)$ be differentiable on $(a,b)$ and continuous on $[a,b]$. Then there exists a value $c$ in $(a,b)$ where
$$
f'(c) = (f(b) - f(a)) / (b - a).
$$
:::
This says for any secant line between $a < b$ there will be a parallel tangent line at some $c$ with $a < c < b$ (all provided $f$ is differentiable on $(a,b)$ and continuous on $[a,b]$).
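The guarantee can be checked numerically. This sketch - using $f(x) = x^3 - x$ on $[-2, 1.75]$ and a hand-rolled bisection, so no packages are assumed - locates one such $c$:

```julia
# Find c in (a, b) where f'(c) equals the secant slope over [a, b]
f(x)  = x^3 - x
fp(x) = 3x^2 - 1
a, b = -2.0, 1.75
m = (f(b) - f(a)) / (b - a)          # slope of the secant line

function bisect(g, lo, hi; n = 60)   # simple bisection for a zero of g
    for _ in 1:n
        mid = (lo + hi) / 2
        g(lo) * g(mid) <= 0 ? (hi = mid) : (lo = mid)
    end
    (lo + hi) / 2
end

c = bisect(x -> fp(x) - m, 0.0, b)   # fp(x) - m changes sign on (0, b)
fp(c), m                             # the two slopes agree
```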
Figure @fig-mean-value-theorem illustrates the theorem. The blue line is the secant line. A parallel line tangent to the graph is guaranteed by the mean value theorem. In this figure, there are two such lines, rendered using brown.
@fig-mean-value-theorem illustrates the theorem. The secant line between $a$ and $b$ is dashed. For this function there are two values of $c$ where the slope of the tangent line is seen to be the same as the slope of this secant line. At least one is guaranteed by the theorem.
```{julia}
#| hold: true
#| echo: false
#| label: fig-mean-value-theorem
f(x) = x^3 - x
a, b = -2, 1.75
m = (f(b) - f(a)) / (b-a)
cps = find_zeros(x -> f'(x) - m, a, b)
let
# mean value theorem
gr()
f(x) = x^3 -4x^2 + 3x - 1
a, b = -3/4, 3+3/4
plot(; axis=([], nothing),
legend=false,
xlims=(-1.1,4),
framestyle=:none)
y₀ = 0.3 + f(-1)
p = plot(f, a-0.75, b+1,
color=:mediumorchid3,
linewidth=3, legend=false,
axis=([],false),
)
plot!(f, -1, 4; line=(:black, 2))
plot!([-1.1, 4], y₀*[1,1]; line=(:black, 1), arrow=true, head=:top)
p,q = (a,f(a)), (b, f(b))
scatter!([p,q]; marker=(:circle, 4, :red))
plot!([p,q]; line=(:gray, 2, :dash))
m = (f(b) - f(a))/(b-a)
c₁, c₂ = find_zeros(x -> f'(x) - m, (a,b))
Δ = 2/3
for c ∈ (c₁, c₂)
plot!(tangent(f,c), c-Δ, c+Δ; line=(:gray, 2))
plot!([(c, y₀), (c, f(c))]; line=(:gray, 1, :dash))
end
for c ∈ (a,b)
plot!([(c, y₀), (c, f(c))]; line=(:gray, 1))
end
annotate!([
(a, y₀, text(L"a", :top)),
(b, y₀, text(L"b", :top)),
(c₁, y₀, text(L"c_1", :top)),
(c₂, y₀, text(L"c_2", :top)),
])
current()
plot!(x -> f(a) + m*(x-a), a-1, b+1, linewidth=5, color=:royalblue)
scatter!([a,b], [f(a), f(b)])
annotate!([(a, f(a), text("a", :bottom)),
(b, f(b), text("b", :bottom))])
for cp in cps
plot!(x -> f(cp) + f'(cp)*(x-cp), a-1, b+1, color=:brown3)
end
```
scatter!(cps, f.(cps))
subscripts = collect("₀₁₂₃₄₅₆₇₈₉")
annotate!([(cp, f(cp), text("c"*subscripts[i], :bottom)) for (i,cp) ∈ enumerate(cps)])
p
```{julia}
#| echo: false
plotly()
nothing
```
Like Rolle's theorem this is a guarantee that something exists, not a recipe to find it. In fact, the mean value theorem is just Rolle's theorem applied to:
@@ -506,7 +538,7 @@ function parametric_fns_graph(n)
xlim=(-1.1,1.1), ylim=(-pi/2-.1, pi/2+.1))
scatter!(plt, [f(ts[end])], [g(ts[end])], color=:orange, markersize=5)
val = @sprintf("% 0.2f", ts[end])
annotate!(plt, [(0, 1, "t = $val")])
annotate!(plt, [(0, 1, L"t = %$val")])
end
caption = L"""


@@ -315,10 +315,10 @@ One way to think about this is the difference between `x` and the next largest f
For the specific example, `abs(b-a) <= 2eps(m)` means that the gap between `a` and `b` is essentially 2 floating point values from the $x$ value with the smallest $f(x)$ value.
For bracketing methods that is about as good as you can get. However, once floating values are understood, the absolute best you can get for a bracketing interval would be
For bracketing methods that is about as good as you can get. However, once floating point values are understood, the absolute best you can get for a bracketing interval would be
* along the way, a value `f(c)` is found which is *exactly* `0.0`
* along the way, a value `f(c)` is found which evaluates *exactly* to `0.0`
* the endpoints of the bracketing interval are *adjacent* floating point values, meaning the interval can not be bisected and `f` changes sign between the two values.
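That best case can be observed directly. In this sketch the zero $\sqrt{2}$ of $f(x) = x^2 - 2$ ends up bracketed by two adjacent floating point values:

```julia
# An unimprovable bracket: adjacent floats with a sign change between them
f(x) = x^2 - 2
b = sqrt(2)              # the closest float to the true zero
a = prevfloat(b)         # the adjacent float just below it
(b == nextfloat(a), f(a) < 0 < f(b))   # can't bisect; f still changes sign
```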
@@ -334,6 +334,8 @@ chandrapatla(fu, -9, 1, λ3)
Here the issue is `abs(b-a)` is tiny (of the order `1e-119`) but `eps(m)` is even smaller.
> For checking if $x_n \approx x_{n+1}$ both a relative and absolute error should be used unless something else is known.
For non-bracketing methods, like Newton's method or the secant method, different criteria are useful. There may not be a bracketing interval for `f` (for example `f(x) = (x-1)^2`) so the second criterion above might need to be restated in terms of the last two iterates, $x_n$ and $x_{n-1}$. Calling this difference $\Delta = |x_n - x_{n-1}|$, we might stop if $\Delta$ is small enough. As there are scenarios where this can happen, but the function is not at a zero, a check on the size of $f$ is needed.
@@ -347,7 +349,7 @@ First if `f(x_n)` is `0.0` then it makes sense to call `x_n` an *exact zero* of
However, there may never be a value with `f(x_n)` exactly `0.0`. (The value of `sin(1pi)` is not zero, for example, as `1pi` is an approximation to $\pi$, as well the `sin` of values adjacent to `float(pi)` do not produce `0.0` exactly.)
Suppose `x_n` is the closest floating number to $\alpha$, the zero. Then the relative rounding error, $($ `x_n` $- \alpha)/\alpha$, will be a value $\delta$ with $\delta$ less than `eps()`.
Suppose `x_n` is the closest floating point number to $\alpha$, the zero. Then the relative rounding error, $($ `x_n` $- \alpha)/\alpha$, will be a value $\delta$ with $\delta$ less than `eps()`.
How far then can `f(x_n)` be from $0 = f(\alpha)$?
@@ -364,10 +366,11 @@ $$
f(x_n) \approx f(\alpha) + f'(\alpha) \cdot (\alpha\delta) = f'(\alpha) \cdot \alpha \delta
$$
So we should consider `f(x_n)` an *approximate zero* when it is on the scale of $f'(\alpha) \cdot \alpha \delta$.
So we should consider `f(x_n)` an *approximate zero* when it is on the scale of $f'(\alpha) \cdot \alpha \delta$. That $\alpha$ factor means we consider a *relative* tolerance for `f`.
> For checking if $f(x_n) \approx 0$ both a relative and absolute error should be used---the relative error involving the size of $x_n$.
That $\alpha$ factor means we consider a *relative* tolerance for `f`. Also important when `x_n` is close to `0`, is the need for an *absolute* tolerance, one not dependent on the size of `x`. So a good condition to check if `f(x_n)` is small is
A good condition to check if `f(x_n)` is small is
`abs(f(x_n)) <= abs(x_n) * rtol + atol`, or `abs(f(x_n)) <= max(abs(x_n) * rtol, atol)`
@@ -396,7 +399,7 @@ It is not uncommon to assign `rtol` to have a value like `sqrt(eps())` to accoun
In Part III of @doi:10.1137/1.9781611977165 we find language of numerical analysis useful to formally describe the zero-finding problem. Key concepts are errors, conditioning, and stability. These give some theoretical justification for the tolerances above.
Abstractly a *problem* is a mapping, $F$, from a domain $X$ of data to a range $Y$ of solutions. Both $X$ and $Y$ have a sense of distance given by a *norm*. A norm is a generalization of the absolute value and gives quantitative meaning to terms like small and large.
Abstractly a *problem* is a mapping, $F$, from a domain $X$ of data to a range $Y$ of solutions. Both $X$ and $Y$ have a sense of distance given by a *norm*. A norm (denoted with $\lVert\cdot\rVert$) is a generalization of the absolute value and gives quantitative meaning to terms like small and large.
> A *well-conditioned* problem is one with the property that all small perturbations of $x$ lead to only small changes in $F(x)$.
@@ -435,7 +438,7 @@ $$
\tilde{F}(x) = F(\tilde{x})
$$
for some $\tilde{x}$ with $\lVert\tilde{x} - x\rVert/\lVert x\rVert$ is small.
for some $\tilde{x}$ where $\lVert\tilde{x} - x\rVert/\lVert x\rVert$ is small.
> "A backward stable algorithm gives exactly the right answer to nearly the right question."


@@ -413,7 +413,7 @@ To machine tolerance the answer is a zero, even though the exact answer is irrat
The first example by Newton of applying the method to a non-polynomial function was solving an equation from astronomy: $x - e \sin(x) = M$, where $e$ is the eccentricity, $x$ the eccentric anomaly, and $M$ the mean anomaly. Newton used polynomial approximations for the trigonometric functions, here we can solve directly.
Let $e = 1/2$ and $M = 3/4$. With $f(x) = x - e\sin(x) - M$ then $f'(x) = 1 - e cos(x)$. Starting at 1, Newton's method for 3 steps becomes:
Let $e = 1/2$ and $M = 3/4$. With $f(x) = x - e\sin(x) - M$ then $f'(x) = 1 - e \cos(x)$. Starting at 1, Newton's method for 3 steps becomes:
```{julia}
ec, M = 0.5, 0.75
@@ -490,7 +490,7 @@ $$
x_{i+1} = x_i - (1/x_i - q)/(-1/x_i^2) = -qx^2_i + 2x_i.
$$
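Since the update uses only multiplication and subtraction, the reciprocal can be computed without any division. A quick sketch, with $q = 0.8$ and the starting value analyzed next:

```julia
# Division-free reciprocal of q via Newton: xᵢ₊₁ = -q⋅xᵢ² + 2xᵢ
function recip(q; n = 4)
    x = 48/17 - 32/17 * q      # linear initial guess, good on [1/2, 1]
    for _ in 1:n
        x = -q * x^2 + 2x      # quadratically convergent update
    end
    x
end
recip(0.8)                     # ≈ 1/0.8 = 1.25
```

Four iterations suffice here because the error roughly squares each step from an initial error of at most $1/17$.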
Now for $q$ in the interval $[1/2, 1]$ we want to get a *good* initial guess. Here is a claim. We can use $x_0=48/17 - 32/17 \cdot q$. Let's check graphically that this is a reasonable initial approximation to $1/q$:
Now for $q$ in the interval $[1/2, 1]$ we want to get a *good* initial guess. Here is a claim: we can use $x_0=48/17 - 32/17 \cdot q$. Let's check graphically that this is a reasonable initial approximation to $1/q$:
```{julia}
@@ -865,7 +865,7 @@ The function $f(x) = x^{20} - 1$ has two bad behaviours for Newton's
method: for $x < 1$ the derivative is nearly $0$ and for $x>1$ the
second derivative is very big. In this illustration, we have an
initial guess of $x_0=8/9$. As the tangent line is fairly flat, the
next approximation is far away, $x_1 = 1.313\dots$. As this guess
next approximation is far away, $x_1 = 1.313\dots$. As this guess
is much bigger than $1$, the ratio $f(x)/f'(x) \approx
x^{20}/(20x^{19}) = x/20$, so $x_i - f(x_i)/f'(x_i) \approx (19/20)x_i$
yielding slow, linear convergence until $f''(x_i)$ is moderate. For


@@ -70,7 +70,7 @@ function perimeter_area_graphic_graph(n)
size=fig_size,
xlim=(0,10), ylim=(0,10))
scatter!(plt, [w], [h], color=:orange, markersize=5)
annotate!(plt, [(w/2, h/2, "Area=$(round(w*h,digits=1))")])
annotate!(plt, [(w/2, h/2, L"Area$=\; %$(round(w*h,digits=1))$")])
plt
end
@@ -79,7 +79,7 @@ caption = """
Some possible rectangles that satisfy the constraint on the perimeter and their area.
"""
n = 6
n = 5
anim = @animate for i=1:n
perimeter_area_graphic_graph(i-1)
end
@@ -187,8 +187,11 @@ ts = range(0, stop=pi, length=50)
x1,y1 = 4, 4.85840
x2,y2 = 3, 6.1438
delta = 4
p = plot(delta .+ x1*[0, 1,1,0], y1*[0,0,1,1], linetype=:polygon, fillcolor=:blue, legend=false)
plot!(p, x2*[0, 1,1,0], y2*[0,0,1,1], linetype=:polygon, fillcolor=:blue)
p = plot(delta .+ x1*[0, 1,1,0], y1*[0,0,1,1];
linetype=:polygon, fillcolor=:blue, legend=false,
aspect_ratio=:equal)
plot!(p, x2*[0, 1,1,0], y2*[0,0,1,1];
linetype=:polygon, fillcolor=:blue)
plot!(p, delta .+ x1/2 .+ x1/2*cos.(ts), y1.+x1/2*sin.(ts), linetype=:polygon, fillcolor=:red)
plot!(p, x2/2 .+ x2/2*cos.(ts), y2 .+ x2/2*sin.(ts), linetype=:polygon, fillcolor=:red)
@@ -308,7 +311,7 @@ A₀ = w₀ * h₀ + pi * (w₀/2)^2 / 2
Perim = 2*h₀ + w₀ + pi * w₀/2
h₁ = solve(Perim - 20, h₀)[1]
A₁ = A₀(h₀ => h₁)
w₁ = solve(diff(A₁,w₀), w₀)[1]
w₁ = solve(diff(A₁,w₀) ~ 0, w₀)[1]
```
We know that `w₀` is the maximum in this example from our previous work. We shall see soon, that just knowing that the second derivative is negative at `w₀` would suffice to know this. Here we check that condition:
@@ -392,14 +395,29 @@ The figure shows a ladder of length $l_1 + l_2$ that got stuck - it was too long
```{julia}
#| hold: true
#| echo: false
p = plot([0, 0, 15], [15, 0, 0], color=:blue, legend=false)
plot!(p, [5, 5, 15], [15, 8, 8], color=:blue)
plot!(p, [0,14.53402874075368], [12.1954981558864, 0], linewidth=3)
plot!(p, [0,5], [8,8], color=:orange)
plot!(p, [5,5], [0,8], color=:orange)
annotate!(p, [(13, 1/2, "θ"),
(2.5, 11, "l₂"), (10, 5, "l₁"), (2.5, 7.0, "l₂ ⋅ cos(θ)"),
(5.1, 4, "l₁ ⋅ sin(θ)")])
let
gr()
p = plot([0, 0, 15], [15, 0, 0],
xticks = [0,5, 15],
yticks = [0,8, 12],
line=(:blue, 2),
legend=false)
plot!(p, [5, 5, 15], [15, 8, 8]; line=(:blue,2))
plot!(p, [0,14.53402874075368], [12.1954981558864, 0], linewidth=3)
plot!(p, [0,5], [8,8], color=:orange)
plot!(p, [5,5], [0,8], color=:orange)
annotate!(p, [(13, 1/2, L"\theta"),
(2.5, 11, L"l_2"),
(10, 5, L"l_1"),
(2.5, 7.0, L"l_2 \cos(\theta)"),
(5.1, 4, text(L"l_1 \sin(\theta)", :top,rotation=90))])
end
```
```{julia}
#| echo: false
plotly()
nothing
```
We approach this problem in reverse. It is easy to see when a ladder is too long. It gets stuck at some angle $\theta$. So for each $\theta$ we find that ladder length that is just too long. Then we find the minimum length of all these ladders that are too long. If a ladder is this length or more it will get stuck for some angle. However, if it is less than this length it will not get stuck. So to maximize a ladder length, we minimize a different function. Neat.
@@ -834,10 +852,12 @@ A rancher with $10$ meters of fence wishes to make a pen adjacent to an existing
```{julia}
#| hold: true
#| echo: false
p = plot(; legend=false, aspect_ratio=:equal, axis=nothing, border=:none)
p = plot(; legend=false, aspect_ratio=:equal, axis=nothing, border=:none)
plot!([0,10, 10, 0, 0], [0,0,10,10,0]; linewidth=3)
plot!(p, [10,14,14,10], [2, 2, 8,8]; linewidth = 1)
annotate!(p, [(15, 5, "x"), (12,1, "y")])
annotate!(p, [(14-0.1, 5, text("x", :right)), (12,2, text("y",:bottom))])
p
```
@@ -1353,7 +1373,12 @@ p = 1/2
x = a/p
plot!(plt, [0, b*(1+p), 0, 0], [0, 0, a+x, 0])
plot!(plt, [b,b,0,0],[0,a,a,0])
annotate!(plt, [(b/2,0, "b"), (0,a/2,"a"), (0,a+x/2,"x"), (b+b*p/2,0,"bp")])
annotate!(plt, [
(b/2,0, text("b",:top)),
(0,a/2, text("a",:right)),
(0,a+x/2, text("x",:right)),
(b+b*p/2,0, text("bp",:top))
])
plt
```


@@ -18,7 +18,7 @@ using SymPy
---
Related rates problems involve two (or more) unknown quantities that are related through an equation. As the two variables depend on each other, also so do their rates - change with respect to some variable which is often time, though exactly how remains to be discovered. Hence the name "related rates."
Related rates problems involve two (or more) unknown quantities that are related through an equation. As the two variables depend on each other, so do their rates - change with respect to some variable, which is often time. Exactly how remains to be discovered. Hence the name "related rates."
#### Examples
@@ -27,7 +27,7 @@ Related rates problems involve two (or more) unknown quantities that are related
The following is a typical "book" problem:
> A screen saver displays the outline of a $3$ cm by $2$ cm rectangle and then expands the rectangle in such a way that the $2$ cm side is expanding at the rate of $4$ cm/sec and the proportions of the rectangle never change. How fast is the area of the rectangle increasing when its dimensions are $12$ cm by $8$ cm? [Source.](http://oregonstate.edu/instruct/mth251/cq/Stage9/Practice/ratesProblems.html)
> A *vintage* screen saver displays the outline of a $3$ cm by $2$ cm rectangle and then expands the rectangle in such a way that the $2$ cm side is expanding at the rate of $4$ cm/sec and the proportions of the rectangle never change. How fast is the area of the rectangle increasing when its dimensions are $12$ cm by $8$ cm? [Source.](http://oregonstate.edu/instruct/mth251/cq/Stage9/Practice/ratesProblems.html)
@@ -125,7 +125,7 @@ w(t) = 2 + 4*t
```{julia}
h(t) = 3/2 * w(t)
h(t) = 3 * w(t) / 2
```
This means again that area depends on $t$ through this formula:
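In code the composition and its rate can be sketched as follows (values are from the problem; a forward difference stands in for the derivative rather than any calculus package):

```julia
# Area as a function of t, and dA/dt when the dimensions are 12 × 8
w(t) = 2 + 4t              # the expanding 2 cm side, in cm
h(t) = 3 * w(t) / 2        # proportions never change
A(t) = w(t) * h(t)         # area in cm²
t₀ = (8 - 2) / 4           # time when w = 8 (so h = 12)
δ  = 1e-6
(A(t₀ + δ) - A(t₀)) / δ    # ≈ dA/dt = 3⋅w⋅w′ = 3⋅8⋅4 = 96 cm²/sec
```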
@@ -198,6 +198,50 @@ A ladder, with length $l$, is leaning against a wall. We parameterize this probl
If the ladder starts to slip away at the base, but remains in contact with the wall, express the rate of change of $h$ with respect to $t$ in terms of $db/dt$.
```{julia}
#| echo: false
let
gr()
l = 12
b = 6
h = sqrt(l^2 - b^2)
plot(;
axis=([],false),
legend=false,
aspect_ratio=:equal)
P,Q = (0,h),(b,0)
w = 0.2
S = Shape([-w,0,0,-w],[0,0,h+1,h+1])
plot!(S; fillstyle=:/, fillcolor=:gray80, fillalpha=0.5)
R = Shape([-w,b+2,b+2,-w],[-w,-w,0,0])
plot!(R, fill=(:gray, 0.25))
plot!([P,Q]; line=(:black, 2))
scatter!([P,Q])
b = b + 3/2
h = sqrt(l^2 - b^2)
plot!([b,b],[0,0]; arrow=true, side=:head, line=(:blue, 3))
plot!([0,0], [h,h]; arrow=true, side=:head, line=(:blue, 3))
annotate!([
(b,-w,text(L"(b(t),0)",:top)),
(-w, h, text(L"(0,h(t))", :bottom, rotation=90)),
(b/2, h/2, text(L"L", rotation = -atand(h,b), :bottom))
])
current()
end
```
```{julia}
#| echo: false
plotly()
nothing
```
We have from implicitly differentiating in $t$ the equation $l^2 = h^2 + b^2$, noting that $l$ is a constant, that:
@@ -236,7 +280,7 @@ As $b$ goes to $l$, $h$ goes to $0$, so $b/h$ blows up. Unless $db/dt$ goes to $
:::{.callout-note}
## Note
Often, this problem is presented with $db/dt$ having a constant rate. In this case, the ladder problem defies physics, as $dh/dt$ eventually is faster than the speed of light as $h \rightarrow 0+$. In practice, were $db/dt$ kept at a constant, the ladder would necessarily come away from the wall. The trajectory would follow that of a tractrix were there no gravity to account for.
Often, this problem is presented with $db/dt$ having a constant rate. In this case, the ladder problem defies physics, as $dh/dt$ eventually is faster than the speed of light as $h \rightarrow 0+$. In practice, were $db/dt$ kept at a constant, the ladder would necessarily come away from the wall.
:::
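The implicit differentiation of $l^2 = h^2 + b^2$ can also be carried out symbolically; a minimal sketch with `SymPy`, using symbolic functions for $b$ and $h$ (the names are our choice):

```julia
using SymPy

@syms t l
b = SymFunction("b")    # distance of the base from the wall
h = SymFunction("h")    # height of the top of the ladder

# l is constant, so differentiate h(t)^2 + b(t)^2 - l^2 = 0 in t
eqn = diff(h(t)^2 + b(t)^2 - l^2, t)
dhdt = solve(eqn, diff(h(t), t))[1]    # -b(t)·(db/dt)/h(t)
```

The answer shows $dh/dt = -(b/h)\, db/dt$, matching the computation above.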
@@ -247,12 +291,15 @@ Often, this problem is presented with $db/dt$ having a constant rate. In this ca
```{julia}
#| hold: true
#| echo: false
#| eval: false
caption = "A man and woman walk towards the light."
imgfile = "figures/long-shadow-noir.png"
ImageFile(:derivatives, imgfile, caption)
```
![A man and woman walk towards the light](./figures/long-shadow-noir.png)
Shadows are a staple of film noir. In the photo, suppose a man and a woman walk towards a street light. As they approach the light the length of their shadow changes.
@@ -340,7 +387,7 @@ This can be solved for the unknown: $dx/dt = 50/20$.
A batter hits a ball toward third base at $75$ ft/sec and runs toward first base at a rate of $24$ ft/sec. At what rate does the distance between the ball and the batter change when $2$ seconds have passed?
We will answer this with `SymPy`. First we create some symbols for the movement of the ball towards third base, `b(t)`, the runner toward first base, `r(t)`, and the two velocities. We use symbolic functions for the movements, as we will be differentiating them in time:
We will answer this symbolically. First we create some symbols for the movement of the ball towards third base, `b(t)`, the runner toward first base, `r(t)`, and the two velocities. We use symbolic functions for the movements, as we will be differentiating them in time:
```{julia}

View File

@@ -10,15 +10,6 @@ This section uses the `TermInterface` add-on package.
using TermInterface
```
```{julia}
#| echo: false
const frontmatter = (
title = "Symbolic derivatives",
description = "Calculus with Julia: Symbolic derivatives",
tags = ["CalculusWithJulia", "derivatives", "symbolic derivatives"],
);
```
---

View File

@@ -1,4 +1,4 @@
# Taylor Polynomials and other Approximating Polynomials
# Taylor polynomials and other approximating polynomials
{{< include ../_common_code.qmd >}}
@@ -42,12 +42,14 @@ gr()
taylor(f, x, c, n) = series(f, x, c, n+1).removeO()
function make_taylor_plot(u, a, b, k)
k = 2k
plot(u, a, b, title="plot of T_$k", linewidth=5, legend=false, size=fig_size, ylim=(-2,2.5))
if k == 1
plot!(zero, range(a, stop=b, length=100))
else
plot!(taylor(u, x, 0, k), range(a, stop=b, length=100))
end
plot(u, a, b;
title = L"plot of $T_{%$k}$",
line = (:black, 3),
legend = false,
size = fig_size,
ylim = (-2,2.5))
fn = k == 1 ? zero : taylor(u, x, 0, k)
plot!(fn, range(a, stop=b, length=100); line=(:red,2))
end
@@ -76,7 +78,7 @@ ImageFile(imgfile, caption)
## The secant line and the tangent line
We approach this general problem **much** more indirectly than is needed. We introduce notations that are attributed to Newton and proceed from there. By leveraging `SymPy` we avoid tedious computations and *hopefully* gain some insight.
Heads up: we approach this general problem **much** more indirectly than is needed by introducing notations that are attributed to Newton and proceed from there. By leveraging `SymPy` we avoid tedious computations and *hopefully* gain some insight.
Suppose $f(x)$ is a function which is defined in a neighborhood of $c$ and has as many continuous derivatives as we care to take at $c$.
@@ -102,7 +104,10 @@ $$
tl(x) = f(c) + f'(c) \cdot(x - c).
$$
The key is the term multiplying $(x-c)$ for the secant line this is an approximation to the related term for the tangent line. That is, the secant line approximates the tangent line, which is the linear function that best approximates the function at the point $(c, f(c))$. This is quantified by the *mean value theorem* which states under our assumptions on $f(x)$ that there exists some $\xi$ between $x$ and $c$ for which:
The key is the term multiplying $(x-c)$---for the secant line this is an approximation to the related term for the tangent line. That is, the secant line approximates the tangent line, which is the linear function that best approximates the function at the point $(c, f(c))$.
This is quantified by the *mean value theorem* which states under our assumptions on $f(x)$ that there exists some $\xi$ between $x$ and $c$ for which:
$$
@@ -189,7 +194,7 @@ function divided_differences(f, x, xs...)
end
```
In the following, by adding a `getindex` method, we enable the `[]` notation of Newton to work with symbolic functions, like `u()` defined below, which is used in place of $f$:
In the following---even though it is *type piracy*---by adding a `getindex` method, we enable the `[]` notation of Newton to work with symbolic functions, like `u()` defined below, which is used in place of $f$:
```{julia}
@@ -199,48 +204,38 @@ Base.getindex(u::SymFunction, xs...) = divided_differences(u, xs...)
ex = u[c, c+h]
```
We can take a limit and see the familiar (yet differently represented) value of $u'(c)$:
A limit as $h\rightarrow 0$ shows the familiar value $u'(c)$:
```{julia}
limit(ex, h => 0)
```
The choice of points is flexible. Here we use $c-h$ and $c+h$:
```{julia}
limit(u[c-h, c+h], h=>0)
```
Now, let's look at:
```{julia}
ex₂ = u[c, c+h, c+2h]
simplify(ex₂)
```
Not so bad after simplification. The limit shows this to be an approximation to the second derivative divided by $2$:
If we multiply by $2$ and simplify, a discrete approximation for the second derivative---the second order forward [difference equation](http://tinyurl.com/n4235xy)---is seen:
```{julia}
limit(ex₂, h => 0)
simplify(2ex₂)
```
(The expression is, up to a divisor of $2$, the second order forward [difference equation](http://tinyurl.com/n4235xy), a well-known approximation to $f''$.)
This relationship between higher-order divided differences and higher-order derivatives generalizes. This is expressed in this [theorem](http://tinyurl.com/zjogv83):
> Suppose $m=x_0 < x_1 < x_2 < \dots < x_n=M$ are distinct points. If $f$ has $n$ continuous derivatives then there exists a value $\xi$, where $m < \xi < M$, satisfying:
:::{.callout-note}
## Mean value theorem for Divided differences
Suppose $m=x_0 < x_1 < x_2 < \dots < x_n=M$ are distinct points. If $f$ has $n$ continuous derivatives then there exists a value $\xi$, where $m < \xi < M$, satisfying:
$$
f[x_0, x_1, \dots, x_n] = \frac{1}{n!} \cdot f^{(n)}(\xi).
$$
:::
This immediately applies to the above, where we parameterized by $h$: $x_0=c, x_1=c+h, x_2 = c+2h$. For then, as $h$ goes to $0$, it must be that $m, M \rightarrow c$, and so the limit of the divided differences must converge to $(1/2!) \cdot f^{(2)}(c)$, as $f^{(2)}(\xi)$ converges to $f^{(2)}(c)$.
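This theorem can be checked numerically; a quick sketch with a small divided-difference helper mirroring the recursive definition used earlier (the test function $f(x)=e^x$ is our choice):

```julia
# recursive divided differences: f[x0], f[x0,x1], ...
divided_differences(f, x) = f(x)
function divided_differences(f, x, xs...)
    ys = sort([x, xs...])
    (divided_differences(f, ys[2:end]...) -
     divided_differences(f, ys[1:end-1]...)) / (ys[end] - ys[1])
end

f(x) = exp(x)
c, h = 1.0, 1e-4
d2 = divided_differences(f, c, c + h, c + 2h)
d2 - exp(c)/2   # f[c, c+h, c+2h] ≈ f''(ξ)/2! for some ξ in (c, c+2h)
```

The difference is on the order of $h$, as the theorem suggests, since $\xi$ lies within $2h$ of $c$.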
@@ -496,16 +491,20 @@ f[x_0] &+ f[x_0,x_1] \cdot (x - x_0) + f[x_0, x_1, x_2] \cdot (x - x_0)\cdot(x-x
$$
and taking $x_i = c + i\cdot h$, for a given $n$, we have in the limit as $h > 0$ goes to zero that coefficients of this polynomial converge to the coefficients of the *Taylor Polynomial of degree n*:
and taking $x_i = c + i\cdot h$, for a given $n$, we have in the limit as $h$ goes to zero that the coefficients of this polynomial converge:
:::{.callout-note}
## Taylor polynomial of degree $n$
Suppose $f(x)$ has $n+1$ derivatives (continuous between $c$ and $x$), then
$$
f(c) + f'(c)\cdot(x-c) + \frac{f''(c)}{2!}(x-c)^2 + \cdots + \frac{f^{(n)}(c)}{n!} (x-c)^n.
T_n(x) = f(c) + f'(c)\cdot(x-c) + \frac{f''(c)}{2!}(x-c)^2 + \cdots + \frac{f^{(n)}(c)}{n!} (x-c)^n,
$$
This polynomial will be the best approximation of degree $n$ or less to the function $f$, near $c$. The error will be given - again by an application of the Cauchy mean value theorem:
will be the best approximation of degree $n$ or less to $f$, near $c$.
The error is given, again by an application of the Cauchy mean value theorem, as:
$$
@@ -513,9 +512,10 @@ $$
$$
for some $\xi$ between $c$ and $x$.
:::
The Taylor polynomial for $f$ about $c$ of degree $n$ can be computed by taking $n$ derivatives. For such a task, the computer is very helpful. In `SymPy` the `series` function will compute the Taylor polynomial for a given $n$. For example, here is the series expansion to 10 terms of the function $\log(1+x)$ about $c=0$:
The Taylor polynomial for $f$ about $c$ of degree $n$ can be computed by taking $n$ derivatives. For such a task, the computer is very helpful. In `SymPy` the `series` function will compute the Taylor polynomial for a given $n$. For example, here is the series expansion to $10$ terms of the function $\log(1+x)$ about $c=0$:
```{julia}
@@ -794,7 +794,7 @@ This is re-expressed as $2s + s \cdot p$ with $p$ given by:
```{julia}
cancel((a_b - 2s)/s)
p = cancel((a_b - 2s)/s)
```
Now, $2s = m - s\cdot m$, so the above can be reworked to be $\log(1+m) = m - s\cdot(m-p)$.
@@ -803,36 +803,28 @@ Now, $2s = m - s\cdot m$, so the above can be reworked to be $\log(1+m) = m - s\
(For larger values of $m$, a similar, but different approximation, can be used to minimize floating point errors.)
How big can the error be between this *approximations* and $\log(1+m)$? We plot to see how big $s$ can be:
How big can the error be between this *approximation* and $\log(1+m)$? The expression $m/(2+m)$ increases for $m > 0$, so on this interval $s$ is as big as
```{julia}
@syms v
plot(v/(2+v), sqrt(2)/2 - 1, sqrt(2)-1)
```
This shows, $s$ is as big as
```{julia}
Max = (v/(2+v))(v => sqrt(2) - 1)
Max = (x/(2+x))(x => sqrt(2) - 1)
```
The error term is like $2/19 \cdot \xi^{19}$ which is largest at this value of $M$. Large is relative - it is really small:
```{julia}
(2/19)*Max^19
(2/19) * Max^19
```
Basically that is machine precision, which means that, as far as can be told on the computer, the value produced by $2s + s \cdot p$ is about as accurate as can be done.
To try this out to compute $\log(5)$. We have $5 = 2^2(1+0.25)$, so $k=2$ and $m=0.25$.
We try this out to compute $\log(5)$. We have $5 = 2^2(1+ 1/4)$, so $k=2$ and $m=1/4$.
```{julia}
k, m = 2, 0.25
k, m = 2, 1/4
s = m / (2+m)
pₗ = 2 * sum(s^(2i)/(2i+1) for i in 1:8) # where the polynomial approximates the logarithm...
@@ -1209,7 +1201,10 @@ $$
$$
h(x)=b_0 + b_1 (x-x_n) + b_2(x-x_n)(x-x_{n-1}) + \cdots + b_n (x-x_n)(x-x_{n-1})\cdot\cdots\cdot(x-x_1).
\begin{align*}
h(x)&=b_0 + b_1 (x-x_n) + b_2(x-x_n)(x-x_{n-1}) + \cdots \\
&+ b_n (x-x_n)(x-x_{n-1})\cdot\cdots\cdot(x-x_1).
\end{align*}
$$
These two polynomials are of degree $n$ or less and have $u(x) = h(x)-g(x)=0$, by uniqueness. So the coefficients of $u(x)$ are $0$. We have that the coefficient of $x^n$ must be $a_n-b_n$ so $a_n=b_n$. Our goal is to express $a_n$ in terms of $a_{n-1}$ and $b_{n-1}$. Focusing on the $x^{n-1}$ term, we have:

View File

@@ -1,3 +1,7 @@
---
engine: julia
---
# Differential vector calculus
This section discusses generalizations of the derivative to functions which have more than one input and/or one output.

View File

@@ -1,4 +1,5 @@
[deps]
BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
CalculusWithJulia = "a2e0e22d-7d4c-5312-9169-8b992201a882"
Contour = "d38c429a-6771-53c6-b99e-75d170b6e991"
@@ -17,6 +18,7 @@ QuadGK = "1fd47b50-473d-5c70-9696-f719f8f3bcdc"
QuizQuestions = "612c44de-1021-4a21-84fb-7261cf5eb2d4"
Roots = "f2b01f46-fcfa-551c-844a-d8ac1e96c665"
ScatteredInterpolation = "3f865c0f-6dca-5f4d-999b-29fe1e7e3c92"
SplitApplyCombine = "03a91e81-4c3e-53e1-a0a4-9c0c8f19dd66"
SymPy = "24249f21-da20-56a4-8eb1-6a02cf4ae2e6"
Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
TextWrap = "b718987f-49a8-5099-9789-dcd902bef87d"

View File

@@ -9,6 +9,7 @@ files = (
"scalar_functions",
"scalar_functions_applications",
"vector_fields",
"matrix_calculus_notes.qmd",
"plots_plotting",
)

View File

@@ -0,0 +1,931 @@
# Matrix calculus
This section illustrates a more general setting for taking derivatives, one that unifies the different expositions given earlier.
::: {.callout-note appearance="minimal"}
## Based on Bright, Edelman, and Johnson's notes
This section has essentially no original contribution, as it basically samples material from the notes [Matrix Calculus (for Machine Learning and Beyond)](https://arxiv.org/abs/2501.14787) by Paige Bright, Alan Edelman, and Steven G. Johnson. Their notes cover material taught in a course at MIT. Support materials for their course in `Julia` are available at [https://github.com/mitmath/matrixcalc/tree/main](https://github.com/mitmath/matrixcalc/tree/main). For more details and examples, please refer to the source.
:::
## Review
We have seen several "derivatives" of a function, based on the number of inputs and outputs. The first one was for functions $f: R \rightarrow R$.
In this case, we saw that $f$ has a derivative at $c$ if this limit exists:
$$
\lim_{h \rightarrow 0}\frac{f(c + h) - f(c)}{h}.
$$
The derivative as a function of $x$ uses this rule for any $x$ in the domain.
Common notation is:
$$
f'(x) = \frac{dy}{dx} = \lim_{h \rightarrow 0}\frac{f(x + h) - f(x)}{h}
$$
(when the limit exists).
This limit gets re-expressed in different ways:
* linearization writes $f(x+\Delta x) - f(x) \approx f'(x)\Delta x$, where $\Delta x$ is a small displacement from $x$. The reason there isn't equality is the unwritten higher-order terms that vanish in the limit.
* Alternate limits. Another way of writing this is in terms of explicit smaller order terms:
$$
(f(x+h) - f(x)) - f'(x)h = \mathscr{o}(h),
$$
which means that if we divide both sides by $h$ and take the limit, we get $0$ on the right and recover the limit definition of the derivative on the left.
* Differential notation simply writes this as $dy = f'(x)dx$. Focusing on $f$ and not $y=f(x)$, we might write
$$
df = f(x+dx) - f(x) = f'(x) dx.
$$
In the above, $df$ and $dx$ are differentials, made rigorous by a limit, which hides the higher order terms.
We will see all the derivatives encountered so far can be similarly expressed as this last characterization.
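The little-$\mathscr{o}$ characterization can be illustrated numerically: the quotient $((f(x+h)-f(x)) - f'(x)h)/h$ should shrink with $h$. A quick sketch (the function and point are our choice):

```julia
f(x) = sin(x)
fp(x) = cos(x)        # the known derivative
x = 0.5

# quotients should go to 0 as h does
qs = [(f(x + h) - f(x) - fp(x)*h) / h for h in (0.1, 0.01, 0.001)]
```

Each quotient is roughly $f''(x)h/2$, shrinking by a factor of $10$ along with $h$.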
### Univariate, vector-valued
For example, when $f: R \rightarrow R^m$ was a vector-valued function the derivative was defined similarly through a limit of $(f(t + \Delta t) - f(t))/{\Delta t}$, where each component needed to have a limit. This can be rewritten through $f(t + dt) - f(t) = f'(t) dt$, again using differentials to avoid the higher order terms.
### Multivariate, scalar-valued
When $f: R^n \rightarrow R$ is a scalar-valued function with vector inputs, differentiability was defined by a gradient existing with $f(c+h) - f(c) - \nabla{f}(c) \cdot h$ being $\mathscr{o}(\|h\|)$. In other words $df = f(c + dh) - f(c) = \nabla{f}(c) \cdot dh$. The gradient has the same shape as $c$, a column vector. If we take the row vector (e.g. $f'(c) = \nabla{f}(c)^T$) then again we see $df = f(c+dh) - f(c) = f'(c) dh$, where the last term uses matrix multiplication of a row vector times a column vector.
### Multivariate, vector-valued
Finally, when $f:R^n \rightarrow R^m$, the Jacobian was defined and characterized by
$\| f(x + dx) - f(x) - J_f(x)dx \|$ being $\mathscr{o}(\|dx\|)$. Again, we can express this through $df = f(x + dx) - f(x) = f'(x)dx$ where $f'(x) = J_f(x)$.
### Vector spaces
The generalization of the derivative involves linear operators which are defined for vector spaces.
A [vector space](https://en.wikipedia.org/wiki/Vector_space) is a set of mathematical objects which can be added together and also multiplied by a scalar. Vectors of similar size, as previously discussed, are the typical example, with vector addition and scalar multiplication already defined. Matrices of similar size (and some subclasses) also form a vector space.
Additionally, many other sets of objects form vector spaces. Certain families of functions are examples: polynomial functions of degree $n$ or less; continuous functions; functions with a certain number of derivatives. The last two are infinite dimensional; our focus here is on finite-dimensional vector spaces.
Let's take differentiable functions as an example. These form a vector space as the derivative of a linear combination of differentiable functions is defined through the simplest derivative rule: $[af(x) + bg(x)]' = a[f(x)]' + b[g(x)]'$. If $f$ and $g$ are differentiable, then so is $af(x)+bg(x)$.
A finite-dimensional vector space is described by a *basis*---a minimal set of vectors needed to describe the space, after consideration of linear combinations. For some typical vector spaces, this is the set of special vectors with $1$ as one of the entries, and $0$ otherwise.
A key fact about a basis for a finite-dimensional vector space is that every vector in the space can be expressed *uniquely* as a linear combination of the basis vectors. The set of numbers used in the linear combination, along with an order on the basis, means an element of a finite-dimensional vector space can be associated with a unique coordinate vector.
Vectors and matrices have properties that are generalizations of the real numbers. As vectors and matrices form vector spaces, the concept of addition of vectors and matrices is defined, as is scalar multiplication. Additionally, we have seen:
* The dot product between two vectors of the same length is defined easily ($v\cdot w = \Sigma_i v_i w_i$). It is coupled with the length as $\|v\|^2 = v\cdot v$.
* Matrix multiplication is defined for two properly sized matrices. If $A$ is $m \times k$ and $B$ is $k \times n$ then $AB$ is a $m\times n$ matrix with $(i,j)$ term given by the dot product of the $i$th row of $A$ (viewed as a vector) and the $j$th column of $B$ (viewed as a vector). Matrix multiplication is associative but *not* commutative. (E.g. $(AB)C = A(BC)$ but $AB$ and $BA$ need not be equal, or even defined, as the shapes may not match up.)
* A square matrix $A$ has an *inverse* $A^{-1}$ if $AA^{-1} = A^{-1}A = I$, where $I$ is the identity matrix (a matrix which is zero except on its diagonal entries, which are all $1$). Square matrices may or may not have an inverse. A matrix without an inverse is called *singular*.
* Viewing a vector as a matrix is possible. The association chosen here is common and is through a *column* vector.
* The *transpose* of a matrix comes by permuting the rows and columns. The transpose of a column vector is a row vector, so $v\cdot w = v^T w$, where we use a superscript $T$ for the transpose. The transpose of a product is the product of the transposes---reversed: $(AB)^T = B^T A^T$; the transpose of a transpose is an identity operation: $(A^T)^T = A$; the inverse of a transpose is the transpose of the inverse: $(A^{-1})^T = (A^T)^{-1}$.
* Matrices for which $A = A^T$ are called symmetric.
* The *adjoint* of a matrix is related to the transpose, only complex conjugates are also taken. When a matrix has real components, the adjoint and transpose are identical operations.
* The trace of a square matrix is just the sum of its diagonal terms.
* The determinant of a square matrix is more involved to compute, but was previously seen to have a relationship to the volume of a certain parallelepiped.
These operations have different inputs and outputs: the determinant and trace take a (square) matrix and return a scalar; the inverse takes a square matrix and returns a square matrix (when defined); the transpose and adjoint take a rectangular matrix and return a rectangular matrix.
In addition to these, there are a few other key operations on matrices described in the following.
### Linear operators
The @BrightEdelmanJohnson notes cover differentiation of functions in this uniform manner, extending the form by treating derivatives more generally as *linear operators*.
A [linear operator](https://en.wikipedia.org/wiki/Operator_(mathematics)) is a mathematical object which satisfies
$$
f[\alpha v + \beta w] = \alpha f[v] + \beta f[w],
$$
where the $\alpha$ and $\beta$ are scalars, and $v$ and $w$ come from a *vector space*.
Taking the real numbers as a vector space, regular multiplication is a linear operator, as $c \cdot (ax + by) = a\cdot(cx) + b\cdot(cy)$ using the distributive and commutative properties.
Taking $n$-dimensional vectors as a vector space, matrix multiplication by an $n \times n$ matrix on the left is a linear operator, as $M(av + bw) = a(Mv) + b(Mw)$, using distribution and the commutative property of scalar multiplication.
We saw that differentiable functions form a vector space; the derivative is a linear operator on it, as $[af(x) + bg(x)]' = af'(x) + bg'(x)$.
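These linearity claims are easy to spot-check numerically; a small sketch for the matrix case (the matrix, vectors, and scalars are our choice):

```julia
M = [1.0 2.0; 3.0 4.0]
v, w = [1.0, 0.0], [0.5, 2.0]
a, b = 3.0, -2.0

# linearity of x ↦ Mx: M(av + bw) = a(Mv) + b(Mw)
M*(a*v + b*w) ≈ a*(M*v) + b*(M*w)
```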
::: {.callout-note appearance="minimal"}
## The use of `[]`
The referenced notes identify $f'(x) dx$ as $f'(x)[dx]$, the latter emphasizing $f'(x)$ acts on $dx$ and the notation is not commutative (e.g., it is not $dx f'(x)$). The use of $[]$ is to indicate that $f'(x)$ "acts" on $dx$ in a linear manner. It may be multiplication, matrix multiplication, or something else. Parentheses are not used which might imply function application or multiplication.
:::
## The derivative as a linear operator
We take the view that a derivative is a linear operator where $df = f(x+dx) - f(x) = f'(x)[dx]$.
In writing $df = f(x + dx) - f(x) = f'(x)[dx]$ generically, some underlying facts are left implicit: $dx$ has the same shape as $x$ (so can be added) and there is an underlying concept of distance and size that allows the above to be made rigorous. This may be an absolute value or a norm.
##### Example: directional derivatives
Suppose $f: R^n \rightarrow R$, a scalar-valued function of a vector. Then the directional derivative at $x$ in the direction $v$ was defined for a scalar $\alpha$ by:
$$
\frac{\partial}{\partial \alpha}f(x + \alpha v) \mid_{\alpha = 0} =
\lim_{\Delta\alpha \rightarrow 0} \frac{f(x + \Delta\alpha v) - f(x)}{\Delta\alpha}.
$$
This rate of change in the direction of $v$ can be expressed through the linear operator $f'(x)$ via
$$
df = f(x + d\alpha v) - f(x) = f'(x) [d\alpha v] = d\alpha f'(x)[v],
$$
using linearity to move the scalar multiplication by $d\alpha$ outside the action of the linear operator. This connects the partial derivative at $x$ in the direction of $v$ with $f'(x)$:
$$
\frac{\partial}{\partial \alpha}f(x + \alpha v) \mid_{\alpha = 0} =
f'(x)[v].
$$
Not only does this give a connection in notation with the derivative, it naturally illustrates how the derivative as a linear operator can act on non-infinitesimal values, in this case on $v$.
Previously, we wrote $\nabla f \cdot v$ for the directional derivative, where the gradient is a column vector.
The above uses the identification $f' = (\nabla f)^T$.
For $f: R^n \rightarrow R$ we have $df = f(x + dx) - f(x) = f'(x) [dx]$ is a scalar, so if $dx$ is a column vector, $f'(x)$ is a row vector with the same number of components (just as $\nabla f$ is a column vector with the same number of components). The operation $f'(x)[dx]$ is just matrix multiplication, which is a linear operation.
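A numeric check that $f'(x)[v] = \nabla f(x)\cdot v$ agrees with the limit definition; a sketch using `ForwardDiff` for the gradient (the function, point, and direction are our choice):

```julia
using ForwardDiff, LinearAlgebra

f(x) = x[1]^2 * x[2] + sin(x[3])
x = [1.0, 2.0, 3.0]
v = [1.0, -1.0, 0.5]

lhs = ForwardDiff.gradient(f, x) ⋅ v     # f'(x)[v] = ∇f(x)·v
α = 1e-6
rhs = (f(x + α*v) - f(x)) / α            # difference quotient in α
lhs, rhs
```

The two agree up to terms that vanish with $\alpha$.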
##### Example: derivative of a matrix expression
@BrightEdelmanJohnson include this example to show that the computation of derivatives using components can be avoided. Consider $f(x) = x^T A x$ where $x$ is a vector in $R^n$ and $A$ is an $n\times n$ matrix. This type of expression is common.
Then $f: R^n \rightarrow R$ and its derivative can be computed:
$$
\begin{align*}
df &= f(x + dx) - f(x)\\
&= (x + dx)^T A (x + dx) - x^TAx \\
&= \textcolor{blue}{x^TAx} + dx^TA x + x^TAdx + \textcolor{red}{dx^T A dx} - \textcolor{blue}{x^TAx}\\
&= dx^TA x + x^TAdx \\
&= (dx^TAx)^T + x^TAdx \\
&= x^T A^T dx + x^T A dx\\
&= x^T(A^T + A) dx
\end{align*}
$$
The term $dx^T A dx$ is dropped, as it is higher order (it goes to zero faster), containing two $dx$ terms.
In the second to last step, an identity operation (taking the transpose of the scalar quantity) is taken to simplify the algebra. Finally, as $df = f'(x)[dx]$ the identity of $f'(x) = x^T(A^T+A)$ is made, or taking transposes $\nabla f(x) = (A + A^T)x$.
Compare the elegance above with the component version, which, even after simplification, still requires a specification of the size to carry out the following:
```{julia}
using SymPy
@syms x[1:3]::real A[1:3, 1:3]::real
u = x' * A * x
grad_u = [diff(u, xi) for xi in x]
```
Compare to the formula for the gradient just derived:
```{julia}
grad_u_1 = (A + A')*x
```
The two are, of course, equal
```{julia}
all(a == b for (a,b) ∈ zip(grad_u, grad_u_1))
```
##### Example: derivative of matrix application
For $f: R^n \rightarrow R^m$, @BrightEdelmanJohnson give an example of computing the Jacobian without resorting to component-wise computations. Let $f(x) = Ax$ with $A$ an $m \times n$ matrix; it follows that
$$
\begin{align*}
df &= f(x + dx) - f(x)\\
&= A(x + dx) - Ax\\
&= Adx\\
&= f'(x)[dx].
\end{align*}
$$
The Jacobian is the linear operator $A$ acting on $dx$. (That $Adx = f'(x)[dx]$ implies $f'(x)=A$ follows as this action holds for any $dx$, hence the operators must agree.)
## Differentiation rules
Various differentiation rules are still available such as the sum, product, and chain rules.
### Sum and product rules for the derivative
Using the differential notation---which implicitly ignores higher order terms as they vanish in a limit---the sum and product rules can be derived.
For the sum rule, let $f(x) = g(x) + h(x)$. Then
$$
\begin{align*}
df &= f(x + dx) - f(x) \\
&= f'(x)[dx]\\
&= \left(g(x+dx) + h(x+dx)\right) - \left(g(x) + h(x)\right)\\
&= \left(g(x + dx) - g(x)\right) + \left(h(x + dx) - h(x)\right)\\
&= g'(x)[dx] + h'(x)[dx]\\
&= \left(g'(x) + h'(x)\right)[dx]
\end{align*}
$$
Comparing, we get $f'(x)[dx] = (g'(x) + h'(x))[dx]$ or $f'(x) = g'(x) + h'(x)$. (The last two lines above show how the new linear operator $g'(x) + h'(x)$ is defined on a value, by adding the application of each.)
The sum rule has the same derivation as was done with univariate, scalar functions. Similarly for the product rule.
The product rule for $f(x) = g(x)h(x)$ comes as:
$$
\begin{align*}
df &= f(x + dx) - f(x) \\
&= g(x+dx)h(x + dx) - g(x) h(x)\\
&= \left(g(x) + g'(x)[dx]\right)\left(h(x) + h'(x) [dx]\right) - g(x) h(x) \\
&= \textcolor{blue}{g(x)h(x)} + g'(x) [dx] h(x) + g(x) h'(x) [dx] + \textcolor{red}{g'(x)[dx] h'(x) [dx]} - \textcolor{blue}{g(x) h(x)}\\
&= \left(g'(x)[dx]\right)h(x) + g(x)\left(h'(x) [dx]\right)\\
&= dg h + g dh
\end{align*}
$$
**after** dropping the higher order term and cancelling $gh$ terms of opposite signs in the fourth row.
##### Example
These two rules can be used to directly show the last two examples.
First, if $f(x) = Ax$ and $A$ is a constant, then:
$$
df = (dA)x + A(dx) = 0x + A dx = A dx.
$$
Next, to differentiate $f(x) = x^TAx$:
$$
\begin{align*}
df &= dx^T (Ax) + x^T d(Ax) \\
&= (dx^T (Ax))^T + x^T A dx \\
&= x^T A^T dx + x^T A dx \\
&= x^T(A^T + A) dx
\end{align*}
$$
In the second line the transpose of the scalar quantity $dx^TAx$ is taken to simplify the expression, and the first calculation is used.
When $A^T = A$ ($A$ is symmetric) this simplifies to a more familiar looking $2x^TA$, but we see that this requires assumptions not needed in the scalar case.
##### Example
@BrightEdelmanJohnson consider what in `Julia` is `.*`. That is the operation:
$$
v .* w =
\begin{bmatrix}
v_1w_1 \\
v_2w_2 \\
\vdots\\
v_nw_n
\end{bmatrix}
=
\begin{bmatrix}
v_1 & 0 & \cdots & 0 \\
0 & v_2 & \cdots & 0 \\
& & \vdots & \\
0 & 0 & \cdots & v_n
\end{bmatrix}
\begin{bmatrix}
w_1 \\
w_2 \\
\vdots\\
w_n
\end{bmatrix}
= \text{diag}(v) w.
$$
They compute the derivative of $f(x) = A(x .* x)$ for some fixed matrix $A$ of the proper size.
We can see by the product rule that $d (\text{diag}(v)w) = d(\text{diag}(v))\, w + \text{diag}(v)\, dw = (dv) .* w + v .* dw$. So, with $v = w = x$,
$df = A(dx .* x + x .* dx) = 2A(x .* dx)$, as $.*$ is commutative by its definition. Writing this as $df = 2A(x .* dx) = 2A(\text{diag}(x) dx) = (2A\text{diag}(x)) dx$, we identify $f'(x) = 2A\text{diag}(x)$.
This operation is called the [Hadamard product](https://en.wikipedia.org/wiki/Hadamard_product_(matrices)) and it extends to matrices and arrays.
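The identification $f'(x) = 2A\,\text{diag}(x)$ can be spot-checked against a finite difference; a sketch (the sizes and values are our choice):

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0]
f(x) = A * (x .* x)
x = [1.0, 2.0]

J = 2A * Diagonal(x)            # claimed f'(x)
dx = 1e-6 * [1.0, -1.0]

# the residual is the dropped higher-order term, of size ‖dx‖²
maximum(abs, f(x + dx) - f(x) - J*dx)
```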
::: {.callout-note appearance="minimal"}
## Numerator layout
The Wikipedia page on [matrix calculus](https://en.wikipedia.org/wiki/Matrix_calculus#Layout_conventions) has numerous such "identities" for derivatives of different common matrix/vector expressions. As vectors here are viewed as column vectors, the "numerator layout" identities apply.
:::
### The chain rule
Like the product rule, the chain rule is shown by @BrightEdelmanJohnson in this notation with $f(x) = g(h(x))$:
$$
\begin{align*}
df &= f(x + dx) - f(x)\\
&= g(h(x + dx)) - g(h(x))\\
&= g(h(x) + h'(x)[dx]) - g(h(x))\\
&= g(h(x)) + g'(h(x))[h'(x)[dx]] - g(h(x))\\
&= g'(h(x)) [h'(x) [dx]]\\
&= (g'(h(x)) h'(x)) [dx]
\end{align*}
$$
The operator $f'(x)= g'(h(x)) h'(x)$ is a product of matrices.
### Computational differences with expressions from the chain rule
Of note here is the application of the chain rule to three (or more) compositions where $c:R^n \rightarrow R^j$, $b:R^j \rightarrow R^k$, and $a:R^k \rightarrow R^m$:
If $f(x) = a(b(c(x)))$ then the derivative is:
$$
f'(x) = a'(b(c(x))) b'(c(x)) c'(x),
$$
which can be expressed as three matrix multiplications two ways:
$$
f' = (a'b')c' \text{ or } f' = a'(b'c')
$$
Multiplying left to right (the first) is called reverse mode; multiplying right to left (the second) is called forward mode. The distinction becomes important when considering the computational cost of the multiplications.
* If $f: R^n \rightarrow R^m$ has $n$ much bigger than $1$ and $m=1$, then it is much faster to do left-to-right multiplication (many more inputs than outputs).
* If $f:R^n \rightarrow R^m$ has $n=1$ and $m$ much bigger than one, then it is faster to do right-to-left multiplication (many more outputs than inputs).
The reason comes down to the shapes of the matrices. To see this, we need to know that multiplying an $m \times q$ matrix by a $q \times n$ matrix takes on the order of $mqn$ operations.
When $m=1$, writing the product with each factor transposed (which reverses the order they are read in), the derivative is a product of matrices of size $n\times j$, $j\times k$, and $k \times 1$, yielding a matrix of size $n \times 1$ matching the function dimension.
The operations involved in multiplying from left to right can be quantified. The first multiplication takes $njk$ operations, leaving an $n\times k$ matrix; the next multiplication then takes another $nk$ operations, or $njk + nk$ together.
Whereas computing from right to left is first $jk$ operations, leaving a $j \times 1$ matrix. The next multiplication then takes another $nj$ operations. In total:
* left to right is $njk + nk = nk \cdot (j + 1)$.
* right to left is $jk + jn = j\cdot (k+n)$.
When $j=k$, say, the two counts are $nk(k+1)$ and $k(k+n)$; for large $n$ and $k$ the second is smaller by a substantial factor. This can be quite significant in higher dimensions.
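The two totals can be tabulated for concrete sizes using the formulas just derived (the sizes are illustrative):

```julia
# operation counts for an (n×j)(j×k)(k×1) product, two groupings
left_to_right(n, j, k)  = n*j*k + n*k   # ((AB)C)
right_to_left(n, j, k)  = j*k + j*n     # (A(BC))

n, j, k = 100, 50, 50
left_to_right(n, j, k), right_to_left(n, j, k)   # (255000, 7500)
```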
##### Example
Using the `BenchmarkTools` package, we can check the time to compute various products:
```{julia}
using BenchmarkTools
n,j,k,m = 20,15,10,1
@btime A*(B*C) setup=(A=rand(n,j); B=rand(j,k); C=rand(k,m));
@btime (A*B)*C setup=(A=rand(n,j); B=rand(j,k); C=rand(k,m));
```
The latter computation is about 1.5 times slower.
Whereas the relationship is changed when the first matrix is skinny and the last is not:
```{julia}
@btime A*(B*C) setup=(A=rand(m,k); B=rand(k,j); C=rand(j,n));
@btime (A*B)*C setup=(A=rand(m,k); B=rand(k,j); C=rand(j,n));
```
----
In calculus, $n$ and $m$ are typically $1$, $2$, or $3$. But that need not be the case, especially if differentiation is over a parameter space.
## Derivatives of matrix functions
What is the derivative of $f(A) = A^2$?
The function $f$ takes an $n\times n$ matrix and returns a matrix of the same size.
This derivative can be derived directly from the *product rule*:
$$
\begin{align*}
df &= d(A^2) = d(AA)\\
&= dA A + A dA
\end{align*}
$$
That is $f'(A)$ is the operator $f'(A)[\delta A] = A \delta A + \delta A A$. (This is not $2A\delta A$, as $A$ may not commute with $\delta A$.)
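This can be checked numerically with a finite difference: for a small perturbation $\delta A$, the change $(A+\delta A)^2 - A^2$ is close to $A\,\delta A + \delta A\, A$ but not to $2A\,\delta A$. A quick sketch (the matrices here are arbitrary choices):

```{julia}
using LinearAlgebra
A  = [1.0 2.0; 3.0 4.0]
δA = 1e-6 * [0.5 -0.2; 0.1 0.3]      # small perturbation
df = (A + δA)^2 - A^2
norm(df - (A*δA + δA*A)), norm(df - 2A*δA)   # first tiny (only δA² remains); second ~1e-6
```

The second residual is the commutator $A\,\delta A - \delta A\, A$, which does not vanish for this $A$.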
### Vectorization of a matrix
Alternatively, we can identify $A$ through its
components, as a vector in $R^{n^2}$ and then leverage the Jacobian.
One such identification is vectorization---consecutively stacking the
column vectors into a single vector. In `Julia` the `vec` function does this
operation:
```{julia}
@syms A[1:2, 1:2]
vec(A)
```
The stacking by column follows how `Julia` stores matrices and how `Julia` references entries in a matrix by linear index:
```{julia}
vec(A) == [A[i] for i in eachindex(A)]
```
With this vectorization operation, $f$ may be viewed as
$\tilde{f}:R^{n^2} \rightarrow R^{n^2}$ through:
$$
\tilde{f}(\text{vec}(A)) = \text{vec}(f(A))
$$
We use `SymPy` to compute the Jacobian of this vector-valued function.
```{julia}
@syms A[1:3, 1:3]::real
f(x) = x^2
J = vec(f(A)).jacobian(vec(A)) # jacobian of f̃
```
We explain this structure via linear algebra first, then see a more elegant manner following the notes.
A course in linear algebra shows that any linear operator on a finite vector space can be represented as a matrix. The basic idea is to represent what the operator does to each *basis* element and put these values as columns of the matrix.
In this $3 \times 3$ case, the linear operator works on an object with $9$ slots and returns an object with $9$ slots, so the matrix will be $9 \times 9$.
The basis elements are simply the matrices with a $1$ in spot $(i,j)$ and zero elsewhere. Here we generate them through a function:
```{julia}
basis(i,j,A) = (b=zeros(Int, size(A)...); b[i,j] = 1; b)
JJ = [vec(basis(i,j,A)*A + A*basis(i,j,A)) for j in 1:3 for i in 1:3]
```
The elements of `JJ` show the representation of each of the $9$ basis elements under the linear transformation.
To construct the matrix representing the linear operator, we need to concatenate these horizontally as column vectors
```{julia}
JJ = hcat(JJ...)
```
The matrix $JJ$ is identical to $J$, above:
```{julia}
all(j == jj for (j, jj) in zip(J, JJ))
```
### Kronecker products
But how can we see the Jacobian, $J$, from the linear operator $f'(A)[\delta A] = \delta A A + A \delta A$?
To make this less magical, we need an operation related to $\text{vec}$.
The $\text{vec}$ function turns a matrix into a vector by stacking its columns, so it can be used for finding the Jacobian, as above. However, the shape of the matrix is lost, as are the fundamental matrix operations, like multiplication.
The [Kronecker product](https://en.wikipedia.org/wiki/Kronecker_product) replicates values, making a bigger matrix: each value in $A$ is replaced by that value times $B$, so each entry of $A$ becomes a block the size of $B$.
Formally,
$$
A \otimes B =
\begin{bmatrix}
a_{11}B & a_{12}B & \cdots & a_{1n}B \\
a_{21}B & a_{22}B & \cdots & a_{2n}B \\
&\vdots & & \\
a_{m1}B & a_{m2}B & \cdots & a_{mn}B
\end{bmatrix}
$$
The function `kron` forms this product:
```{julia}
@syms A[1:2, 1:3] B[1:3, 1:4]
kron(A, B) # same as hcat((vcat((A[i,j]*B for i in 1:2)...) for j in 1:3)...)
```
The Kronecker product of an $m\times n$ matrix $A$ and a $j \times k$ matrix $B$ has size $mj \times nk$.
The Kronecker product has a certain algebra, including:
* transposes: $(A \otimes B)^T = A^T \otimes B^T$
* orthogonality: $(A\otimes B)^{-1} = (A\otimes B)^T$ if both $A$ and $B$ are orthogonal
* trace (sum of diagonal): $\text{tr}(A \otimes B) = \text{tr}(A)\text{tr}(B)$
* determinants: $\det(A\otimes B) = \det(A)^m \det(B)^n$, where $A$ is $n\times n$ and $B$ is $m \times m$
* inverses: $(A \otimes B)^{-1} = (A^{-1}) \otimes (B^{-1})$
* multiplication: $(A\otimes B)(C \otimes D) = (AC) \otimes (BD)$
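These identities are straightforward to confirm numerically; a sketch with arbitrary (invertible) matrices:

```{julia}
using LinearAlgebra
A = [1.0 2.0; 3.0 5.0];  B = [1.0 0.0 1.0; 2.0 1.0 0.0; 0.0 1.0 3.0]
C = [2.0 1.0; 0.0 1.0];  D = [1.0 1.0 0.0; 0.0 2.0 1.0; 1.0 0.0 1.0]
kron(A, B)' ≈ kron(A', B')                  # transposes
tr(kron(A, B)) ≈ tr(A) * tr(B)              # trace
det(kron(A, B)) ≈ det(A)^3 * det(B)^2       # A is 2×2 (n=2), B is 3×3 (m=3)
inv(kron(A, B)) ≈ kron(inv(A), inv(B))      # inverses
kron(A, B) * kron(C, D) ≈ kron(A*C, B*D)    # multiplication
```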
The main equation coupling `vec` and `kron` is the fact that if $A$, $B$, and $C$ have appropriate sizes, then:
$$
(A \otimes B) \text{vec}(C) = \text{vec}(B C A^T).
$$
Appropriate sizes for $A$, $B$, and $C$ are determined by the various products in $BCA^T$.
If $A$ is $m \times n$ and $B$ is $r \times s$, then since $BC$ is defined, $C$ has $s$ rows, and since $CA^T$ is defined, $C$ must have $n$ columns, as $A^T$ is $n \times m$; so $C$ must be $s\times n$. Checking this is correct on the other side, $A \otimes B$ has size $mr \times ns$ and $\text{vec}(C)$ has length $sn$, so that product works, size-wise.
The notes referred to above have an explanation for this formula, but here we only confirm it with an example using $m=n=2$ and $r=s=3$:
```{julia}
@syms A[1:2, 1:2]::real B[1:3, 1:3]::real C[1:3, 1:2]::real
L, R = kron(A,B)*vec(C), vec(B*C*A')
all(l == r for (l, r) ∈ zip(L, R))
```
----
Now to use this relationship to recognize $df = A\, dA + dA\, A$ in the Jacobian computed from $\text{vec}(f(A))$.
We have $\text{vec}(A dA + dA A) = \text{vec}(A dA) + \text{vec}(dA A)$, by the obvious linearity of $\text{vec}$. Now, inserting an identity matrix, $I$, which is symmetric, in a useful spot we have:
$$
\text{vec}(A dA) = \text{vec}(A dA I^T) = (I \otimes A) \text{vec}(dA),
$$
and
$$
\text{vec}(dA A) = \text{vec}(I dA (A^T)^T) = (A^T \otimes I) \text{vec}(dA).
$$
This leaves
$$
\text{vec}(A dA + dA A) =
\left((I \otimes A) + (A^T \otimes I)\right) \text{vec}(dA).
$$
We should then get the Jacobian we computed from the following:
```{julia}
@syms A[1:3, 1:3]::real
using LinearAlgebra: I
J = vec(A^2).jacobian(vec(A))
JJ = kron(I(3), A) + kron(A', I(3))
all(j == jj for (j,jj) in zip(J,JJ))
```
This technique can also be used with other powers, say $f(A) = A^3$, where the resulting $df = A^2 dA + A dA A + dA A^2$ is one answer that can be compared to a Jacobian through
$$
\begin{align*}
df &= \text{vec}(A^2 dA I^T) + \text{vec}(A dA A) + \text{vec}(I dA A^2)\\
&= (I \otimes A^2)\text{vec}(dA) + (A^T \otimes A) \text{vec}(dA) + ((A^T)^2 \otimes I) \text{vec}(dA)
\end{align*}
$$
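A numerical check of the $A^3$ case, comparing one column of the Kronecker-built Jacobian to a finite-difference derivative (the matrix and the perturbed entry are arbitrary choices):

```{julia}
using LinearAlgebra
A = [1.0 2.0 0.0; 0.0 1.0 1.0; 2.0 0.0 1.0]
J = kron(I(3), A^2) + kron(A', A) + kron(A'^2, I(3))   # vec-form derivative of A ↦ A³
h = 1e-6
E = zeros(3, 3); E[2, 1] = h        # perturb the (2,1) entry; linear index 2
col = vec((A + E)^3 - A^3) / h
norm(col - J[:, 2])                  # small, of order h
```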
The above shows how to relate the derivative of a matrix function to
the Jacobian of a vectorized function, but only for illustration. It
is certainly not necessary to express the derivative of $f$ in terms of
the derivative of its vectorized counterpart.
##### Example: derivative of the matrix inverse
What is the derivative of $f(A) = A^{-1}$? The same technique used to find the derivative of the inverse of a univariate, scalar-valued function is useful.
Starting with $I = AA^{-1}$ and noting $dI$ is $0$ we have
$$
\begin{align*}
0 &= d(AA^{-1})\\
&= dAA^{-1} + A d(A^{-1})
\end{align*}
$$
So, $d(A^{-1}) = -A^{-1} dA A^{-1}$.
This could be re-expressed as a linear operator through
$$
\text{vec}(dA^{-1}) =
-\left((A^{-1})^T \otimes A^{-1}\right) \text{vec}(dA)
= -\left((A^T)^{-1} \otimes A^{-1}\right) \text{vec}(dA).
$$
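A numerical check of this operator (note the sign, from $d(A^{-1}) = -A^{-1}\, dA\, A^{-1}$; the matrices here are arbitrary, with $A$ invertible):

```{julia}
using LinearAlgebra
A  = [2.0 1.0; 0.0 3.0]
dA = 1e-6 * [1.0 2.0; 3.0 4.0]
lhs = vec(inv(A + dA) - inv(A))
rhs = -kron(inv(A)', inv(A)) * vec(dA)
norm(lhs - rhs)                      # second order in dA, essentially zero
```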
##### Example: derivative of the matrix determinant
Let $f(A) = \text{det}(A)$. What is the derivative?
First, the determinant of a square, $n\times n$, matrix $A$ is a scalar summary of $A$. There are different means to compute the determinant, but this recursive one in particular is helpful here:
$$
\text{det}(A) = a_{1j}C_{1j} + a_{2j}C_{2j} + \cdots + a_{nj}C_{nj}
$$
for any $j$. The *cofactor* $C_{ij}$ is the determinant of the $(n-1)\times(n-1)$ matrix with the $i$th row and $j$th column deleted times $(-1)^{i+j}$.
To find the *gradient* of $f$, we differentiate by each of the $A_{ij}$ variables, and so
$$
\frac{\partial\text{det}(A)}{\partial A_{ij}} =
\frac{\partial (a_{1j}C_{1j} + a_{2j}C_{2j} + \cdots + a_{nj}C_{nj})}{\partial A_{ij}} =
C_{ij},
$$
as each cofactor $C_{kj}$ in the expansion is computed with the $j$th column deleted, so it has no dependence on $A_{ij}$.
So the gradient is the matrix of cofactors.
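For invertible $A$ the cofactor matrix equals $\det(A)\,(A^{-1})^T$, which makes the gradient easy to check against a finite difference (a sketch with an arbitrary $2\times 2$ matrix):

```{julia}
using LinearAlgebra
A = [2.0 1.0; 1.0 3.0]
G = det(A) * inv(A)'                 # the matrix of cofactors
h = 1e-6
fd(i, j) = (E = zeros(2, 2); E[i, j] = h; (det(A + E) - det(A)) / h)
maximum(abs(fd(i, j) - G[i, j]) for i in 1:2, j in 1:2)   # essentially zero
```

For a $2\times 2$ matrix the determinant is linear in each single entry, so the difference quotient is essentially exact.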
@BrightEdelmanJohnson also give a different proof, starting with this observation (valid to first order in $dA$):
$$
\text{det}(I + dA) - \text{det}(I) = \text{tr}(dA).
$$
Assuming that, then by the fact $\text{det}(AB) = \text{det}(A)\text{det}(B)$:
$$
\begin{align*}
\text{det}(A + A(A^{-1}dA)) - \text{det}(A) &= \text{det}(A)\cdot(\text{det}(I+ A^{-1}dA) - \text{det}(I)) \\
&= \text{det}(A) \text{tr}(A^{-1}dA)\\
&= \text{tr}(\text{det}(A)A^{-1}dA).
\end{align*}
$$
This agrees with the cofactor computation above: the inverse of a matrix is the transpose of its cofactor matrix divided by its determinant, so $\text{tr}(\det(A)A^{-1}dA) = \text{tr}(C^T dA) = \sum_{ij} C_{ij}\, dA_{ij}$.
That the trace gets involved can be seen from this computation, which shows the only first-order terms are from the diagonal sum:
```{julia}
using LinearAlgebra
@syms dA[1:2, 1:2]
det(I + dA) - det(I)
```
## The adjoint method
The chain rule brings about a series of products. The adjoint method, illustrated by @BrightEdelmanJohnson and summarized below, shows how to approach the computation of the series in a direction that minimizes the computational cost, illustrating why reverse mode is preferred to forward mode when a scalar function of several variables is considered.
@BrightEdelmanJohnson consider the derivative of
$$
g(p) = f(A(p)^{-1} b)
$$
This might arise from applying a scalar-valued $f$ to the solution of $Ax = b$, where $A$ is parameterized by $p$. The number of parameters might be quite large, so how the resulting computation is organized can affect the computational cost.
The chain rule gives the following computation to find the derivative (or gradient):
$$
\begin{align*}
dg
&= f'(x)[dx]\\
&= f'(x) [d(A(p)^{-1} b)]\\
&= f'(x)[-A(p)^{-1} dA A(p)^{-1} b + 0]\\
&= -\textcolor{red}{f'(x) A(p)^{-1}} dA\textcolor{blue}{A(p)^{-1}[b]}.
\end{align*}
$$
By setting $v^T = f'(x)A(p)^{-1}$ and writing $x = A(p)^{-1}[b]$ this becomes
$$
dg = -v^T dA x.
$$
This product of three terms can be computed in two directions:
*From left to right:*
First, $v$ is found from $v^T = f'(x) A^{-1}$ by transposing:
$v = (A^{-1})^T (f'(x))^T = (A^T)^{-1} \nabla f$,
that is, by solving $A^T v = \nabla f$. This is called the *adjoint* equation.
The partial derivatives in $p$ of $g$ are related to each partial derivative of $dA$ through:
$$
\frac{\partial g}{\partial p_k} = -v^T\frac{\partial A}{\partial p_k} x,
$$
as the scalar factor commutes through. With $v$ and $x$ solved for (via the adjoint equation and from solving $Ax=b$) the partials in $p_k$ are computed with dot products. There are just two costly operations.
*From right to left:*
The value of $x$ can be solved for, as above, but computing the value of
$$
\frac{\partial g}{\partial p_k} =
-f'(x) \left(A^{-1} \frac{\partial A}{\partial p_k} x \right)
$$
requires a costly solve of $A^{-1}\frac{\partial A}{\partial p_k} x$ for each $p_k$, and $p$ may have many components. This is the difference: left to right only has the solve of the one adjoint equation.
As mentioned above, the reverse mode offers advantages when there are many input parameters ($p$) and a single output parameter.
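A small sketch in code under illustrative assumptions: take $A(p) = A_0 + p_1 A_1 + p_2 A_2$ and $f(x) = \sum_i x_i^2$, so $\nabla f = 2x$. The whole gradient of $g$ then costs one solve for $x$, one adjoint solve for $v$, and a cheap product per parameter:

```{julia}
using LinearAlgebra
A₀ = [4.0 1.0; 1.0 3.0];  A₁ = [1.0 0.0; 0.0 0.0];  A₂ = [0.0 1.0; 1.0 0.0]
b = [1.0, 2.0]
p = [0.5, -0.25]
A = A₀ + p[1]*A₁ + p[2]*A₂
x = A \ b                        # one solve: A(p) x = b
v = A' \ (2x)                    # one adjoint solve: Aᵀ v = ∇f(x)
∇g = [-v' * Aₖ * x for Aₖ in (A₁, A₂)]   # ∂g/∂pₖ = -vᵀ (∂A/∂pₖ) x
# compare with a finite-difference approximation
g(p) = sum(abs2, (A₀ + p[1]*A₁ + p[2]*A₂) \ b)
h = 1e-6
fd = [(g(p + h*[1, 0]) - g(p)) / h, (g(p + h*[0, 1]) - g(p)) / h]
maximum(abs.(∇g .- fd))          # small, of order h
```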
##### Example
Suppose $x(p)$ solves some system of equations $h(x(p),p) = 0$ in $R^n$ ($n$ possibly just $1$) and $g(p) = f(x(p))$ is some non-linear transformation of $x$. What is the derivative of $g$ in $p$?
Suppose the *implicit function theorem* applies to $h(x,p) = 0$, that is *locally* the response $x(p)$ has a derivative, and moreover by the chain rule
$$
0 = \frac{\partial h}{\partial p} dp + \frac{\partial h}{\partial x} dx.
$$
Solving the above for $dx$ gives:
$$
dx = -\left(\frac{\partial h}{\partial x}\right)^{-1} \frac{\partial h}{\partial p} dp.
$$
The chain rule applied to $g(p) = f(x(p))$ then yields
$$
dg = f'(x) dx = - f'(x) \left(\frac{\partial h}{\partial x}\right)^{-1} \frac{\partial h}{\partial p} dp = -v^T\frac{\partial h}{\partial p} dp,
$$
by setting
$$
v^T = f'(x) \left(\frac{\partial h}{\partial x}\right)^{-1}.
$$
Here $v$ can be solved for by taking adjoints (as before). Let $A = \partial h/\partial x$; then $v^T = f'(x) A^{-1}$, or $v = (A^{-1})^T (f'(x))^T = (A^T)^{-1} \nabla f$. That is, $v$ solves $A^Tv=\nabla f$. As before, it takes two solves to get both $g$ and its gradient.
## Second derivatives, Hessian
We reference a theorem presented by @CarlssonNikitinTroedssonWendt for exposition, with some modification.
::: {.callout-note appearance="minimal"}
Theorem 1. Let $f:X \rightarrow Y$, where $X,Y$ are finite dimensional *inner product* spaces with elements in $R$. Suppose $f$ is smooth (a certain number of derivatives). Then for each $x$ in $X$ there exists a unique linear operator, $f'(x)$, and a unique *bilinear* *symmetric* operator $f'': X \oplus X \rightarrow Y$ such that
$$
f(x + \delta x) = f(x) + f'(x)[\delta x] + \frac{1}{2}f''(x)[\delta x, \delta x] + \mathscr{o}(\|\delta x\|^2).
$$
:::
New terms include *bilinear*, *symmetric*, and *inner product*. An operator ($X\oplus X \rightarrow Y$) is bilinear if it is a linear operator in each of its two arguments. Such an operator is *symmetric* if interchanging its two arguments makes no difference in its output. Finally, an *inner product* space is one with a generalization of the dot product. An inner product takes two vectors $x$ and $y$ and returns a scalar; it is denoted $\langle x,y\rangle$; and it has the properties of symmetry, linearity, and non-negativity ($\langle x,x\rangle \geq 0$, equaling $0$ only if $x$ is the zero vector). Inner products can be used to form a norm (or length) for a vector through $||x||^2 = \langle x,x\rangle$.
We reference this, as the values denoted $f'$ and $f''$ are *unique*. So if we identify them one way, we have identified them.
Specializing to $X=R^n$ and $Y=R^1$, we have, $f'=\nabla f^T$ and $f''$ is the Hessian.
Take $n=2$. Previously we wrote a formula for Taylor's theorem for $f:R^n \rightarrow R$, which with $n=2$ and $x=\langle x_1,x_2\rangle$ becomes:
$$
\begin{align*}
f(x + dx) &= f(x) +
\frac{\partial f}{\partial x_1} dx_1 + \frac{\partial f}{\partial x_2} dx_2\\
&{+} \frac{1}{2}\left(
\frac{\partial^2 f}{\partial x_1^2}dx_1^2 +
2\frac{\partial^2 f}{\partial x_1 \partial x_2}dx_1dx_2 +
\frac{\partial^2 f}{\partial x_2^2}dx_2^2
\right) + \mathscr{o}(\|dx\|^2).
\end{align*}
$$
We can see that $\nabla{f} \cdot dx = f'(x)[dx]$ tidies up part of the first line, and moreover the parenthesized part of the second line can be seen to be a matrix product:
$$
[dx_1 dx_2]
\begin{bmatrix}
\frac{\partial^2 f}{\partial x_1^2} &
\frac{\partial^2 f}{\partial x_1 \partial x_2}\\
\frac{\partial^2 f}{\partial x_2 \partial x_1} &
\frac{\partial^2 f}{\partial x_2^2}
\end{bmatrix}
\begin{bmatrix}
dx_1\\
dx_2
\end{bmatrix}
= dx^T H dx,
$$
$H$ being the *Hessian*, with entries $H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}$.
This formula---$f(x+dx)-f(x) \approx f'(x)[dx] + \frac{1}{2}dx^T H\, dx$---is valid for any $n$; taking $n=2$ was just for ease of notation when expressing things in coordinates and not as matrices.
By uniqueness, we have under these assumptions that the Hessian is *symmetric* and the expression $dx^T H dx$ is a *bilinear* form, which we can identify as $f''(x)[dx,dx]$.
That the Hessian is symmetric could also be derived under these assumptions by directly computing that the mixed partials can have their order exchanged. But in this framework, as explained by @BrightEdelmanJohnson (and shown later) it is a result of the underlying vector space having an addition that is commutative (e.g. $u+v = v+u$).
The mapping $(u,v) \rightarrow u^T A v$ for a matrix $A$ is bilinear. For a fixed $u$, it is linear as it can be viewed as $(u^TA)[v]$ and matrix multiplication is linear. Similarly for a fixed $v$.
@BrightEdelmanJohnson extend this characterization to a broader setting.
We have for some function $f$
$$
df = f(x + dx) - f(x) = f'(x)[dx]
$$
Then if $d\tilde{x}$ is another differential change with the same shape as $x$ we can look at the differential of $f'(x)$:
$$
d(f') = f'(x + d\tilde{x}) - f'(x) = f''(x)[d\tilde{x}]
$$
Now, $d(f')$ has the same shape as $f'$, a linear operator, hence $d(f')$ is also a linear operator. Acting on $dx$, we have
$$
d(f')[dx] = f''(x)[d\tilde{x}][dx] = f''(x)[d\tilde{x}, dx].
$$
The last equality is a definition. As $f''$ is linear in its application to $d\tilde{x}$ and also linear in its application to $dx$, $f''(x)$ is a bilinear operator.
Moreover, the following shows it is *symmetric*:
$$
\begin{align*}
f''(x)[d\tilde{x}][dx] &= (f'(x + d\tilde{x}) - f'(x))[dx]\\
&= f'(x + d\tilde{x})[dx] - f'(x)[dx]\\
&= (f(x + d\tilde{x} + dx) - f(x + d\tilde{x})) - (f(x+dx) - f(x))\\
&= (f(x + dx + d\tilde{x}) - f(x + dx)) - (f(x + d\tilde{x}) - f(x))\\
&= f'(x + dx)[d\tilde{x}] - f'(x)[d\tilde{x}]\\
&= f''(x)[dx][d\tilde{x}]
\end{align*}
$$
So $f''(x)[d\tilde{x},dx] = f''(x)[dx, d\tilde{x}]$. The key is the commutativity of vector addition, used to say $d\tilde{x} + dx = dx + d\tilde{x}$ when passing from the third line to the fourth.
##### Example: Hessian is symmetric
As mentioned earlier, the Hessian is the matrix arising from finding the second derivative of a multivariate, scalar-valued function $f:R^n \rightarrow R$. As a bilinear form on a finite-dimensional vector space, it can be written as $\tilde{x}^T A x$. Symmetry of the second derivative gives $\tilde{x}^T A x = x^T A \tilde{x}$, and, as this value is a scalar, $x^T A \tilde{x} = (x^T A \tilde{x})^T = \tilde{x}^T A^T x$. Holding for all $x$ and $\tilde{x}$, it follows from general principles that $H = A$ must be symmetric.
##### Example: second derivative of $x^TAx$
Consider an expression from earlier $f(x) = x^T A x$ for some constant $A$.
We have seen that $f' = (\nabla f)^T = x^T(A+A^T)$. That is $\nabla f = (A^T+A)x$ is linear in $x$. The Jacobian of $\nabla f$ is the Hessian, $H = f'' = A + A^T$.
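A central-difference check that the Hessian of $x^TAx$ is $A + A^T$ (the matrix and point are arbitrary choices; note $A$ need not be symmetric):

```{julia}
using LinearAlgebra
A = [1.0 2.0; 3.0 4.0]
f(x) = x' * A * x
H = A + A'
x₀ = [0.5, -1.0];  h = 1e-4
e(i) = (v = zeros(2); v[i] = h; v)
Hfd(i, j) = (f(x₀ + e(i) + e(j)) - f(x₀ + e(i)) - f(x₀ + e(j)) + f(x₀)) / h^2
maximum(abs(Hfd(i, j) - H[i, j]) for i in 1:2, j in 1:2)   # essentially zero, f being quadratic
```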
##### Example: second derivative of $\text{det}(A)$
Consider $f(A) = \text{det}(A)$. We saw previously that, to first order in $dA'$:
$$
\begin{align*}
\text{tr}(A + B) &= \text{tr}(A) + \text{tr}(B)\\
\text{det}(A + dA') &= \text{det}(A) + \text{det}(A)\text{tr}(A^{-1}dA')\\
(A + dA')^{-1} &= A^{-1} - A^{-1} dA' A^{-1}
\end{align*}
$$
These are all used to simplify:
$$
\begin{align*}
\text{det}(A+dA')&\text{tr}((A + dA')^{-1} dA) - \text{det}(A) \text{tr}(A^{-1}dA) \\
&= \left(
\text{det}(A) + \text{det}(A)\text{tr}(A^{-1}dA')
\right)
\text{tr}((A^{-1} - A^{-1}dA' A^{-1})dA)\\
&\quad{-} \text{det}(A) \text{tr}(A^{-1}dA) \\
&=
\textcolor{blue}{\text{det}(A) \text{tr}(A^{-1}dA)}\\
&\quad{+} \text{det}(A)\text{tr}(A^{-1}dA')\text{tr}(A^{-1}dA) \\
&\quad{-} \text{det}(A)\text{tr}(A^{-1}dA' A^{-1}dA)\\
&\quad{-} \textcolor{red}{\text{det}(A)\text{tr}(A^{-1}dA')\text{tr}(A^{-1}dA' A^{-1}dA)}\\
&\quad{-} \textcolor{blue}{\text{det}(A) \text{tr}(A^{-1}dA)} \\
&= \text{det}(A)\text{tr}(A^{-1}dA')\text{tr}(A^{-1}dA) - \text{det}(A)\text{tr}(A^{-1}dA' A^{-1}dA)\\
&\quad{+} \textcolor{red}{\text{third order term}}
\end{align*}
$$
So, after dropping the third-order term, we see:
$$
f''(A)[dA,dA']
= \text{det}(A)\text{tr}(A^{-1}dA')\text{tr}(A^{-1}dA) -
\text{det}(A)\text{tr}(A^{-1}dA' A^{-1}dA).
$$
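This bilinear form can be confirmed with a mixed finite difference of $\det(A + s\,dA + t\,dA')$ at $s=t=0$. For a $2\times 2$ matrix the determinant is a quadratic polynomial in $(s,t)$, so the difference quotient recovers the mixed partial essentially exactly (the matrices here are arbitrary, with $A$ invertible):

```{julia}
using LinearAlgebra
A   = [2.0 1.0; 1.0 3.0]
dA  = [0.3 -0.1; 0.2 0.4]
dA′ = [0.1 0.5; -0.2 0.3]
f″ = det(A) * (tr(inv(A)*dA′) * tr(inv(A)*dA) - tr(inv(A)*dA′*inv(A)*dA))
h = 1e-4
mixed = (det(A + h*dA + h*dA′) - det(A + h*dA) - det(A + h*dA′) + det(A)) / h^2
abs(f″ - mixed)                      # essentially zero
```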
@@ -1,4 +1,4 @@
# Polar Coordinates and Curves
# Polar coordinates and curves
{{< include ../_common_code.qmd >}}
@@ -226,7 +226,7 @@ The folium has radial part $0$ when $\cos(\theta) = 0$ or $\sin(2\theta) = b/4a$
plot_polar(𝒂0..(pi/2-𝒂0), 𝒓)
```
The second - which is too small to appear in the initial plot without zooming in - with
The second---which is too small to appear in the initial plot without zooming in---with
```{julia}
@@ -388,7 +388,7 @@ For a scalar function, Define a *level curve* as the solutions to the equations
contour(xsₛ, ysₛ, zzsₛ)
```
Were one to walk along one of the contour lines, then there would be no change in elevation. The areas of greatest change in elevation - basically the hills - occur where the different contour lines are closest. In this particular area, there is a river that runs from the upper right through to the lower left and this is flanked by hills.
Were one to walk along one of the contour lines, then there would be no change in elevation. The areas of greatest change in elevation---basically the hills--- occur where the different contour lines are closest. In this particular area, there is a river that runs from the upper right through to the lower left and this is flanked by hills.
The $c$ values for the levels drawn may be specified through the `levels` argument:
@@ -636,7 +636,7 @@ This says, informally, for any scale about $L$ there is a "ball" about $C$ (not
In the univariate case, it can be useful to characterize a limit at $x=c$ existing if *both* the left and right limits exist and the two are equal. Generalizing to getting close in $R^m$ leads to the intuitive idea of a limit existing in terms of any continuous "path" that approaches $C$ in the $x$-$y$ plane has a limit and all are equal. Let $\gamma$ describe the path, and $\lim_{s \rightarrow t}\gamma(s) = C$. Then $f \circ \gamma$ will be a univariate function. If there is a limit, $L$, then this composition will also have the same limit as $s \rightarrow t$. Conversely, if for *every* path this composition has the *same* limit, then $f$ will have a limit.
The "two path corollary" is a trick to show a limit does not exist - just find two paths where there is a limit, but they differ, then a limit does not exist in general.
The "two path corollary" is a trick to show a limit does not exist---just find two paths where there is a limit, but they differ, then a limit does not exist in general.
### Continuity of scalar functions
@@ -997,7 +997,7 @@ The figure suggests a potential geometric relationship between the gradient and
We see here how the gradient of $f$, $\nabla{f} = \langle f_{x_1}, f_{x_2}, \dots, f_{x_n} \rangle$, plays a similar role as the derivative does for univariate functions.
First, we consider the role of the derivative for univariate functions. The main characterization - the derivative is the slope of the line that best approximates the function at a point - is quantified by Taylor's theorem. For a function $f$ with a continuous second derivative:
First, we consider the role of the derivative for univariate functions. The main characterization---the derivative is the slope of the line that best approximates the function at a point---is quantified by Taylor's theorem. For a function $f$ with a continuous second derivative:
$$
@@ -1174,7 +1174,7 @@ atand(mean(slopes))
Which seems about right for a generally uphill trail section, as this is.
In the above example, the data is given in terms of a sample, not a functional representation. Suppose instead, the surface was generated by `f` and the path - in the $x$-$y$ plane - by $\gamma$. Then we could estimate the maximum and average steepness by a process like this:
In the above example, the data is given in terms of a sample, not a functional representation. Suppose instead, the surface was generated by `f` and the path---in the $x$-$y$ plane---by $\gamma$. Then we could estimate the maximum and average steepness by a process like this:
```{julia}
@@ -918,7 +918,7 @@ zs = fₗ.(xs, ys)
scatter3d!(xs, ys, zs)
```
A contour plot also shows that some - and only one - extrema happens on the interior:
A contour plot also shows that some---and only one---extrema happens on the interior:
```{julia}
@@ -967,10 +967,10 @@ We confirm this by looking at the Hessian and noting $H_{11} > 0$:
Hₛ = subs.(hessian(exₛ, [x,y]), x=>xstarₛ[x], y=>xstarₛ[y])
```
As it occurs at $(\bar{x}, \bar{y})$ where $\bar{x} = (x_1 + x_2 + x_3)/3$ and $\bar{y} = (y_1+y_2+y_3)/3$ - the averages of the three values - the critical point is an interior point of the triangle.
As it occurs at $(\bar{x}, \bar{y})$ where $\bar{x} = (x_1 + x_2 + x_3)/3$ and $\bar{y} = (y_1+y_2+y_3)/3$---the averages of the three values---the critical point is an interior point of the triangle.
As mentioned by Strang, the real problem is to minimize $d_1 + d_2 + d_3$. A direct approach with `SymPy` - just replacing `d2` above with the square root fails. Consider instead the gradient of $d_1$, say. To avoid square roots, this is taken implicitly from $d_1^2$:
As mentioned by Strang, the real problem is to minimize $d_1 + d_2 + d_3$. A direct approach with `SymPy`---just replacing `d2` above with the square root fails. Consider instead the gradient of $d_1$, say. To avoid square roots, this is taken implicitly from $d_1^2$:
$$
@@ -1016,7 +1016,7 @@ psₛₗ = [a*u for (a,u) in zip(asₛ₁, usₛ)]
plot!(polygon(psₛₗ)...)
```
Let's see where the minimum distance point is by constructing a plot. The minimum must be on the boundary, as the only point where the gradient vanishes is the origin, not in the triangle. The plot of the triangle has a contour plot of the distance function, so we see clearly that the minimum happens at the point `[0.5, -0.866025]`. On this plot, we drew the gradient at some points along the boundary. The gradient points in the direction of greatest increase - away from the minimum. That the gradient vectors have a non-zero projection onto the edges of the triangle in a direction pointing away from the point indicates that the function `d` would increase if moved along the boundary in that direction, as indeed it does.
Let's see where the minimum distance point is by constructing a plot. The minimum must be on the boundary, as the only point where the gradient vanishes is the origin, not in the triangle. The plot of the triangle has a contour plot of the distance function, so we see clearly that the minimum happens at the point `[0.5, -0.866025]`. On this plot, we drew the gradient at some points along the boundary. The gradient points in the direction of greatest increase---away from the minimum. That the gradient vectors have a non-zero projection onto the edges of the triangle in a direction pointing away from the point indicates that the function `d` would increase if moved along the boundary in that direction, as indeed it does.
```{julia}
@@ -1064,7 +1064,7 @@ The smallest value is when $t=0$ or $t=1$, so at one of the points, as `li` is d
##### Example: least squares
We know that two points determine a line. What happens when there are more than two points? This is common in statistics where a bivariate data set (pairs of points $(x,y)$) are summarized through a linear model $\mu_{y|x} = \alpha + \beta x$, That is the average value for $y$ given a particular $x$ value is given through the equation of a line. The data is used to identify what the slope and intercept are for this line. We consider a simple case - $3$ points. The case of $n \geq 3$ being similar.
We know that two points determine a line. What happens when there are more than two points? This is common in statistics where a bivariate data set (pairs of points $(x,y)$) are summarized through a linear model $\mu_{y|x} = \alpha + \beta x$, That is the average value for $y$ given a particular $x$ value is given through the equation of a line. The data is used to identify what the slope and intercept are for this line. We consider a simple case---$3$ points. The case of $n \geq 3$ being similar.
We have a line $l(x) = \alpha + \beta(x)$ and three points $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$. Unless these three points *happen* to be collinear, they can't possibly all lie on the same line. So to *approximate* a relationship by a line requires some inexactness. One measure of inexactness is the *vertical* distance to the line:
@@ -1118,7 +1118,7 @@ As found, the formulas aren't pretty. If $x_1 + x_2 + x_3 = 0$ they simplify. Fo
subs(outₗₛ[β], sum(xₗₛ) => 0)
```
Let $\vec{x} = \langle x_1, x_2, x_3 \rangle$ and $\vec{y} = \langle y_1, y_2, y_3 \rangle$ this is simply $(\vec{x} \cdot \vec{y})/(\vec{x}\cdot \vec{x})$, a formula that will generalize to $n > 3$. The assumption is not a restriction - it comes about by subtracting the mean, $\bar{x} = (x_1 + x_2 + x_3)/3$, from each $x$ term (and similarly subtract $\bar{y}$ from each $y$ term). A process called "centering."
Let $\vec{x} = \langle x_1, x_2, x_3 \rangle$ and $\vec{y} = \langle y_1, y_2, y_3 \rangle$ this is simply $(\vec{x} \cdot \vec{y})/(\vec{x}\cdot \vec{x})$, a formula that will generalize to $n > 3$. The assumption is not a restriction---it comes about by subtracting the mean, $\bar{x} = (x_1 + x_2 + x_3)/3$, from each $x$ term (and similarly subtract $\bar{y}$ from each $y$ term). A process called "centering."
With this observation, the formulas can be re-expressed through:
@@ -1587,7 +1587,7 @@ $$
G(\epsilon_1, \epsilon_2) = L.
$$
Now, Lagrange's method can be employed. This will be fruitful - even though we know the answer - it being $\epsilon_1 = \epsilon_2 = 0$!
Now, Lagrange's method can be employed. This will be fruitful---even though we know the answer---it being $\epsilon_1 = \epsilon_2 = 0$!
Forging ahead, we compute $\nabla{F}$ and $\lambda \nabla{G}$ and set $\epsilon_1 = \epsilon_2 = 0$ where the two are equal. This will lead to a description of $y$ in terms of $y'$.
@@ -24,7 +24,7 @@ For a scalar function $f: R^n \rightarrow R$, the gradient of $f$, $\nabla{f}$,
| $f: R\rightarrow R$ | univariate | familiar graph of function | $f$ |
| $f: R\rightarrow R^m$ | vector-valued | space curve when n=2 or 3 | $\vec{r}$, $\vec{N}$ |
| $f: R^n\rightarrow R$ | scalar | a surface when n=2 | $f$ |
| $F: R^n\rightarrow R^n$ | vector field | a vector field when n=2 | $F$ |
| $F: R^n\rightarrow R^n$ | vector field | a vector field when n=2, 3| $F$ |
| $F: R^n\rightarrow R^m$ | multivariable | n=2,m=3 describes a surface | $F$, $\Phi$ |
@@ -34,7 +34,9 @@ After an example where the use of a multivariable function is of necessity, we d
## Vector fields
We have seen that the gradient of a scalar function, $f:R^2 \rightarrow R$, takes a point in $R^2$ and associates a vector in $R^2$. As such $\nabla{f}:R^2 \rightarrow R^2$ is a vector field. A vector field can be visualized by sampling a region and representing the field at those points. The details, as previously mentioned, are in the `vectorfieldplot` function of `CalculusWithJulia`.
We have seen that the gradient of a scalar function, $f:R^2 \rightarrow R$, takes a point in $R^2$ and associates a vector in $R^2$. As such $\nabla{f}:R^2 \rightarrow R^2$ is a vector field. A vector field is a vector-valued function from $R^n \rightarrow R^n$ for $n \geq 2$.
An input/output pair can be visualized by identifying the input values as a point, and the output as a vector visualized by anchoring the vector at the point. A vector field is a sampling of such pairs, usually taken over some ordered grid. The details, as previously mentioned, are in the `vectorfieldplot` function of `CalculusWithJulia`.
```{julia}
@@ -78,6 +80,7 @@ Vector fields are also useful for other purposes, such as transformations, examp
For transformations, a useful visualization is to plot curves where one variables is fixed. Consider the transformation from polar coordinates to cartesian coordinates $F(r, \theta) = r \langle\cos(\theta),\sin(\theta)\rangle$. The following plot will show in blue fixed values of $r$ (circles) and in red fixed values of $\theta$ (rays).
::: {#fig-transformation-partial-derivative}
```{julia}
#| hold: true
@@ -95,10 +98,21 @@ pt = [1, pi/4]
J = ForwardDiff.jacobian(F, pt)
arrow!(F(pt...), J[:,1], linewidth=5, color=:red)
arrow!(F(pt...), J[:,2], linewidth=5, color=:blue)
pt = [0.5, pi/8]
J = ForwardDiff.jacobian(F, pt)
arrow!(F(pt...), J[:,1], linewidth=5, color=:red)
arrow!(F(pt...), J[:,2], linewidth=5, color=:blue)
```
Plot of a vector field from $R^2 \rightarrow R^2$ illustrated by drawing curves with fixed $r$ and $\theta$. The partial derivatives are added as layers.
:::
To the plot, we added the partial derivatives with respect to $r$ (in red) and with respect to $\theta$ (in blue). These are found with the soon-to-be discussed Jacobian. From the graph, you can see that these vectors are tangent vectors to the drawn curves.
The curves form a non-rectangular grid. Were the cells exactly parallelograms, the area would be computed taking into account the length of the vectors and the angle between them---the same values that come out of a cross product.
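The cross-product connection can be sketched directly. Here the two tangent vectors are hypothetical values, embedded in $R^3$ so that the cross product from `LinearAlgebra` applies:

```{julia}
using LinearAlgebra
a = [1.0, 0.5, 0.0]   # hypothetical tangent vectors lying in the x-y plane
b = [0.25, 1.0, 0.0]
area = norm(a × b)    # area of the parallelogram spanned by a and b
θ = acos((a ⋅ b) / (norm(a) * norm(b)))
area ≈ norm(a) * norm(b) * sin(θ)   # the |a||b|sin(θ) characterization
```

The last line returning `true` reflects that the magnitude of the cross product encodes both the lengths of the vectors and the angle between them.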
## Parametrically defined surfaces
@@ -136,7 +150,7 @@ When a surface is described as a level curve, $f(x,y,z) = c$, then the gradient
When a surface is described parametrically, there is no "gradient." The *partial* derivatives are of interest, e.g., $\partial{F}/\partial{\theta}$ and $\partial{F}/\partial{\phi}$, vectors defined componentwise. These will lie in the tangent plane of the surface, as they can be viewed as tangent vectors for parametrically defined curves on the surface. Their cross product will be *normal* to the surface. The magnitude of the cross product, which reflects the angle between the two partial derivatives, will be informative as to the surface area.
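A minimal sketch of these computations, using a unit-sphere parameterization for illustration (the names and the point chosen here are arbitrary):

```{julia}
using ForwardDiff, LinearAlgebra
Φ(θ, ϕ) = [sin(ϕ)*cos(θ), sin(ϕ)*sin(θ), cos(ϕ)]   # unit sphere
θ₀, ϕ₀ = pi/4, pi/3
∂Φ_∂θ = ForwardDiff.derivative(θ -> Φ(θ, ϕ₀), θ₀)  # tangent vector
∂Φ_∂ϕ = ForwardDiff.derivative(ϕ -> Φ(θ₀, ϕ), ϕ₀)  # tangent vector
n = ∂Φ_∂θ × ∂Φ_∂ϕ   # normal to the surface at Φ(θ₀, ϕ₀)
n ⋅ Φ(θ₀, ϕ₀)       # ±norm(n): for a sphere the normal is radial
```

For the unit sphere the normal at a point is parallel to the point itself, which the last dot product reflects.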
### Plotting parametrized surfaces in `Julia`
### Plotting parameterized surfaces in `Julia`
Consider the parametrically described surface above. How would it be plotted? Using the `Plots` package, the process is quite similar to how a surface described by a function is plotted, but the $z$ values must be computed prior to plotting.
@@ -234,6 +248,217 @@ arrow!(Phi(pt...), out₁[:,1], linewidth=3)
arrow!(Phi(pt...), out₁[:,2], linewidth=3)
```
##### Example: A detour into plotting
The presentation of a 3D figure in a 2D format requires the use of linear perspective. The `Plots` package adds lighting effects to render a surface nicely, as seen.
In this example, we look at some of the mathematics behind drawing a surface more primitively, to showcase some facts about vectors. We follow a few techniques learned from @Angenent.
```{julia}
#| echo: false
gr()
nothing
```
For our purposes we wish to mathematically project a figure onto a 2D plane.
The plane here is described by a view point in 3D space, $\vec{v}$. Taking this as one vector in an orthogonal coordinate system, the other two can be easily produced, the first by switching two coordinates, as would be done in 2D; the second through the cross product:
```{julia}
function projection_plane(v)
vx, vy, vz = v
a = [-vy, vx, 0] # v ⋅ a = 0
b = v × a # so v ⋅ b = 0
return (a/norm(a), b/norm(b))
end
```
Using these two unit vectors to describe the plane, the projection of a point onto the plane is simply found by taking dot products:
```{julia}
function project(x, v)
â, b̂ = projection_plane(v)
(x ⋅ â, x ⋅ b̂) # (x ⋅ â) â + (x ⋅ b̂) b̂
end
```
Let's see this in action by plotting a surface of revolution given by
```{julia}
radius(t) = 1 / (1 + exp(t))
t₀, tₙ = 0, 3
surf(t, θ) = [t, radius(t)*cos(θ), radius(t)*sin(θ)]
```
We begin by fixing a view point and plotting the projected axes. We do the latter with a function for re-use.
```{julia}
v = [2, -2, 1]
function plot_axes()
empty_style = (xaxis = ([], false),
yaxis = ([], false),
legend=false)
plt = plot(; empty_style...)
axis_values = [[(0,0,0), (3.5,0,0)], # x axis
[(0,0,0), (0, 2.0 * radius(0), 0)], # yaxis
[(0,0,0), (0, 0, 1.5 * radius(0))]] # z axis
for (ps, ax) ∈ zip(axis_values, ("x", "y", "z"))
p0, p1 = ps
a, b = project(p0, v), project(p1, v)
annotate!([(b...,text(ax, :bottom))])
plot!([a, b]; arrow=true, head=:tip, line=(:gray, 1)) # gr() allows arrows
end
plt
end
plt = plot_axes()
```
We are using the vector of tuples interface (representing points) to specify the curve to draw.
Now we add on some curves for fixed $t$ and then fixed $\theta$ utilizing the fact that `project` returns a tuple of $x$---$y$ values to display.
```{julia}
for t in range(t₀, tₙ, 20)
curve = [project(surf(t, θ), v) for θ in range(0, 2pi, 100)]
plot!(curve; line=(:black, 1))
end
for θ in range(0, 2pi, 60)
curve = [project(surf(t, θ), v) for t in range(t₀, tₙ, 20)]
plot!(curve; line=(:black, 1))
end
plt
```
The graphic is a little busy!
Let's focus on the cells layering the surface. These have equal size in the $t \times \theta$ range, but unequal area on the screen. Were they parallelograms, the area could be found by taking the two-dimensional cross product of the two partial derivatives, resulting in a formula like: $a_x b_y - a_y b_x$.
When we discuss integrals related to such figures, this amount of area will be characterized by a computation involving the determinant of the upcoming Jacobian function.
We make a function that closes over the viewpoint vector and can be passed to `ForwardDiff`, as it returns a vector and not a tuple.
```{julia}
function psurf(v)
(t,θ) -> begin
v1, v2 = project(surf(t, θ), v)
[v1, v2] # or call collect to make a tuple into a vector
end
end
```
The function returned by `psurf` is from $R^2 \rightarrow R^2$. With such a function, the computation of this approximate area becomes:
```{julia}
function detJ(F, t, θ)
∂θ = ForwardDiff.derivative(θ -> F(t, θ), θ)
∂t = ForwardDiff.derivative(t -> F(t, θ), t)
(ax, ay), (bx, by) = ∂θ, ∂t
ax * by - ay * bx
end
```
For our purposes, we are interested in the sign of the returned value. Plotting, we can see that some "area" is positive, some "negative":
```{julia}
t = 1
G = psurf(v)
plot(θ -> detJ(G, t, θ), 0, 2pi)
```
With this parameterization and viewpoint, the positive area for the surface is when the normal vector points towards the viewing point. In the following, we only plot such values:
```{julia}
plt = plot_axes()
function I(F, t, θ)
x, y = F(t, θ)
detJ(F, t, θ) >= 0 ? (x, y) : (x, NaN) # use NaN for y value
end
for t in range(t₀, tₙ, 20)
curve = [I(G, t, θ) for θ in range(0, 2pi, 100)]
plot!(curve; line=(:gray, 1))
end
for θ in range(0, 2pi, 60)
curve = [I(G, t, θ) for t in range(t₀, tₙ, 20)]
plot!(curve; line=(:gray, 1))
end
plt
```
The values for which `detJ` is zero form the visible boundary of the object. We can plot just those to get an even less busy view. We identify them by finding the value of $\theta$ in $[0,\pi]$ and $[\pi,2\pi]$ that makes the `detJ` function zero:
```{julia}
fold(F, t, θmin, θmax) = find_zero(θ -> detJ(F, t, θ), (θmin, θmax))
ts = range(t₀, tₙ, 100)
back_edge = fold.(G, ts, 0, pi)
front_edge = fold.(G, ts, pi, 2pi)
plt = plot_axes()
plot!(project.(surf.(ts, back_edge), (v,)); line=(:black, 1))
plot!(project.(surf.(ts, front_edge), (v,)); line=(:black, 1))
```
Adding caps makes the graphic stand out. The caps are just discs (fixed values of $t$) which are filled in with gray using a transparency so that the axes aren't masked.
```{julia}
θs = range(0, 2pi, 100)
S = Shape(project.(surf.(t₀, θs), (v,)))
plot!(S; fill=(:gray, 0.33))
S = Shape(project.(surf.(tₙ, θs), (v,)))
plot!(S; fill=(:gray, 0.33))
```
Finally, we introduce some shading using the same technique but assuming the light comes from a different position.
```{julia}
lightpt = [2, -2, 5] # from further above
H = psurf(lightpt)
light_edge = fold.(H, ts, pi, 2pi);
```
Angles between the light edge and the front edge would be in shadow. We indicate this by drawing lines for fixed $t$ values. As denser lines indicate more shadow, we feather how these are drawn:
```{julia}
for (i, (t, top, bottom)) in enumerate(zip(ts, light_edge, front_edge))
λ = iseven(i) ? 1.0 : 0.8
top = bottom + λ*(top - bottom)
curve = [project(surf(t, θ), v) for θ in range(bottom, top, 20)]
plot!(curve, line=(:black, 1))
end
plt
```
We can compare to the graph produced by `surface` for the same function:
```{julia}
ts = range(t₀, tₙ, 50)
θs = range(0, 2pi, 100)
surface(unzip(surf.(ts, θs'))...; legend=false)
```
```{julia}
#| echo: false
plotly()
nothing
```
## The total derivative
@@ -839,8 +1064,337 @@ Taking $\partial/\partial{a_i}$ gives equations $2a_i\sigma_i^2 + \lambda = 0$,
For the special case of a common variance, $\sigma_i=\sigma$, the above simplifies to $a_i = 1/n$ and the estimator is $\sum X_i/n$, the familiar sample mean, $\bar{X}$.
##### Example: The mean value theorem
[Perturbing the Mean Value Theorem: Implicit Functions, the Morse Lemma, and Beyond](https://www.jstor.org/stable/48661587) by Lowry-Duda and Wheeler presents an interesting take on the mean-value theorem by asking if the endpoint $b$ moves continuously, does the value $c$ move continuously?
Fix the left-hand endpoint, $a_0$, and consider:
$$
F(b,c) = \frac{f(b) - f(a_0)}{b-a_0} - f'(c).
$$
Solutions to $F(b,c)=0$ satisfy the mean value theorem for $f$.
Suppose $(b_0,c_0)$ is one such solution.
By using the implicit function theorem, the question of finding a $C(b)$ such that $C$ is continuous near $b_0$ and satisfies $F(b, C(b)) = 0$ for $b$ near $b_0$ can be characterized.
To analyze this question, Lowry-Duda and Wheeler fix a set of points $a_0 = 0$, $b_0=3$ and consider functions $f$ with $f(a_0) = f(b_0) = 0$. Similar to how Rolle's theorem easily proves the mean value theorem, this choice imposes no loss of generality.
Suppose further that $c_0 = 1$, where $c_0$ solves the mean value theorem:
$$
f'(c_0) = \frac{f(b_0) - f(a_0)}{b_0 - a_0}.
$$
Again, this is no loss of generality. By construction $(b_0, c_0)$ is a zero of the just defined $F$.
We are interested in the shape of the level set $F(b,c) = 0$ which reveals other solutions $(b,c)$. For a given $f$, a contour plot, with $b>c$, can reveal this shape.
To find a source of examples for such functions, polynomials are considered, beginning with these constraints:
$$
f(a_0) = 0, f(b_0) = 0, f(c_0) = 1, f'(c_0) = 0
$$
With four conditions, we might guess a cubic polynomial with four unknown coefficients should fit. We use `SymPy` to identify the coefficients.
```{julia}
a₀, b₀, c₀ = 0, 3, 1
@syms x
@syms a[0:3]
p = sum(aᵢ*x^(i-1) for (i,aᵢ) ∈ enumerate(a))
dp = diff(p,x)
p, dp
```
The constraints are specified as follows; `solve` has no issue with this system of equations.
```{julia}
eqs = (p(x=>a₀) ~ 0,
p(x=>b₀) ~ 0,
p(x=>c₀) ~ 1,
dp(x=>c₀) ~ 0)
d = solve(eqs, a)
q = p(d...)
```
We can plot $q$ and emphasize the three points with:
```{julia}
xlims = (-0.5, 3.5)
plot(q; xlims, legend=false)
scatter!([a₀, b₀, c₀], [0,0,1]; marker=(5, 0.25))
```
We now make a plot of the level curve $F(x,y)=0$ using `contour` and the constraint that $b>c$ to graphically identify $C(b)$:
```{julia}
dq = diff(q, x)
λ(b,c) = b > c ? (q(b) - q(a₀)) / (b - a₀) - dq(c) : -Inf
bs = cs = range(0.5,3.5, 100)
plot(; legend=false)
contour!(bs, cs, λ; levels=[0])
plot!(identity; line=(1, 0.25))
scatter!([b₀], [c₀]; marker=(5, 0.25))
```
The curve that passes through the point $(3,1)$ is clearly continuous, and following it, we see continuous changes in $b$ result in continuous changes in $c$.
Following a behind-the-scenes blog post by [Lowry-Duda](https://davidlowryduda.com/choosing-functions-for-mvt-abscissa/) we wrap some of the above into a function to find a polynomial given a set of conditions on the values of the polynomial or its derivatives at specified points.
```{julia}
function _interpolate(conds; x=x)
np1 = length(conds)
n = np1 - 1
as = [Sym("a$i") for i in 0:n]
p = sum(as[i+1] * x^i for i in 0:n)
# set p⁽ᵏ⁾(xᵢ) = v
eqs = Tuple(diff(p, x, k)(x => xᵢ) ~ v for (xᵢ, k, v) ∈ conds)
soln = solve(eqs, as)
p(soln...)
end
# sets p⁽⁰⁾(a₀) = 0, p⁽⁰⁾(b₀) = 0, p⁽⁰⁾(c₀) = 1, p⁽¹⁾(c₀) = 0
basic_conditions = [(a₀,0,0), (b₀,0,0), (c₀,0,1), (c₀,1,0)]
_interpolate(basic_conditions; x)
```
Before moving on, polynomial interpolation can suffer from the Runge phenomenon, where there can be severe oscillations between the points. To tamp these down, an additional *control* point is added which is adjusted to minimize the size of the derivative through the value $\int \| f'(x) \|^2 dx$ (the squared $L_2$ norm of the derivative):
```{julia}
function interpolate(conds)
@syms x, D
# set f'(2) = D, then adjust D to minimize L₂ below
new_conds = vcat(conds, [(2, 1, D)])
p = _interpolate(new_conds; x)
# measure size of p with ∫₀⁴f'(x)^2 dx
dp = diff(p, x)
L₂ = integrate(dp^2, (x, 0, 4))
dL₂ = diff(L₂, D)
soln = first(solve(dL₂ ~ 0, D)) # critical point minimizing L₂
p(D => soln)
end
q = interpolate(basic_conditions)
```
We also make a plotting function to show both `q` and the level curve of `F`:
```{julia}
function plot_q_level_curve(q; title="", layout=[1;1])
x = only(free_symbols(q)) # fish out x
dq = diff(q, x)
xlims = ylims = (-0.5, 4.5)
p₁ = plot(; xlims, ylims, title,
legend=false, aspect_ratio=:equal)
plot!(p₁, q; xlims, ylims)
scatter!(p₁, [a₀, b₀, c₀], [0,0,1]; marker=(5, 0.25))
λ(b,c) = b > c ? (q(b) - q(a₀)) / (b - a₀) - dq(c) : -Inf
bs = cs = range(xlims..., 100)
p₂ = plot(; xlims, ylims, legend=false, aspect_ratio=:equal)
contour!(p₂, bs, cs, λ; levels=[0])
plot!(p₂, identity; line=(1, 0.25))
scatter!(p₂, [b₀], [c₀]; marker=(5, 0.25))
plot(p₁, p₂; layout)
end
```
```{julia}
plot_q_level_curve(q; layout=(1,2))
```
As before, this highlights the presence of a continuous function in $b$ yielding $c$.
This is not the only possibility. Another such example from their paper (Figure 3) looks like the following, where some additional constraints are added ($f''(c_0) = 0, f'''(c_0)=3, f'(b_0)=-3$):
```{julia}
new_conds = [(c₀, 2, 0), (c₀, 3, 3), (b₀, 1, -3)]
q = interpolate(vcat(basic_conditions, new_conds))
plot_q_level_curve(q;layout=(1,2))
```
For this shape, if $b$ increases away from $b_0$, the secant line connecting $(a_0,0)$ and $(b, f(b))$ will have a negative slope, but there are no points near $x=c_0$ where the tangent line has a negative slope, so the continuous function is only defined on the left side of $b_0$. Mathematically, as $f$ is increasing at $c_0$---as $f'''(c_0) = 3 > 0$---and $f$ is decreasing at $b_0$---as $f'(b_0) = -3 < 0$---the signs alone suggest the scenario. The contour plot reveals not one, but two, one-sided functions of $b$ giving $c$.
---
Now to characterize all possibilities.
Suppose $F(x,y)$ is differentiable. Then $F(x,y)$ has this approximation (where $F_x$ and $F_y$ are the partial derivatives):
$$
F(x,y) \approx F(x_0,y_0) + F_x(x_0,y_0) (x - x_0) + F_y(x_0,y_0) (y-y_0)
$$
If $(x_0,y_0)$ is a zero of $F$, then the above can be solved for $y$ assuming $F_y$ does not vanish:
$$
y \approx y_0 - \frac{F_x(x_0, y_0)}{F_y(x_0, y_0)} \cdot (x - x_0)
$$
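A quick numeric check of this approximation, using a hypothetical $G$ whose zero set is the unit circle:

```{julia}
import ForwardDiff
G(x, y) = x^2 + y^2 - 1     # hypothetical G; (x₀, y₀) = (0.6, 0.8) is a zero
x₀, y₀ = 0.6, 0.8
Gx = ForwardDiff.derivative(x -> G(x, y₀), x₀)
Gy = ForwardDiff.derivative(y -> G(x₀, y), y₀)
y_approx(x) = y₀ - Gx / Gy * (x - x₀)
y_exact(x) = sqrt(1 - x^2)  # the branch of the zero set through (x₀, y₀)
y_approx(0.61) - y_exact(0.61)  # small, as 0.61 is near x₀
```

The difference shrinks as $x$ approaches $x_0$, as expected from a tangent-line approximation.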
The main tool used in the authors' investigation is the implicit function theorem, which states there is some function continuously describing $y$---not just approximately---under the above assumption of $F_y$ not vanishing.
Again, with $F(b,c) = (f(b) - f(a_0)) / (b - a_0) - f'(c)$ and assuming $f$ has at least two continuous derivatives, we have:
$$
\begin{align*}
F(b_0,c_0) &= 0,\\
F_c(b_0, c_0) &= -f''(c_0).
\end{align*}
$$
Assuming $f''(c_0)$ is *non*-zero, this proves that if $b$ moves continuously, a corresponding solution to the mean value theorem will as well; that is, there is a continuous function $C(b)$ with $F(b,C(b)) = 0$.
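Numerically, such a $C(b)$ can be traced with a zero-finder. A sketch, using a hypothetical cubic with $f(0) = f(3) = 0$ and `find_zero` from `Roots`:

```{julia}
import ForwardDiff
using Roots
f(x) = x^3 - 4x^2 + 3x          # hypothetical: f(0) = f(3) = 0
f′(x) = ForwardDiff.derivative(f, x)
a₀ = 0
F(b, c) = (f(b) - f(a₀)) / (b - a₀) - f′(c)
C(b; cguess = 0.5) = find_zero(c -> F(b, c), cguess)
C(2.9), C(3.0), C(3.1)          # c varies continuously with b
```

For this $f$, $f''$ does not vanish at the traced solutions, so the implicit function theorem applies.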
Further, they establish if $f'(b_0) \neq f'(c_0)$ then there is a continuous $B(c)$ near $c_0$ such that $F(B(c),c) = 0$; and that there are no other nearby solutions to $F(b,c)=0$ near $(b_0, c_0)$.
This leaves for consideration the possibilities when $f''(c_0) = 0$ and $f'(b_0) = f'(c_0)$.
One such possibility looks like:
```{julia}
new_conds = [(c₀, 2, 0), (c₀, 3, 3), (b₀, 1, 0), (b₀, 2, 3)]
q = interpolate(vcat(basic_conditions, new_conds))
plot_q_level_curve(q;layout=(1,2))
```
This picture shows more than one possible choice for a continuous function, as the contour plot has a looping intersection point at $(b_0,c_0)$.
To characterize possible behaviors, the authors recall the [Morse lemma](https://en.wikipedia.org/wiki/Morse_theory) applied to functions $f:R^2 \rightarrow R$ with vanishing gradient, but non-vanishing Hessian. This states that after some continuous change of coordinates, $f$ looks like $\pm u^2 \pm v^2$. Only this one-dimensional Morse lemma (and a generalization) is required for this analysis:
> if $g(x)$ is three-times continuously differentiable with $g(x_0) = g'(x_0) = 0$ but $g''(x_0) \neq 0$, then *near* $x_0$, $g(x)$ can be transformed through a continuous change of coordinates to look like $\pm u^2$, where the sign is the sign of the second derivative of $g$.
That is, locally the function can be continuously transformed into a parabola opening up or down depending on the sign of the second derivative. Their proof starts with Taylor's remainder theorem to find a candidate for the change of coordinates and shows with the implicit function theorem this is a viable change.
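For a concrete instance of the lemma, take $g(x) = 1 - \cos(x)$ at $x_0 = 0$: here $g(0) = g'(0) = 0$ and $g''(0) = 1 \neq 0$, and the half-angle identity produces the change of coordinates explicitly:

$$
g(x) = 1 - \cos(x) = 2\sin^2(x/2) = u^2, \quad u = \sqrt{2}\,\sin(x/2),
$$

with the $+$ sign matching the sign of $g''(0)$; near $0$ the map $x \mapsto u$ is continuous and invertible.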
Setting:
$$
\begin{align*}
g_1(b) &= (f(b) - f(a_0))/(b - a_0) - f'(c_0)\\
g_2(c) &= f'(c) - f'(c_0).
\end{align*}
$$
Then $F(b, c) = g_1(b) - g_2(c)$.
By construction, $g_2(c_0) = 0$ and $g_2^{(k)}(c_0) = f^{(k+1)}(c_0)$.
Adjusting $f$ to have a vanishing second---but not third---derivative at $c_0$ means $g_2$ will satisfy the assumptions of the lemma assuming $f$ has at least four continuous derivatives (as all our example polynomials do).
As for $g_1$, we have by construction $g_1(b_0) = 0$. By differentiation we get a pattern, for some constants $c_j = (j+1)(j+2)\cdots k$ with $c_k = 1$:
$$
g^{(k)}(b) = k! \cdot \frac{f(a_0) - f(b)}{(a_0-b)^{k+1}} - \sum_{j=1}^k c_j \frac{f^{(j)}(b)}{(a_0 - b)^{k-j+1}}.
$$
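The pattern can be spot-checked symbolically, say for $k=2$ where $c_1 = 2$ and $c_2 = 1$ (a sketch; `SymPy` is loaded in these notes):

```{julia}
@syms x a0 u()
g₁ = (u(x) - u(a0)) / (x - a0)
lhs = diff(g₁, x, 2)
rhs = factorial(2) * (u(a0) - u(x)) / (a0 - x)^3 -
    (2 * diff(u(x), x) / (a0 - x)^2 + diff(u(x), x, 2) / (a0 - x))
simplify(lhs - rhs)   # 0, confirming the pattern for k = 2
```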
Of note, when $f(a_0) = f(b_0) = 0$, if $f^{(k)}(b_0)$ is the first non-vanishing derivative of $f$ at $b_0$, then $g^{(k)}(b_0) = f^{(k)}(b_0)/(b_0 - a_0)$ (the two have the same sign).
In particular, if $f(a_0) = f(b_0) = 0$ and $f'(b_0)=0$ and $f''(b_0)$ is non-zero, the lemma applies to $g_1$, again assuming $f$ has at least four continuous derivatives.
Let $\sigma_1 = \text{sign}(f''(b_0))$ and $\sigma_2 = \text{sign}(f'''(c_0))$; then we have $F(b,c) = \sigma_1 u^2 - \sigma_2 v^2$ after some change of variables. The authors conclude:
* If $\sigma_1$ and $\sigma_2$ have different signs, then $F(b,c) = 0$ is like $u^2 = -v^2$, which has only one isolated solution, as the left-hand side and right-hand side will have different signs except when both are $0$.
* If $\sigma_1$ and $\sigma_2$ have the same sign, then $F(b,c) = 0$ is like $u^2 = v^2$ which has two solutions $u = \pm v$.
Applied to the problem at hand:
* if $f''(b_0)$ and $f'''(c_0)$ have different signs, then $c_0$ cannot be extended to a continuous function near $b_0$.
* if the two have the same sign, then there are two such functions possible.
```{julia}
conds₁ = [(b₀,1,0), (b₀,2,3), (c₀,2,0), (c₀,3,-3)]
conds₂ = [(b₀,1,0), (b₀,2,3), (c₀,2,0), (c₀,3, 3)]
q₁ = interpolate(vcat(basic_conditions, conds₁))
q₂ = interpolate(vcat(basic_conditions, conds₂))
p₁ = plot_q_level_curve(q₁)
p₂ = plot_q_level_curve(q₂)
plot(p₁, p₂; layout=(1,2))
```
There are more possibilities, as pointed out in the article.
Say a function, $h$, has *a zero of order $k$ at $x_0$* if the first $k-1$ derivatives of $h$ are zero at $x_0$, but that $h^{(k)}(x_0) \neq 0$. Now suppose $f$ has order $k$ at $b_0$ and order $l$ at $c_0$. Then $g_1$ will be order $k$ at $b_0$ and $g_2$ will have order $l-1$ at $c_0$. In the above, we had orders $2$ and $3$ respectively.
A generalization of the Morse lemma to a function, $h$, having a zero of order $k$ at $x_0$ is that $h(x) = \pm u^k$, where if $k$ is odd either sign is possible, and if $k$ is even the sign is that of $h^{(k)}(x_0)$.
With this, we get the following possibilities for $f$ with a zero of order $k$ at $b_0$ and $l$ at $c_0$:
* If $l$ is even, then there is one continuous solution near $(b_0,c_0)$
* If $l$ is odd and $k$ is even and $f^{(k)}(b_0)$ and $f^{(l)}(c_0)$ have the *same* sign, then there are two continuous solutions
* If $l$ is odd and $k$ is even and $f^{(k)}(b_0)$ and $f^{(l)}(c_0)$ have *opposite* signs, the $(b_0, c_0)$ is an isolated solution.
* If $l$ is odd and $k$ is odd, then there are two continuous solutions, but only defined in a one-sided neighborhood of $b_0$ where $f^{(k)}(b_0) f^{(l)}(c_0) (b - b_0) > 0$.
To visualize these four cases, we take $(l=2,k=1)$, $(l=3, k=2)$ (twice) and $(l=3, k=3)$.
```{julia}
condsₑ = [(c₀,2,3), (b₀,1,-3)]
condsₒₑ₊₊ = [(c₀,2,0), (c₀,3, 10), (b₀,1,0), (b₀,2,10)]
condsₒₑ₊₋ = [(c₀,2,0), (c₀,3,-20), (b₀,1,0), (b₀,2,20)]
condsₒₒ = [(c₀,2,0), (c₀,3,-20), (b₀,1,0), (b₀,2, 0), (b₀,3, 20)]
qₑ = interpolate(vcat(basic_conditions, condsₑ))
qₒₑ₊₊ = interpolate(vcat(basic_conditions, condsₒₑ₊₊))
qₒₑ₊₋ = interpolate(vcat(basic_conditions, condsₒₑ₊₋))
qₒₒ = interpolate(vcat(basic_conditions, condsₒₒ))
p₁ = plot_q_level_curve(qₑ; title = "(e,.)")
p₂ = plot_q_level_curve(qₒₑ₊₊; title = "(o,e,same)")
p₃ = plot_q_level_curve(qₒₑ₊₋; title = "(o,e,different)")
p₄ = plot_q_level_curve(qₒₒ; title = "(o,o)")
plot(p₁, p₂, p₃, p₄; layout=(1,4))
```
This handles most cases, but leaves to consider the possibility of a function with infinitely many vanishing derivatives. We steer the interested reader to the article for thoughts on that.
## Questions
##### Question
```{julia}
#| echo: false
gr()
p1 = vectorfieldplot((x,y) -> [x,y], xlim=(-4,4), ylim=(-4,4), nx=9, ny=9, title="A");
p2 = vectorfieldplot((x,y) -> [x-y,x], xlim=(-4,4), ylim=(-4,4), nx=9, ny=9,title="B");
p3 = vectorfieldplot((x,y) -> [y,0], xlim=(-4,4), ylim=(-4,4), nx=9, ny=9, title="C");
p4 = vectorfieldplot((x,y) -> [-y,x], xlim=(-4,4), ylim=(-4,4), nx=9, ny=9, title="D");
plot(p1, p2, p3, p4; layout=[2,2])
```
In the above figure, match the function with the vector field plot.
```{julia}
#| echo: false
plotly()
matchq(("`F(x,y)=[-y ,x]`", "`F(x,y)=[y,0]`",
"`F(x,y)=[x-y,x]`", "`F(x,y)=[x,y]`"),
("A", "B", "C", "D"),
(4,3,2,1);
label="For each function mark the correct vector field plot"
)
```
###### Question
@@ -981,7 +981,7 @@ $$
\vec{v} \times \vec{c} = GM \hat{x} + \vec{d}.
$$
As $\vec{x}$ and $\vec{v}\times\vec{c}$ lie in the same plane - orthogonal to $\vec{c}$ - so does $\vec{d}$. With a suitable re-orientation, so that $\vec{d}$ is along the $x$ axis, $\vec{c}$ is along the $z$-axis, then we have $\vec{c} = \langle 0,0,c\rangle$ and $\vec{d} = \langle d ,0,0 \rangle$, and $\vec{x} = \langle x, y, 0 \rangle$. Set $\theta$ to be the angle, then $\hat{x} = \langle \cos(\theta), \sin(\theta), 0\rangle$.
As $\vec{x}$ and $\vec{v}\times\vec{c}$ lie in the same plane---orthogonal to $\vec{c}$---so does $\vec{d}$. With a suitable re-orientation, so that $\vec{d}$ is along the $x$ axis, $\vec{c}$ is along the $z$-axis, then we have $\vec{c} = \langle 0,0,c\rangle$ and $\vec{d} = \langle d ,0,0 \rangle$, and $\vec{x} = \langle x, y, 0 \rangle$. Set $\theta$ to be the angle, then $\hat{x} = \langle \cos(\theta), \sin(\theta), 0\rangle$.
Now
@@ -1662,7 +1662,7 @@ $$
The first equation relates the steering angle with the curvature. If the steering angle is not changed ($d\alpha/du=0$) then the curvature is constant and the motion is circular. It will be greater for larger angles (up to $\pi/2$). As the curvature is the reciprocal of the radius, this means the radius of the circular trajectory will be smaller. For the same constant steering angle, the curvature will be smaller for longer wheelbases, meaning the circular trajectory will have a larger radius. For cars, which have similar dynamics, this means longer wheelbase cars will take more room to make a U-turn.
The second equation may be interpreted in ratio of arc lengths. The infinitesimal arc length of the rear wheel is proportional to that of the front wheel only scaled down by $\cos(\alpha)$. When $\alpha=0$ - the bike is moving in a straight line - and the two are the same. At the other extreme - when $\alpha=\pi/2$ - the bike must be pivoting on its rear wheel and the rear wheel has no arc length. This cosine, is related to the speed of the back wheel relative to the speed of the front wheel, which was used in the initial differential equation.
The second equation may be interpreted as a ratio of arc lengths. The infinitesimal arc length of the rear wheel is proportional to that of the front wheel, only scaled down by $\cos(\alpha)$. When $\alpha=0$---the bike is moving in a straight line---the two are the same. At the other extreme---when $\alpha=\pi/2$---the bike must be pivoting on its rear wheel and the rear wheel has no arc length. This cosine is related to the speed of the back wheel relative to the speed of the front wheel, which was used in the initial differential equation.
The last equation, relates the curvature of the back wheel track to the steering angle of the front wheel. When $\alpha=\pm\pi/2$, the rear-wheel curvature, $k$, is infinite, resulting in a cusp (no circle with non-zero radius will approximate the trajectory). This occurs when the front wheel is steered orthogonal to the direction of motion. As was seen in previous graphs of the trajectories, a cusp can happen for quite regular front wheel trajectories.
@@ -1875,7 +1875,7 @@ $$
$$
We see $\vec\beta'$ is zero (the curve is non-regular) when $\kappa'(s) = 0$. The curvature changes from increasing to decreasing, or vice versa at each of the $4$ crossings of the major and minor axes - there are $4$ non-regular points, and we see $4$ cusps in the evolute.
We see $\vec\beta'$ is zero (the curve is non-regular) when $\kappa'(s) = 0$. The curvature changes from increasing to decreasing, or vice versa at each of the $4$ crossings of the major and minor axes---there are $4$ non-regular points, and we see $4$ cusps in the evolute.
The curve parameterized by $\vec{r}(t) = 2(1 - \cos(t)) \langle \cos(t), \sin(t)\rangle$ over $[0,2\pi]$ is a cardioid. It is formed by rolling a circle of radius $r$ around another similarly sized circle. The following graphically shows the evolute is a smaller cardioid (one-third the size). For fun, the evolute of the evolute is drawn:
@@ -59,7 +59,7 @@ in a spirit similar to a section of a book. Just like a book, there
are try-it-yourself questions at the end of each page. All have a
limited number of self-graded answers. These notes borrow ideas from
many sources, for example @Strang, @Knill, @Schey, @Thomas,
@RogawskiAdams, several Wikipedia pages, and other sources.
@RogawskiAdams, @Angenent, several Wikipedia pages, and other sources.
These notes are accompanied by a `Julia` package `CalculusWithJulia`
that provides some simple functions to streamline some common tasks
@@ -77,7 +77,9 @@ These notes may be compiled into a `pdf` file through Quarto. As the result is r
-->
To *contribute* -- say by suggesting additional topics, correcting a
mistake, or fixing a typo -- click the "Edit this page" link and join the list of [contributors](https://github.com/jverzani/CalculusWithJuliaNotes.jl/graphs/contributors). Thanks to all contributors and a *very* special thanks to `@fangliu-tju` for their careful and most-appreciated proofreading.
mistake, or fixing a typo -- click the "Edit this page" link and join the list of [contributors](https://github.com/jverzani/CalculusWithJuliaNotes.jl/graphs/contributors). Thanks to all contributors.
A *very* special thanks goes out to `@fangliu-tju` for their careful and most-appreciated proofreading and error spotting spread over a series of PRs.
## Running Julia
@@ -4,9 +4,14 @@ ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
HCubature = "19dc6840-f33b-545b-b366-655c7e3ffd49"
IJulia = "7073ff75-c697-5162-941a-fcdaad2a7d2a"
ImplicitIntegration = "bc256489-3a69-4a66-afc4-127cc87e6182"
LaTeXStrings = "b964fa9f-0449-5b57-a5c2-d3ea65f4040f"
Mustache = "ffc61752-8dc7-55ee-8c37-f3e9cdd09e70"
PlotlyBase = "a03496cd-edff-5a9b-9e67-9cda94a718b5"
PlotlyKaleido = "f2990250-8cf9-495f-b13a-cce12b45703c"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
QuadGK = "1fd47b50-473d-5c70-9696-f719f8f3bcdc"
QuizQuestions = "612c44de-1021-4a21-84fb-7261cf5eb2d4"
Roots = "f2b01f46-fcfa-551c-844a-d8ac1e96c665"
SymPy = "24249f21-da20-56a4-8eb1-6a02cf4ae2e6"
Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
TextWrap = "b718987f-49a8-5099-9789-dcd902bef87d"
@@ -1,4 +1,4 @@
# The Gradient, Divergence, and Curl
# The gradient, divergence, and curl
{{< include ../_common_code.qmd >}}
@@ -1,4 +1,4 @@
# Line and Surface Integrals
# Line and surface integrals
{{< include ../_common_code.qmd >}}
@@ -340,7 +340,7 @@ W = integrate(F(r(t)) ⋅ T(r(t)), (t, 0, 2PI))
There are technical assumptions about curves and regions that are necessary for some statements to be made:
* Let $C$ be a [Jordan](https://en.wikipedia.org/wiki/Jordan_curve_theorem) curve - a non-self-intersecting continuous loop in the plane. Such a curve divides the plane into two regions, one bounded and one unbounded. The normal to a Jordan curve is assumed to be in the direction of the unbounded part.
* Let $C$ be a [Jordan](https://en.wikipedia.org/wiki/Jordan_curve_theorem) curve---a non-self-intersecting continuous loop in the plane. Such a curve divides the plane into two regions, one bounded and one unbounded. The normal to a Jordan curve is assumed to be in the direction of the unbounded part.
* Further, we will assume that our curves are *piecewise smooth*. That is comprised of finitely many smooth pieces, continuously connected.
* The region enclosed by a closed curve has an *interior*, $D$, which we assume is an *open* set (one for which every point in $D$ has some "ball" about it entirely within $D$ as well.)
* The region $D$ is *connected* meaning between any two points there is a continuous path in $D$ between the two points.
@@ -471,7 +471,7 @@ The flow integral is typically computed for a closed (Jordan) curve, measuring t
:::{.callout-note}
## Note
For a Jordan curve, the positive orientation of the curve is such that the normal direction (proportional to $\hat{T}'$) points away from the bounded interior. For a non-closed path, the choice of parameterization will determine the normal and the integral for flow across a curve is dependent - up to its sign - on this choice.
For a Jordan curve, the positive orientation of the curve is such that the normal direction (proportional to $\hat{T}'$) points away from the bounded interior. For a non-closed path, the choice of parameterization will determine the normal and the integral for flow across a curve is dependent---up to its sign---on this choice.
:::
@@ -1,4 +1,4 @@
# Quick Review of Vector Calculus
# Quick review of vector calculus
{{< include ../_common_code.qmd >}}
@@ -133,7 +133,7 @@ $$
$$
The generalization to $n>2$ is clear - the partial derivative in $x_i$ is the derivative of $f$ when the *other* $x_j$ are held constant.
The generalization to $n>2$ is clear---the partial derivative in $x_i$ is the derivative of $f$ when the *other* $x_j$ are held constant.
This may be viewed as the derivative of the univariate function $(f\circ\vec{r})(t)$ where $\vec{r}(t) = p + t \hat{e}_i$, $\hat{e}_i$ being the unit vector of all $0$s except a $1$ in the $i$th component.
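This view can be sketched numerically; the function and point here are for illustration only:

```{julia}
import ForwardDiff
f(x, y) = x^2 * sin(y)         # an example scalar function
p = [1.0, pi/6]
ê₁ = [1.0, 0.0]                # unit vector in the x direction
r(t) = p + t * ê₁              # line through p in the direction ê₁
∂f_∂x = ForwardDiff.derivative(t -> f(r(t)...), 0)
∂f_∂x ≈ 2 * p[1] * sin(p[2])   # matches ∂/∂x of x²sin(y) = 2x·sin(y)
```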
@@ -1,4 +1,4 @@
# Green's Theorem, Stokes' Theorem, and the Divergence Theorem
# Green's theorem, Stokes' theorem, and the divergence theorem
{{< include ../_common_code.qmd >}}
@@ -721,13 +721,13 @@ The fluid would flow along the blue (stream) lines. The red lines have equal pot
# https://en.wikipedia.org/wiki/Jiffy_Pop#/media/File:JiffyPop.jpg
imgfile ="figures/jiffy-pop.png"
caption ="""
The Jiffy Pop popcorn design has a top surface that is designed to expand to accommodate the popped popcorn. Viewed as a surface, the surface area grows, but the boundary - where the surface meets the pan - stays the same. This is an example that many different surfaces can have the same bounding curve. Stokes' theorem will relate a surface integral over the surface to a line integral about the bounding curve.
The Jiffy Pop popcorn design has a top surface that is designed to expand to accommodate the popped popcorn. Viewed as a surface, the surface area grows, but the boundary---where the surface meets the pan---stays the same. This is an example that many different surfaces can have the same bounding curve. Stokes' theorem will relate a surface integral over the surface to a line integral about the bounding curve.
"""
# ImageFile(:integral_vector_calculus, imgfile, caption)
nothing
```
![The Jiffy Pop popcorn design has a top surface that is designed to expand to accommodate the popped popcorn. Viewed as a surface, the surface area grows, but the boundary - where the surface meets the pan - stays the same. This is an example that many different surfaces can have the same bounding curve. Stokes' theorem will relate a surface integral over the surface to a line integral about the bounding curve.
![The Jiffy Pop popcorn design has a top surface that is designed to expand to accommodate the popped popcorn. Viewed as a surface, the surface area grows, but the boundary---where the surface meets the pan---stays the same. This is an example that many different surfaces can have the same bounding curve. Stokes' theorem will relate a surface integral over the surface to a line integral about the bounding curve.
](./figures/jiffy-pop.png)
Were the figure of Jiffy Pop popcorn animated, the surface of foil would slowly expand due to the pressure of the popping popcorn until the popcorn was ready. However, the boundary would remain the same. Many different surfaces can have the same boundary. Take for instance the upper half unit sphere in $R^3$, which has the curve $x^2 + y^2 = 1$ as its boundary. This is the same boundary curve as that of the part of the cone $z = 1 - (x^2 + y^2)$ that lies above the $x$-$y$ plane. It would also be the boundary curve of the surface formed by a Mickey Mouse glove if the collar were scaled and positioned onto the unit circle.
@@ -761,7 +761,7 @@ $$
$$
In terms of our expanding popcorn, the boundary integral - after accounting for cancellations, as in Green's theorem - can be seen as a microscopic sum of boundary integrals each of which is approximated by a term $\nabla\times{F}\cdot\hat{N} \Delta{S}$ which is viewed as a Riemann sum approximation for the the integral of the curl over the surface. The cancellation depends on a proper choice of orientation, but with that we have:
In terms of our expanding popcorn, the boundary integral---after accounting for cancellations, as in Green's theorem---can be seen as a microscopic sum of boundary integrals, each of which is approximated by a term $\nabla\times{F}\cdot\hat{N} \Delta{S}$ which is viewed as a Riemann sum approximation for the integral of the curl over the surface. The cancellation depends on a proper choice of orientation, but with that we have:
::: {.callout-note icon=false}
## Stokes' theorem

View File

@@ -1,3 +1,7 @@
---
engine: julia
---
# Integrals
Identifying the area under a curve between two values is an age-old problem. In this chapter we see that for many cases the Fundamental Theorem of Calculus can be used to identify the area. When it is not applicable, we will see how such areas may be accurately estimated.

View File

@@ -0,0 +1,41 @@
# Appendix
```{julia}
#| hold: true
#| echo: false
gr()
## For **some reason** having this in the natural place messes up the plots.
## {{{approximate_surface_area}}}
xs,ys = range(-1, stop=1, length=50), range(-1, stop=1, length=50)
f(x,y)= 2 - (x^2 + y^2)
dr = [1/2, 3/4]
df = [f(dr[1],0), f(dr[2],0)]
function sa_approx_graph(i)
p = plot(xs, ys, f, st=[:surface], legend=false)
for theta in range(0, stop=i/10*2pi, length=10*i )
path3d!(p,sin(theta)*dr, cos(theta)*dr, df)
end
p
end
n = 10
anim = @animate for i=1:n
sa_approx_graph(i)
end
imgfile = tempname() * ".gif"
gif(anim, imgfile, fps = 1)
caption = L"""
Surface of revolution of $f(x) = 2 - x^2$ about the $y$ axis. The lines segments are the images of rotating the secant line connecting $(1/2, f(1/2))$ and $(3/4, f(3/4))$. These trace out the frustum of a cone which approximates the corresponding surface area of the surface of revolution. In the limit, this approximation becomes exact and a formula for the surface area of surfaces of revolution can be used to compute the value.
"""
plotly()
ImageFile(imgfile, caption)
```

View File

@@ -76,21 +76,34 @@ To see why, any partition of the interval $[a,b]$ by $a = t_0 < t_1 < \cdots < t
## {{{arclength_graph}}}
gr()
function make_arclength_graph(n)
x(t) = cos(t)/t
y(t) = sin(t)/t
a, b = 1, 4pi
ns = [10,15,20, 30, 50]
empty_style = (xaxis=([], false),
yaxis=([], false),
framestyle=:origin,
legend=false)
ns = [10,15,20, 30, 50]
plot(; empty_style..., aspect_ratio=:equal, size=fig_size)
title!("Approximate arc length with $(ns[n]) points")
g(t) = cos(t)/t
f(t) = sin(t)/t
ts = range(a, b, 250)
plot!(x.(ts), y.(ts); line=(:black,2))
pttn = range(a, b, ns[n])
plot!(x.(pttn), y.(pttn); line=(:red, 2))
ts = range(1, stop=4pi, length=200)
tis = range(1, stop=4pi, length=ns[n])
ts = range(0, 2pi, 100)
p = plot(g, f, 1, 4pi, legend=false, size=fig_size,
title="Approximate arc length with $(ns[n]) points")
plot!(p, map(g, tis), map(f, tis), color=:orange)
λ = 0.01
C = Plots.scale(Shape(:circle), λ)
p
for (u,v) ∈ zip(x.(pttn), y.(pttn))
S = Plots.translate(C, u,v)
plot!(S; fill=(:white,), line=(:black,2))
end
current()
end
n = 5
@@ -542,7 +555,9 @@ plot(t -> g(𝒔(t)), t -> f(𝒔(t)), 0, sinv(2*pi))
Following (faithfully) [Kantorwitz and Neumann](https://www.researchgate.net/publication/341676916_The_English_Galileo_and_His_Vision_of_Projectile_Motion_under_Air_Resistance), we consider a function $f(x)$ with the property that **both** $f$ and $f'$ are strictly concave down on $[a,b]$ and suppose $f(a) = f(b)$. Further, assume $f'$ is continuous. We will see this implies facts about arc-length and other integrals related to $f$.
The following figure is clearly of a concave down function. The asymmetry about the critical point will be seen to be a result of the derivative also being concave down. This asymmetry will be characterized in several different ways in the following including showing that the arc length from $(a,0)$ to $(c,f(c))$ is longer than from $(c,f(c))$ to $(b,0)$.
@fig-kantorwitz-neumann is clearly of a concave down function. The asymmetry about the critical point will be seen to be a result of the derivative also being concave down. This asymmetry will be characterized in several different ways in the following, including showing that the arc length from $(a,0)$ to $(c,f(c))$ is longer than from $(c,f(c))$ to $(b,0)$.
::: {#fig-kantorwitz-neumann}
```{julia}
@@ -577,7 +592,12 @@ plot!(zero)
annotate!([(0, 𝒚, "a"), (152, 𝒚, "b"), (u, 𝒚, "u"), (v, 𝒚, "v"), (c, 𝒚, "c")])
```
Take $a < u < c < v < b$ with $f(u) = f(v)$ and $c$ a critical point, as in the picture. There must be a critical point by Rolle's theorem, and it must be unique, as the derivative, which exists by the assumptions, must be strictly decreasing due to concavity of $f$ and hence there can be at most $1$ critical point.
Graph of function $f(x)$ with both $f$ and $f'$ strictly concave down.
:::
By Rolle's theorem there exists a critical point $c$ in $(a,b)$, as in the picture. It must be unique, as the derivative, which exists by the assumptions, must be strictly decreasing due to the concavity of $f$, and hence there can be at most $1$ critical point.
Take $a < u < c < v < b$ with $f(u) = f(v)$.
Some facts about this picture can be proven from the definition of concavity:
@@ -640,7 +660,7 @@ By the fundamental theorem of calculus:
$$
(f_1^{-1}(y) + f_2^{-1}(y))\big|_\alpha^\beta > 0
(f_1^{-1}(y) + f_2^{-1}(y))\Big|_\alpha^\beta > 0
$$
On rearranging:

View File

@@ -16,6 +16,8 @@ using Roots
---
![A jigsaw puzzle needs a certain amount of area to complete. For a traditional rectangular puzzle, this area is comprised of the sum of the areas for each piece. Decomposing a total area into the sum of smaller, known, ones---even if only approximate---is the basis of definite integration.](figures/jigsaw.png)
The question of area has long fascinated human culture. As children, we learn early on the formulas for the areas of some geometric figures: a square is $b^2$, a rectangle $b\cdot h$, a triangle $1/2 \cdot b \cdot h$ and for a circle, $\pi r^2$. The area of a rectangle is often the intuitive basis for illustrating multiplication. The area of a triangle has been known for ages. Even complicated expressions, such as [Heron's](http://tinyurl.com/mqm9z) formula, which relates the area of a triangle to measurements from its perimeter, have been around for 2000 years. The formula for the area of a circle is also quite old. Wikipedia dates it as far back as the [Rhind](http://en.wikipedia.org/wiki/Rhind_Mathematical_Papyrus) papyrus for 1700 BC, with the approximation of $256/81$ for $\pi$.
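Heron's formula translates directly into a small `Julia` function; the name `heron` below is just a convenience for this sketch:

```julia
# Heron's formula: area of a triangle from its three side lengths a, b, c
function heron(a, b, c)
    s = (a + b + c) / 2                  # semi-perimeter
    sqrt(s * (s - a) * (s - b) * (s - c))
end
heron(3, 4, 5)    # the 3-4-5 right triangle has area (1/2)⋅3⋅4 = 6
```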
@@ -81,39 +83,46 @@ gr()
f(x) = x^2
colors = [:black, :blue, :orange, :red, :green, :orange, :purple]
## Area of parabola
## Area of parabola
function make_triangle_graph(n)
title = "Area of parabolic cup ..."
n==1 && (title = "Area = 1/2")
n==2 && (title = "Area = previous + 1/8")
n==3 && (title = "Area = previous + 2*(1/8)^2")
n==4 && (title = "Area = previous + 4*(1/8)^3")
n==5 && (title = "Area = previous + 8*(1/8)^4")
n==6 && (title = "Area = previous + 16*(1/8)^5")
n==7 && (title = "Area = previous + 32*(1/8)^6")
n==1 && (title = L"Area $= 1/2$")
n==2 && (title = L"Area $=$ previous $+\; \frac{1}{8}$")
n==3 && (title = L"Area $=$ previous $+\; 2\cdot\frac{1}{8^2}$")
n==4 && (title = L"Area $=$ previous $+\; 4\cdot\frac{1}{8^3}$")
n==5 && (title = L"Area $=$ previous $+\; 8\cdot\frac{1}{8^4}$")
n==6 && (title = L"Area $=$ previous $+\; 16\cdot\frac{1}{8^5}$")
n==7 && (title = L"Area $=$ previous $+\; 32\cdot\frac{1}{8^6}$")
plt = plot(f, 0, 1, legend=false, size = fig_size, linewidth=2)
annotate!(plt, [(0.05, 0.9, text(title,:left))]) # if in title, it grows funny with gr
n >= 1 && plot!(plt, [1,0,0,1, 0], [1,1,0,1,1], color=colors[1], linetype=:polygon, fill=colors[1], alpha=.2)
n == 1 && plot!(plt, [1,0,0,1, 0], [1,1,0,1,1], color=colors[1], linewidth=2)
plt = plot(f, 0, 1;
legend=false,
size = fig_size,
linewidth=2)
annotate!(plt, [
(0.05, 0.9, text(title,:left))
]) # if in title, it grows funny with gr
n >= 1 && plot!(plt, [1,0,0,1, 0], [1,1,0,1,1];
color=colors[1], linetype=:polygon,
fill=colors[1], alpha=.2)
n == 1 && plot!(plt, [1,0,0,1, 0], [1,1,0,1,1];
color=colors[1], linewidth=2)
for k in 2:n
xs = range(0, stop=1, length=1+2^(k-1))
ys = map(f, xs)
k < n && plot!(plt, xs, ys, linetype=:polygon, fill=:black, alpha=.2)
ys = f.(xs)
k < n && plot!(plt, xs, ys;
linetype=:polygon, fill=:black, alpha=.2)
if k == n
plot!(plt, xs, ys, color=colors[k], linetype=:polygon, fill=:black, alpha=.2)
plot!(plt, xs, ys, color=:black, linewidth=2)
plot!(plt, xs, ys;
color=colors[k], linetype=:polygon, fill=:black, alpha=.2)
plot!(plt, xs, ys;
color=:black, linewidth=2)
end
end
plt
end
n = 7
anim = @animate for i=1:n
make_triangle_graph(i)
@@ -183,13 +192,47 @@ $$
S_n = f(c_1) \cdot (x_1 - x_0) + f(c_2) \cdot (x_2 - x_1) + \cdots + f(c_n) \cdot (x_n - x_{n-1}).
$$
Clearly for a given partition and choice of $c_i$, the above can be computed. Each term $f(c_i)\cdot(x_i-x_{i-1})$ can be visualized as the area of a rectangle with base spanning from $x_{i-1}$ to $x_i$ and height given by the function value at $c_i$. The following visualizes left Riemann sums for different values of $n$ in a way that makes Beekman's intuition plausible that as the number of rectangles gets larger, the approximate sum will get closer to the actual area.
Clearly for a given partition and choice of $c_i$, the above can be computed. Each term $f(c_i)\cdot(x_i-x_{i-1}) = f(c_i)\Delta_i$ can be visualized as the area of a rectangle with base spanning from $x_{i-1}$ to $x_i$ and height given by the function value at $c_i$. The following visualizes left Riemann sums for different values of $n$ in a way that makes Beekman's intuition plausible that as the number of rectangles gets larger, the approximate sum will get closer to the actual area.
```{julia}
#| hold: true
#| echo: false
gr()
function left_riemann(n)
empty_style = (xaxis=([], false),
yaxis=([], false),
framestyle=:origin,
legend=false)
axis_style = (arrow=true, side=:head, line=(:gray, 1))
rectangle = (x, y, w, h) -> Shape(x .+ [0,w,w,0], y .+ [0,0,h,h])
f = x -> -(x+1/2)*(x-1)*(x-3) + 1
a, b= 1, 3
plot(; empty_style...)
plot!(f, a, b; line=(:black, 3))
plot!([a-.25, b+.25], [0,0]; axis_style...)
plot!([a-.1, a-.1], [-.25, .5 + f(a/2 +b/2)]; axis_style...)
Δ = (b-a)/n
for i ∈ 0:n-1
xᵢ = a + i*Δ
plot!(rectangle(xᵢ, 0, Δ, f(xᵢ)), opacity=0.5, color=:red)
end
area = round(sum(f(a + i*Δ)*Δ for i ∈ 0:n-1), digits=3)
annotate!([
(a, 0, text(L"a", :top)),
(b, 0, text(L"b", :top)),
(a, f(a/2+b/2), text("\$L_{$n} = $area\$", :left))
])
current()
end
#=
rectangle(x, y, w, h) = Shape(x .+ [0,w,w,0], y .+ [0,0,h,h])
function ₙ(j)
a = ("₋","","","₀","₁","₂","₃","₄","₅","₆","₇","₈","₉")
@@ -210,6 +253,7 @@ function left_riemann(n)
title!("L$(ₙ(n)) = $a")
p
end
=#
anim = @animate for i ∈ (2,4,8,16,32,64)
left_riemann(i)
@@ -319,7 +363,7 @@ When the integral exists, it is written $V = \int_a^b f(x) dx$.
:::{.callout-note}
## History note
The expression $V = \int_a^b f(x) dx$ is known as the *definite integral* of $f$ over $[a,b]$. Much earlier than Riemann, Cauchy had defined the definite integral in terms of a sum of rectangular products beginning with $S=(x_1 - x_0) f(x_0) + (x_2 - x_1) f(x_1) + \cdots + (x_n - x_{n-1}) f(x_{n-1})$ (the left Riemann sum). He showed the limit was well defined for any continuous function. Riemann's formulation relaxes the choice of partition and the choice of the $c_i$ so that integrability can be better understood.
The expression $V = \int_a^b f(x) dx$ is known as the *definite integral* of $f$ over $[a,b]$. Much earlier than Riemann, Cauchy had defined the definite integral in terms of a sum of rectangular products beginning with $S=f(x_0) \cdot (x_1 - x_0) + f(x_1) \cdot (x_2 - x_1) + \cdots + f(x_{n-1}) \cdot (x_n - x_{n-1}) $ (the left Riemann sum). He showed the limit was well defined for any continuous function. Riemann's formulation relaxes the choice of partition and the choice of the $c_i$ so that integrability can be better understood.
:::
@@ -329,18 +373,6 @@ The expression $V = \int_a^b f(x) dx$ is known as the *definite integral* of $f$
The following formulas are consequences when $f(x)$ is integrable. These mostly follow through a judicious rearranging of the approximating sums.
The area is $0$ when there is no width to the interval to integrate over:
> $$
> \int_a^a f(x) dx = 0.
> $$
Even our definition of a partition doesn't really apply, as we assume $a < b$, but clearly if $a=x_0=x_n=b$ then our only "approximating" sum could be $f(a)(b-a) = 0$.
The area under a constant function is found from the area of rectangle, a special case being $c=0$ yielding $0$ area:
@@ -353,16 +385,65 @@ The area under a constant function is found from the area of rectangle, a specia
For any partition of $a < b$, we have $S_n = c(x_1 - x_0) + c(x_2 -x_1) + \cdots + c(x_n - x_{n-1})$. By factoring out the $c$, we have a *telescoping sum* which means the sum simplifies to $S_n = c(x_n-x_0) = c(b-a)$. Hence any limit must be this constant value.
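This telescoping can be spot-checked on a random partition; in this sketch the values $c=3$ and $[a,b]=[0,2]$ are arbitrary choices:

```julia
# Any Riemann sum of the constant c telescopes to c*(b - a), whatever the partition
c, a, b = 3.0, 0.0, 2.0
xs = sort(vcat(a, a .+ (b - a) .* rand(10), b))    # a random partition of [a, b]
S = sum(c * (xs[i+1] - xs[i]) for i in 1:length(xs)-1)
S ≈ c * (b - a)    # true
```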
Scaling the $y$ axis by a constant can be done before or after computing the area:
::: {#fig-consequence-rectangle-area}
```{julia}
#| echo: false
gr()
let
c = 1
a,b = 0.5, 1.5
f(x) = c
Δ = 0.1
plt = plot(;
xaxis=([], false),
yaxis=([], false),
legend=false,
)
plot!(f, a, b; line=(:black, 2))
plot!([a-Δ, b + Δ], [0,0]; line=(:gray, 1), arrow=true, side=:head)
plot!([a-Δ/2, a-Δ/2], [-Δ, c + Δ]; line=(:gray, 1), arrow=true, side=:head)
plot!([a,a],[0,f(a)]; line=(:black, 1, :dash))
plot!([b,b],[0,f(b)]; line=(:black, 1, :dash))
annotate!([
(a, 0, text(L"a", :top, :right)),
(b, 0, text(L"b", :top, :left)),
(a-Δ/2-0.01, c, text(L"c", :right))
])
current()
end
```
```{julia}
#| echo: false
plotly()
nothing
```
Illustration that the area under a constant function is that of a rectangle
:::
The area is $0$ when there is no width to the interval to integrate over:
> $$
> \int_a^b cf(x) dx = c \int_a^b f(x) dx.
> \int_a^a f(x) dx = 0.
> $$
Let $a=x_0 < x_1 < \cdots < x_n=b$ be any partition. Then we have $S_n= cf(c_1)(x_1-x_0) + \cdots + cf(c_n)(x_n-x_{n-1})$ $=$ $c\cdot\left[ f(c_1)(x_1 - x_0) + \cdots + f(c_n)(x_n - x_{n-1})\right]$. The "limit" of the left side is $\int_a^b c f(x) dx$. The "limit" of the right side is $c \cdot \int_a^b f(x)$. We call this a "sketch" as a formal proof would show that for any $\epsilon$ we could choose a $\delta$ so that any partition with norm $\delta$ will yield a sum less than $\epsilon$. Here, then our "any" partition would be one for which the $\delta$ on the left hand side applies. The computation shows that the same $\delta$ would apply for the right hand side when $\epsilon$ is the same.
Even our definition of a partition doesn't really apply, as we assume $a < b$, but clearly if $a=x_0=x_n=b$ then our only "approximating" sum could be $f(a)(b-a) = 0$.
#### Shifts
A jigsaw puzzle piece will have the same area if it is moved around on the table or flipped over. Similarly, some shifts preserve the area under a function.
The area is invariant under shifts left or right.
@@ -389,6 +470,59 @@ $$
The left side will have a limit of $\int_a^b f(x-c) dx$ the right would have a "limit" of $\int_{a-c}^{b-c}f(x)dx$.
::: {#fig-consequence-rectangle-area}
```{julia}
#| echo: false
gr()
let
f(x) = 2 + cospi(x^2/10)*sinpi(x)
plt = plot(;
xaxis=([], false),
yaxis=([], false),
legend=false,
)
a, b = 0,4
c = 5
plot!(f, a, b; line=(:black, 2))
plot!(x -> f(x-c), a+c, b+c; line=(:red, 2))
plot!([-1, b+c + 1], [0,0]; line=(:gray, 2), arrow=true, side=:head)
for x ∈ (a,b)
plot!([x,x],[0,f(x)]; line=(:black,1, :dash))
end
for x ∈ (a+c,b+c)
plot!([x,x],[0,f(x)]; line=(:red,1, :dash))
end
annotate!([
(a+c,0, text(L"a", :top)),
(b+c,0, text(L"b", :top)),
(a,0, text(L"a-c", :top)),
(b,0, text(L"b-c", :top)),
(1.0, 3, text(L"f(x)",:left)),
(1.0+c, 3, text(L"f(x-c)",:left)),
])
current()
end
```
```{julia}
#| echo: false
plotly()
nothing
```
Illustration that the area under a shifted function remains the same
:::
Similarly, reflections don't affect the area under the curve; they just require a new parameterization:
@@ -398,7 +532,79 @@ Similarly, reflections don't effect the area under the curve, they just require
The scaling operation $g(x) = f(cx)$ has the following:
::: {#fig-consequence-reflect-area}
```{julia}
#| echo: false
gr()
let
f(x) = 2 + cospi(x^2/10)*sinpi(x)
g(x) = f(-x)
plt = plot(;
xaxis=([], false),
yaxis=([], false),
legend=false,
)
a, b = 1,4
plot!(f, a, b; line=(:black, 2))
plot!(g, -b, -a; line=(:red, 2))
plot!([-5, 5], [0,0]; line=(:gray,1), arrow=true, side=:head)
plot!([0,0], [-0.1,3.15]; line=(:gray,1), arrow=true, side=:head)
for x in (a,b)
plot!([x,x], [0,f(x)]; line=(:black,1,:dash))
plot!([-x,-x], [0,g(-x)]; line=(:red,1,:dash))
end
annotate!([
(a,0, text(L"a", :top)),
(b,0, text(L"b", :top)),
(-a,0, text(L"-a", :top)),
(-b,0, text(L"-b", :top)),
])
current()
end
```
```{julia}
#| echo: false
plotly()
nothing
```
Illustration that the area remains constant under reflection through the $y$ axis.
:::
The "reversed" area is the same, only accounted for with a minus sign.
> $$
> \int_a^b f(x) dx = -\int_b^a f(x) dx.
> $$
#### Scaling
Scaling the $y$ axis by a constant can be done before or after computing the area:
> $$
> \int_a^b cf(x) dx = c \int_a^b f(x) dx.
> $$
Let $a=x_0 < x_1 < \cdots < x_n=b$ be any partition. Then we have $S_n= cf(c_1)(x_1-x_0) + \cdots + cf(c_n)(x_n-x_{n-1})$ $=$ $c\cdot\left[ f(c_1)(x_1 - x_0) + \cdots + f(c_n)(x_n - x_{n-1})\right]$. The "limit" of the left side is $\int_a^b c f(x) dx$. The "limit" of the right side is $c \cdot \int_a^b f(x) dx$. We call this a "sketch," as a formal proof would show that for any $\epsilon$ we could choose a $\delta$ so that any partition with norm less than $\delta$ will yield a sum within $\epsilon$ of the limit. Here, then, our "any" partition would be one for which the $\delta$ on the left-hand side applies. The computation shows that the same $\delta$ would apply for the right-hand side when $\epsilon$ is the same.
The scaling operation on the $x$ axis, $g(x) = f(cx)$, has the following property:
> $$
@@ -413,8 +619,9 @@ The scaling operation shifts $a$ to $ca$ and $b$ to $cb$ so the limits of integr
Combining two operations above, the operation $g(x) = \frac{1}{h}f(\frac{x-c}{h})$ will leave the area between $a$ and $b$ under $g$ the same as the area under $f$ between $(a-c)/h$ and $(b-c)/h$.
---
#### Area is additive
When two jigsaw pieces interlock their combined area is that of each added. This also applies to areas under functions.
The area between $a$ and $b$ can be broken up into the sum of the area between $a$ and $c$ and that between $c$ and $b$.
@@ -428,35 +635,185 @@ The area between $a$ and $b$ can be broken up into the sum of the area between $
For this, suppose we have a partition for both the integrals on the right hand side for a given $\epsilon/2$ and $\delta$. Combining these into a partition of $[a,b]$ will mean $\delta$ is still the norm. The approximating sum will combine to be no more than $\epsilon/2 + \epsilon/2$, so for a given $\epsilon$, this $\delta$ applies.
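This additivity is easy to verify numerically; a sketch using `quadgk` from the `QuadGK` package, where the smooth function and the points $a$, $b$, $c$ are arbitrary choices:

```julia
using QuadGK

# Numeric check that the integral over [a,b] splits at an interior point c
f(x) = 2 + cospi(x^2/7) * sinpi(x)
a, b, c = 0.1, 8.0, 3.0
lhs = quadgk(f, a, b)[1]
rhs = quadgk(f, a, c)[1] + quadgk(f, c, b)[1]
isapprox(lhs, rhs; atol=1e-6)   # true
```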
This is due to the area on the left and right of $0$ being equivalent.
::: {#fig-consequence-additive-area}
```{julia}
#| echo: false
gr()
let
f(x) = 2 + cospi(x^2/7)*sinpi(x)
a,b,c = 0.1, 8, 3
xs = range(a,c,100)
A1 = Shape(vcat(xs,c,a), vcat(f.(xs), 0, 0))
xs = range(c,b,100)
A2 = Shape(vcat(xs,b,c), vcat(f.(xs), 0, 0))
The "reversed" area is the same, only accounted for with a minus sign.
plt = plot(;
xaxis=([], false),
yaxis=([], false),
legend=false,
)
plot!([0,0] .- 0.1,[-.1,3]; line=(:gray, 1), arrow=true, side=:head)
plot!([0-.2, b+0.5],[0,0]; line=(:gray, 1), arrow=true, side=:head)
plot!(A1; fill=(:gray60,1), line=(nothing,))
plot!(A2; fill=(:gray90,1), line=(nothing,))
plot!(f, a, b; line=(:black, 2))
for x in (a,b,c)
plot!([x,x], [0, f(x)]; line=(:black, 1, :dash))
end
> $$
> \int_a^b f(x) dx = -\int_b^a f(x) dx.
> $$
annotate!([(x,0,text(latexstring("$y"),:top)) for (x,y) in zip((a,b,c),("a","b","c"))])
end
```
```{julia}
#| echo: false
plotly()
nothing
```
Illustration that the area between $a$ and $b$ can be computed as the area between $a$ and $c$ plus the area between $c$ and $b$.
:::
A consequence of the last few statements is:
> If $f(x)$ is an even function, then $\int_{-a}^a f(x) dx = 2 \int_0^a f(x) dx$. If $f(x)$ is an odd function, then $\int_{-a}^a f(x) dx = 0$.
> If $f(x)$ is an even function, then $\int_{-a}^a f(x) dx = 2 \int_0^a f(x) dx$.
> If $f(x)$ is an odd function, then $\int_{-a}^a f(x) dx = 0$.
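Both consequences can be confirmed numerically with `quadgk` from the `QuadGK` package; the even and odd functions below are arbitrary examples:

```julia
using QuadGK

# Numeric check of the even/odd consequences on [-a, a]
f(x) = x^2 * cos(x)        # an even function
g(x) = x^3 + sin(x)        # an odd function
a = 2.0
isapprox(quadgk(f, -a, a)[1], 2 * quadgk(f, 0, a)[1]; atol=1e-8)  # true
abs(quadgk(g, -a, a)[1]) < 1e-8                                   # true
```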
Additivity works in the $y$ direction as well.
If $f(x)$ and $g(x)$ are two functions then
> $$
> \int_a^b (f(x) + g(x)) dx = \int_a^b f(x) dx + \int_a^b g(x) dx
> $$
For any partitioning with $x_i, x_{i-1}$ and $c_i$ this holds:
$$
(f(c_i) + g(c_i)) \cdot (x_i - x_{i-1}) =
f(c_i) \cdot (x_i - x_{i-1}) + g(c_i) \cdot (x_i - x_{i-1})
$$
This leads to the same statement for the areas under the curves.
The *linearity* of the integration operation refers to this combination of the above:
> $$
> \int_a^b (cf(x) + dg(x)) dx = c\int_a^b f(x) dx + d \int_a^b g(x)dx
> $$
The integral of a shifted function satisfies:
> $$
> \int_a^b \left(D + C\cdot f(\frac{x - B}{A})\right) dx = D\cdot(b-a) + C \cdot A \int_{\frac{a-B}{A}}^{\frac{b-B}{A}} f(x) dx
> $$
This follows from a few of the statements above:
$$
\begin{align*}
\int_a^b \left(D + C\cdot f(\frac{x - B}{A})\right) dx &=
\int_a^b D dx + C \int_a^b f(\frac{x-B}{A}) dx \\
&= D\cdot(b-a) + C\cdot A \int_{\frac{a-B}{A}}^{\frac{b-B}{A}} f(x) dx
\end{align*}
$$
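The identity can be spot-checked numerically; in this sketch the values of $A$, $B$, $C$, $D$ and the function $f$ are arbitrary choices:

```julia
using QuadGK

# Spot check: ∫ₐᵇ (D + C⋅f((x-B)/A)) dx = D(b-a) + C⋅A⋅∫ f over the shifted limits
A, B, C, D = 2.0, 1.0, 3.0, 0.5
f(x) = exp(-x^2)
a, b = 0.0, 2.0
lhs = quadgk(x -> D + C * f((x - B)/A), a, b)[1]
rhs = D * (b - a) + C * A * quadgk(f, (a - B)/A, (b - B)/A)[1]
isapprox(lhs, rhs; atol=1e-8)   # true
```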
#### Inequalities
Area under a non-negative function is non-negative:
> $$
> \int_a^b f(x) dx \geq 0,\quad\text{when } a < b, \text{ and } f(x) \geq 0
> $$
Under this assumption, for any partitioning with $x_i, x_{i-1}$ and $c_i$ it holds that $f(c_i)\cdot(x_i - x_{i-1}) \geq 0$. So any sum of non-negative values can only be non-negative, even in the limit.
If $g$ bounds $f$ then the area under $g$ will bound the area under $f$, in particular if $f(x)$ is non negative, so will the area under $f$ be non negative for any $a < b$. (This assumes that $g$ and $f$ are integrable.)
> If $0 \leq f(x) \leq g(x)$ then $\int_a^b f(x) dx \leq \int_a^b g(x) dx.$
If $g$ bounds $f$ then the area under $g$ will bound the area under $f$.
> $$
> \int_a^b f(x) dx \leq \int_a^b g(x) dx,\quad\text{when } a < b, \text{ and } 0 \leq f(x) \leq g(x)
> $$
For any partition of $[a,b]$ and choice of $c_i$, we have the term-by-term bound $f(c_i)(x_i-x_{i-1}) \leq g(c_i)(x_i-x_{i-1})$. So any sequence of partitions that converges in the limit will have this inequality maintained for the sum.
::: {#fig-consequence-0-area}
```{julia}
#| echo: false
gr()
let
f(x) = 1/6+x^3*(2-x)/2
g(x) = 1/6+exp(x/3)+(1-x/1.7)^6-0.6
a, b = 0, 2
plot(; empty_style...)
plot!([a-.5, b+.25], [0,0]; line=(:gray, 1), arrow=true, side=:head)
plot!([0,0] .- 0.25, [-0.25, 1.8]; line=(:gray, 1), arrow=true, side=:head)
xs = range(a,b,250)
S = Shape(vcat(xs, reverse(xs)), vcat(f.(xs), g.(reverse(xs))))
plot!(S; fill=(:gray70, 0.3), line=(nothing,))
S = Shape(vcat(xs, reverse(xs)), vcat(zero.(xs), f.(reverse(xs))))
plot!(S; fill=(:gray90, 0.3), line=(nothing,))
plot!(f, a, b; line=(:black, 4))
plot!(g, a, b; line=(:black, 2))
for x in (a,b)
plot!([x,x], [0, g(x)]; line=(:black,1,:dash))
end
annotate!([(x,0,text(t, :top)) for (x,t) in zip((a,b),(L"a", L"b"))])
current()
end
```
```{julia}
#| echo: false
plotly()
nothing
```
Illustration that if $f(x) \le g(x)$ on $[a,b]$ then the integrals share the same property. The excess area is clearly positive.
:::
(This also follows by considering $h(x) = g(x) - f(x) \geq 0$ by assumption, so $\int_a^b h(x) dx \geq 0$.)
For non-negative functions, integrals over larger domains are bigger:
> $$
> \int_a^c f(x) dx \le \int_a^b f(x) dx,\quad\text{when } c < b \text{ and } f(x) \ge 0
> $$
This follows as $\int_c^b f(x) dx$ is non-negative under these assumptions.
### Some known integrals
@@ -571,7 +928,7 @@ The main idea behind this is that the difference between the maximum and minimum
For example, the function $f(x) = 1$ for $x$ in $[0,1]$ and $0$ otherwise will be integrable, as it is continuous at all but two points, $0$ and $1$, where it jumps.
* Some functions can have infinitely many points of discontinuity and still be integrable. The example of $f(x) = 1/q$ when $x=p/q$ is rational, and $0$ otherwise is often used as an example.
* Some functions can have infinitely many points of discontinuity and still be integrable. The example of $f(x) = 1/q$ when $x=p/q$ is rational, and $0$ otherwise is often used to illustrate this.
## Numeric integration
@@ -601,11 +958,11 @@ deltas = diff(xs) # forms x2-x1, x3-x2, ..., xn-xn-1
cs = xs[1:end-1] # finds left-hand end points. xs[2:end] would be right-hand ones.
```
Now to multiply the values. We want to sum the product `f(cs[i]) * deltas[i]`, here is one way to do so:
We want to sum the products $f(c_i)\Delta_i$. Here is one way to do so, using `zip` to iterate over the paired-off values in `cs` and `deltas`.
```{julia}
sum(f(cs[i]) * deltas[i] for i in 1:length(deltas))
sum(f(ci)*Δi for (ci, Δi) in zip(cs, deltas))
```
Our answer is not so close to the value of $1/3$, but what did we expect with only $n=5$ intervals? Trying again with $50,000$ gives us:
@@ -617,7 +974,7 @@ n = 50_000
xs = a:(b-a)/n:b
deltas = diff(xs)
cs = xs[1:end-1]
sum(f(cs[i]) * deltas[i] for i in 1:length(deltas))
sum(f(ci)*Δi for (ci, Δi) in zip(cs, deltas))
```
This value is about $10^{-5}$ off from the actual answer of $1/3$.
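The error for the left Riemann sum shrinks in proportion to $1/n$; for this integrand it is essentially $1/(2n)$, as a small experiment suggests:

```julia
# Left Riemann sums for ∫₀¹ x² dx = 1/3; for this rule the error shrinks like 1/(2n)
f(x) = x^2
left_sum(f, a, b, n) = sum(f(a + i*(b - a)/n) * (b - a)/n for i in 0:n-1)
errs = [1/3 - left_sum(f, 0, 1, n) for n in (10, 100, 1000)]
# errors are roughly 0.048, 0.0050, 0.00050; a tenfold n gives a tenfold smaller error
```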
@@ -631,19 +988,19 @@ Before continuing, we define a function to compute Riemann sums for us with an
```{julia}
#| eval: false
riemann(f, a, b, n; method="right") = riemann(f, range(a,b,n+1); method=method)
function riemann(f, xs; method="right")
Ms = (left = (f,a,b) -> f(a),
right= (f,a,b) -> f(b),
Ms = (left = (f,a,b) -> f(a),
right = (f,a,b) -> f(b),
trapezoid = (f,a,b) -> (f(a) + f(b))/2,
simpsons = (f,a,b) -> (c = a/2 + b/2; (1/6) * (f(a) + 4*f(c) + f(b))),
simpsons = (f,a,b) -> (c = a/2 + b/2; (1/6) * (f(a) + 4*f(c) + f(b)))
)
_riemann(Ms[Symbol(method)], f, xs)
end
function _riemann(M, f, xs)
M = Ms[Symbol(method)]
xs = zip(xs[1:end-1], xs[2:end])
sum(M(f, a, b) * (b-a) for (a,b) ∈ xs)
end
riemann(f, a, b, n; method="right") =
riemann(f, range(a,b,n+1); method)
```
(This function is defined in `CalculusWithJulia` and need not be copied over if that package is loaded.)
@@ -699,7 +1056,7 @@ Consider a function $g(x)$ defined through its piecewise linear graph:
```{julia}
#| echo: false
g(x) = abs(x) > 2 ? 1.0 : abs(x) - 1.0
plot(g, -3,3)
plot(g, -3,3; legend=false)
plot!(zero)
```
@@ -710,6 +1067,25 @@ plot!(zero)
We could add the signed area over $[0,1]$ to the above, but instead see a square of area $1$, a triangle with area $1/2$ and a triangle with signed area $-1$. The total is then $1/2$.
This figure---using equal sized axes---may make the above decomposition more clear:
```{julia}
#| echo: false
let
g(x) = abs(x) > 2 ? 1.0 : abs(x) - 1.0
xs = [ -3, -2, -1, 1, 2, 3]
plot(; legend=false, aspect_ratio=:equal)
plot!(Shape([-3,-2,-2,-3], [0,0,1,1]); fill=(:gray,))
plot!(Shape([-2,-1,-1,-2], [0,0,0,1]); fill=(:gray10,))
plot!(Shape([-1,0,1], [0,-1,0]); fill=(:gray90,))
plot!(Shape([1,2,2,1], [0,1,0,0]); fill=(:gray10,))
plot!(Shape([2,3,3,2], [0,0,1,1]); fill=(:gray,))
plot!([0,0], [0, g(0)]; line=(:black,1,:dash))
end
```
* Compute $\int_{-3}^{3} g(x) dx$:
@@ -791,7 +1167,9 @@ We have the well-known triangle [inequality](http://en.wikipedia.org/wiki/Triang
This suggests that the following inequality holds for integrals:
> $\lvert \int_a^b f(x) dx \rvert \leq \int_a^b \lvert f(x) \rvert dx$.
> $$
> \lvert \int_a^b f(x) dx \rvert \leq \int_a^b \lvert f(x) \rvert dx.
> $$
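A numeric illustration with `quadgk` from the `QuadGK` package, using a sign-changing integrand (an arbitrary choice):

```julia
using QuadGK

# The integral triangle inequality; the inequality is strict when f changes sign
f(x) = sin(3x) * exp(-x)
I  = quadgk(f, 0, pi)[1]
Ia = quadgk(x -> abs(f(x)), 0, pi)[1]
abs(I) <= Ia   # true; strict here, as f changes sign on [0, π]
```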
@@ -869,7 +1247,8 @@ The formulas for an approximation to the integral $\int_{-1}^1 f(x) dx$ discusse
$$
\begin{align*}
S &= f(x_1) \Delta_1 + f(x_2) \Delta_2 + \cdots + f(x_n) \Delta_n\\
&= w_1 f(x_1) + w_2 f(x_2) + \cdots + w_n f(x_n).
&= w_1 f(x_1) + w_2 f(x_2) + \cdots + w_n f(x_n)\\
&= \sum_{i=1}^n w_i f(x_i).
\end{align*}
$$
@@ -904,7 +1283,7 @@ f(x) = x^5 - x + 1
quadgk(f, -2, 2)
```
The error term is $0$, answer is $4$ up to the last unit of precision (1 ulp), so any error is only in floating point approximations.
The error term is $0$, the answer is $4$ up to the last unit of precision (1 ulp), so any error is only in floating point approximations.
For the numeric computation of definite integrals, the `quadgk` function should be preferred over Riemann sums or even Simpson's rule.
@@ -1423,6 +1802,70 @@ val, _ = quadgk(f, a, b)
numericq(val)
```
###### Question
Let $A=1.98$ and $B=1.135$ and
$$
f(x) = \frac{1 - e^{-Ax}}{B\sqrt{\pi}x} e^{-x^2}.
$$
Find $\int_0^1 f(x) dx$
```{julia}
#| echo: false
let
A,B = 1.98, 1.135
f(x) = (1 - exp(-A*x))*exp(-x^2)/(B*sqrt(pi)*x)
val,_ = quadgk(f, 0, 1)
numericq(val)
end
```
###### Question
A bound for the complementary error function (a positive function) is
$$
\text{erfc}(x) \leq \frac{1}{2}e^{-2x^2} + \frac{1}{2}e^{-x^2} \leq e^{-x^2}
\quad x \geq 0.
$$
Let $f(x)$ be the first bound, $g(x)$ the second.
Assuming this is true, confirm numerically using `quadgk` that
$$
\int_0^3 f(x) dx \leq \int_0^3 g(x) dx
$$
The value of $\int_0^3 f(x) dx$ is
```{julia}
#| echo: false
let
f(x) = 1/2 * exp(-2x^2) + 1/2 * exp(-x^2)
val,_ = quadgk(f, 0, 3)
numericq(val)
end
```
The value of $\int_0^3 g(x) dx$ is
```{julia}
#| echo: false
let
g(x) = exp(-x^2)
val,_ = quadgk(g, 0, 3)
numericq(val)
end
```
###### Question

View File

@@ -32,7 +32,7 @@ can be interpreted as the "signed" area between $f(x)$ and $g(x)$ over $[a,b]$.
```{julia}
#| hold: true
#| echo: false
#| label: fig-area-between-f-g
#| label: fig-area-between-f-g-shade
#| fig-cap: "Area between two functions"
f1(x) = x^2
g1(x) = sqrt(x)
@@ -64,7 +64,88 @@ $$
#### Examples
Find the area between
$$
\begin{align*}
f(x) &= \frac{x^3 \cdot (2-x)}{2} \text{ and } \\
g(x) &= e^{x/3} + (1-\frac{x}{1.7})^6 - 0.6
\end{align*}
$$
over the interval $[0.2, 1.7]$. The area is illustrated in the figure below.
```{julia}
f(x) = x^3*(2-x)/2
g(x) = exp(x/3) + (1 - (x/1.7))^6 - 0.6
a, b = 0.2, 1.7
h(x) = g(x) - f(x)
answer, _ = quadgk(h, a, b)
answer
```
::: {#fig-area-between-f-g}
```{julia}
#| echo: false
p = let
gr()
# area between graphs
# https://github.com/SigurdAngenent/WisconsinCalculus/blob/master/figures/221/09areabetweengraphs.pdf
f(x) = 1/6+x^3*(2-x)/2
g(x) = 1/6+exp(x/3)+(1-x/1.7)^6-0.6
a,b =0.2, 1.7
A, B = 0, 2
A, B = A + .1, B - .1
n = 20
plot(; empty_style..., aspect_ratio=:equal, xlims=(A,B))
plot!(f, A, B; fn_style...)
plot!(g, A, B; fn_style...)
xp = range(a, b, n)
marked = n ÷ 2
for i in 1:n-1
x0, x1 = xp[i], xp[i+1]
mpt = (x0 + x1)/2
R = Shape([x0,x1,x1,x0], [f(mpt),f(mpt),g(mpt),g(mpt)])
color = i == marked ? :gray : :white
plot!(R; fill=(color, 0.5), line=(:black, 1))
end
# axis
plot!([(A,0),(B,0)]; axis_style...)
# highlight
x0, x1 = xp[marked], xp[marked+1]
_style = (;line=(:gray, 1, :dash))
plot!([(a,0), (a, f(a))]; _style...)
plot!([(b,0), (b,f(b))]; _style...)
plot!([(x0,0), (x0, f(x0))]; _style...)
plot!([(x1,0), (x1, f(x1))]; _style...)
annotate!([
(B, f(B), text(L"f(x)", 10, :left,:top)),
(B, g(B), text(L"g(x)", 10, :left, :bottom)),
(a, 0, text(L"a=x_0", 10, :top, :left)),
(b, 0, text(L"b=x_n", 10, :top, :left)),
(x0, 0, text(L"x_i", 10, :top)),
(x1, 0, text(L"x_{i+1}", 10, :top,:left))
])
current()
end
plotly()
p
```
Illustration of a Riemann sum approximation to estimate the area between $f(x)$ and $g(x)$ over an interval $[a,b]$. (Figure follows one by @Angenent.)
:::
##### Example
Find the area bounded by the line $y=2x$ and the curve $y=2 - x^2$.
@@ -367,7 +448,7 @@ When doing problems by hand this latter style can often reduce the complications
Consider two overlapping circles, one with smaller radius. How much area is in the larger circle that is not in the smaller? The question came up on the `Julia` [discourse](https://discourse.julialang.org/t/is-there-package-or-method-to-calculate-certain-area-in-julia-symbolically-with-sympy/99751) discussion board. A solution, modified from an answer of `@rocco_sprmnt21`, follows.
Without losing too-much generality, we can consider the smaller circle to have radius $a$, the larger circle to have radius $b$ and centered at $(0,c)$.
We assume some overlap---$a \ge c-b$, but not too much---$c-b \ge 0$ or $0 \le c-b \le a$.
```{julia}
@syms x::real y::real a::positive b::positive c::positive
@@ -545,7 +626,6 @@ Each term describes the area of a trapezoid, possibly signed.
This figure illustrates for a simple case:
```{julia}
using Plots
xs = [1, 3, 4, 2, 1] # n = 4 to give 5=n+1 values
ys = [1, 1, 2, 3, 1]
p = plot(xs, ys; line=(3, :black), ylims=(0,4), legend=false)
@@ -970,4 +1050,3 @@ choices = ["The two enclosed areas should be equal",
"The two enclosed areas are clearly different, as they do not overlap"]
radioq(choices, 1)
```
@@ -1,4 +1,4 @@
# Center of mass
{{< include ../_common_code.qmd >}}
@@ -156,10 +156,75 @@ In Part 1, the integral $F(x) = \int_a^x f(u) du$ is defined for any Riemann int
:::
This figure relating the area under some continuous $f(x)$ from $a$ to both $x$ and $x+h$ for some small $h$ helps to visualize the two fundamental theorems.
::: {#fig-FTC-derivative}
```{julia}
#| echo: false
let
gr()
f(x) = sin(x)
A(x) = cos(x)
a,b = 0, 6pi/13
h = pi/20
xs = range(a, b, 100)
p1 = plot(; empty_style...)
plot!([0,0] .- 0.05,[-0.1, 1]; line=(:gray,1), arrow=true, side=:head)
plot!([-0.1, b+h + pi/10], [0,0]; line=(:gray,1), arrow=true, side=:head)
xs = range(a, b, 100)
S = Shape(vcat(xs, reverse(xs)), vcat(f.(xs), zero.(xs)))
plot!(S; fill=(:gray90, 0.25), line=(nothing,))
plot!(f, a, b+h; line=(:black, 2))
xs = range(b, b+h, 100)
S = Shape(vcat(xs, reverse(xs)), vcat(f.(xs), zero.(xs)))
plot!(S; fill=(:gray70, 0.25), line=(nothing,))
plot!([b,b,b+h,b+h],[0,f(b),f(b),0]; line=(:black,1,:dash))
annotate!([
(a,0,text(L"a", :top, :left)),
(b, 0, text(L"x", :top)),
(b+h,0,text(L"x+h", :top)),
(2b/3, 1/2, text(L"A(x)")),
(b + h/2, 1/2, text(L"f(x)\cdot h \approx A(x+h)-A(x)", rotation=90))
])
current()
end
```
```{julia}
#| echo: false
plotly()
nothing
```
Area under the curve between $a$ and $b$, labeled $A(b)$, for $b=x$ and $b=x+h$.
:::
The last rectangle is exactly $f(x)h$ and approximately $A(x+h)-A(x)$, the difference being the small cap above the shaded rectangle. This gives the approximate derivative:
$$
A'(x) \approx \frac{A(x+h) - A(x)}{h} \approx \frac{f(x)\cdot h}{h} = f(x)
$$
That is, by taking limits, $A(x) = \int_a^x f(u) du$ is an antiderivative of $f(x)$. Moreover, from geometric considerations of area, if $a < c < b$, then
$$
A(b) - A(c) = \int_a^b f(x) dx - \int_a^c f(x) dx = \int_c^b f(x) dx
$$
That is $A(x)$ satisfies the two parts of the fundamental theorem.
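A quick numeric sanity check of the first statement, using a central difference in place of the limit (the particular `f`, point, and step size are chosen just for this sketch):

```{julia}
using QuadGK
f(u) = sin(u)
A(x) = quadgk(f, 0, x)[1]   # A(x) = ∫₀ˣ f(u) du
x, h = 1.2, 1e-4
# the difference quotient of A should be close to f(x)
(A(x + h) - A(x - h)) / (2h)
```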
## Using the fundamental theorem of calculus to evaluate definite integrals
The most visible use of the FTC is the computation of definite integrals, $\int_a^b f(x) dx$. Rather than resort to Riemann sums or geometric arguments, there is an alternative---*when possible*, find a function $F$ with $F'(x) = f(x)$ and compute $F(b) - F(a)$.
Some examples:
@@ -213,21 +278,21 @@ The expression $F(b) - F(a)$ is often written in this more compact form:
$$
\int_a^b f(x) dx = F(b) - F(a) = F(x)\Big|_{x=a}^b, \text{ or just expr}\Big|_{x=a}^b.
$$
The vertical bar is used for the *evaluation* step, in this case the $a$ and $b$ mirror that of the definite integral. This notation lends itself to working inline, as we illustrate with this next problem where we "know" a function "$F$", so just express it "inline":
$$
\int_0^{\pi/4} \sec^2(x) dx = \tan(x) \Big|_{x=0}^{\pi/4} = 1 - 0 = 1.
$$
A consequence of this notation is:
$$
F(x) \Big|_{x=a}^b = -F(x) \Big|_{x=b}^a.
$$
This says nothing more than $F(b)-F(a) = -(F(a) - F(b))$, stated more compactly.
@@ -324,13 +389,13 @@ Answers may not be available as elementary functions, but there may be special f
integrate(x / sqrt(1-x^3), x)
```
Different cases explored by `integrate` are mentioned after the questions.
## Rules of integration
There are some "rules" of integration that allow indefinite integrals to be re-expressed.
* The integral of a constant times a function:
@@ -353,7 +418,7 @@ $$
This follows immediately as if $F(x)$ and $G(x)$ are antiderivatives of $f(x)$ and $g(x)$, then $[F(x) + G(x)]' = f(x) + g(x)$, so the right hand side will have a derivative of $f(x) + g(x)$.
In fact, this more general form, where $c$ and $d$ are constants, covers both cases and is referred to as the linearity of the integral:
$$
@@ -373,7 +438,7 @@ $$
\begin{align*}
\int (a_n x^n + \cdots + a_1 x + a_0) dx
&= \int a_nx^n dx + \cdots + \int a_1 x dx + \int a_0 dx \\
&= a_n \int x^n dx + \cdots + a_1 \int x^1 dx + a_0 \int x^0 dx \\
&= a_n\frac{x^{n+1}}{n+1} + \cdots + a_1 \frac{x^2}{2} + a_0 \frac{x}{1}.
\end{align*}
$$
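For instance, `SymPy`'s `integrate` reproduces this term-by-term pattern on a small example (a check added here for illustration):

```{julia}
using SymPy
@syms x
integrate(3x^2 + 2x + 1, x)  # x^3 + x^2 + x, matching the term-by-term rule
```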
@@ -417,12 +482,14 @@ This seems like a lot of work, and indeed it is more than is needed. The followi
$$
\int_0^\pi 100 \sin(x) dx = 100(-\cos(x)) \Big|_0^{\pi} = 100 \cos(x) \Big|_{\pi}^0 = 100(1) - 100(-1) = 200.
$$
## The derivative of the integral
The relationship that $[\int_a^x f(u) du]' = f(x)$ is a bit harder to appreciate, as it doesn't help answer many ready-made questions. Here we give some examples of its use.
@@ -433,12 +500,16 @@ $$
F(x) = \int_a^x f(u) du.
$$
The value of $a$ does not matter, as long as the integral is defined. This $F$ satisfies the first fundamental theorem, as $F(a)=0$.
```{julia}
#| hold: true
#| echo: false
#| eval: false
##{{{ftc_graph}}}
gr()
function make_ftc_graph(n)
@@ -479,9 +550,9 @@ imgfile = tempname() * ".gif"
gif(anim, imgfile, fps = 1)
plotly()
ImageFile(imgfile, caption)
#The picture for this, for non-negative $f$, is of accumulating area as $x$ increases. It can be used to give insight into some formulas:
```
For any function, we know that $F(b) - F(c) + F(c) - F(a) = F(b) - F(a)$. For this specific function, this translates into this property of the integral:
@@ -550,7 +621,7 @@ In probability theory, for a positive, continuous random variable, the probabili
For example, the exponential distribution with rate $1$ has $f(x) = e^{-x}$. Compute $F(x)$.
This is just $F(x) = \int_0^x e^{-u} du = -e^{-u}\Big|_0^x = 1 - e^{-x}$.
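A quick numeric confirmation of this computation (illustrative only):

```{julia}
using QuadGK
F(x) = quadgk(u -> exp(-u), 0, x)[1]
F(2.0), 1 - exp(-2.0)  # the two values agree
```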
The "uniform" distribution on $[a,b]$ has
@@ -1120,6 +1191,192 @@ answ = 2
radioq(choices, answ)
```
###### Question
The error function (`erf`) is defined in terms of an integral:
$$
\text{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x \exp(-t^2) dt, \quad x \geq 0
$$
The constant is chosen so that $\lim_{x \rightarrow \infty} \text{erf}(x) = 1$.
What is the derivative of $\text{erf}(x)$?
```{julia}
#| echo: false
choices = [L"\exp(-x^2)",
L"-2x \exp(-x^2)",
L"\frac{2}{\sqrt{\pi}} \exp(-x^2)"]
radioq(choices, 3; keep_order=true, explanation="Don't forget the scalar multiple")
```
Is the function $\text{erf}(x)$ *increasing* on $[0,\infty)$?
```{julia}
#| echo: false
choices = ["No",
"Yes, the derivative is positive on this interval",
"Yes, the derivative is negative on this interval",
"Yes, the derivative is increasing on this interval",
"Yes, the derivative is decreasing on this interval"]
radioq(choices, 2; keep_order=true)
```
Is the function $\text{erf}(x)$ *concave down* on $[0,\infty)$?
```{julia}
#| echo: false
choices = ["No",
"Yes, the derivative is positive on this interval",
"Yes, the derivative is negative on this interval",
"Yes, the derivative is increasing on this interval",
"Yes, the derivative is decreasing on this interval"]
radioq(choices, 5; keep_order=true)
```
For $x > 0$, consider the function
$$
F(x) = \frac{2}{\sqrt{\pi}} \int_{-x}^0 \exp(-t^2) dt
$$
Why is $F'(x) = \text{erf}'(x)$?
```{julia}
#| echo: false
choices = ["The integrand is an *even* function so the integral from ``0`` to ``x`` is the same as the integral from ``-x`` to ``0``",
"This isn't true"]
radioq(choices, 1; keep_order=true)
```
Consider the function
$$
F(x) = \frac{2}{\sqrt{\pi}} \int_0^{\sqrt{x}} \exp(-t^2) dt, \quad x \geq 0
$$
What is the derivative of $F$?
```{julia}
#| echo: false
choices = [L"\exp(-x^2)",
L"\frac{2}{\sqrt{\pi}} \exp(-x^2)",
L"\frac{2}{\sqrt{\pi}} \exp(-x^2) \cdot (-2x)"]
radioq(choices, 3; keep_order=true, explanation="Don't forget to apply the chain rule, as ``F(x) = \\text{erf}(\\sqrt{x})``")
```
###### Question
Define two functions through the integrals:
$$
\begin{align*}
S(x) &= \int_0^x \sin(t^2) dt\\
C(x) &= \int_0^x \cos(t^2) dt
\end{align*}
$$
These are called *Fresnel Integrals*.
A non-performant implementation might look like:
```{julia}
S(x) = first(quadgk(t -> sin(t^2), 0, x))
```
Define a similar function for $C(x)$ and then make a parametric plot for $0 \le t \le 5$.
Describe the shape.
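One possible (equally non-performant) companion definition, along with the data for the parametric plot; the plotting call is commented out, as this is just a sketch:

```{julia}
using QuadGK
S(x) = first(quadgk(t -> sin(t^2), 0, x))
C(x) = first(quadgk(t -> cos(t^2), 0, x))
ts = range(0, 5, length=251)
xs, ys = C.(ts), S.(ts)
# plot(xs, ys) would show the parametric curve asked about
last(xs), last(ys)
```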
```{julia}
#| echo: false
choices = ["It makes a lovely star shape",
"It makes a lovely spiral shape",
"It makes a lovely circle"]
radioq(choices, 2; keep_order=true)
```
What is the value of $S'(x)^2 + C'(x)^2$ when $x=\pi$?
```{julia}
#| echo: false
numericq(1)
```
###### Question
Define a function with parameter $\alpha \geq 1$ by:
$$
\gamma(x; \alpha) = \int_0^x \exp(-t) t^{\alpha-1} dt, \quad x > 0
$$
What is the ratio of $\gamma'(2; 3) / \gamma'(2; 4)$?
```{julia}
#| echo: false
df(x,alpha) = exp(-x)*x^(alpha -1)
numericq(df(2,3)/df(2,4))
```
###### Question
Define a function
$$
i(x) = \int_0^{x^2} \exp(-t) t^{1/2} dt
$$
What is the derivative of $i$?
```{julia}
#| echo: false
choices = [L"\exp(-x) x^{1/2}",
L"\exp(-x) x^{1/2} \cdot 2x",
L"\exp(-x^2) (x^2)^{1/2}",
L"\exp(-x^2) (x^2)^{1/2}\cdot 2x"]
radioq(choices, 4; keep_order=true)
```
###### Question
The function `sinint` from `SpecialFunctions` computes
$$
F(x) = \int_0^x \frac{\sin(t)}{t} dt = \int_0^x \phi(t) dt,
$$
where we define $\phi$ above to be $1$ when $t=0$, so that it will be continuous over $[0,x]$.
A related integral might be:
$$
G(x) = \int_0^x \frac{\sin(\pi t)}{\pi t} dt = \int_0^x \phi(\pi t) dt
$$
As this is an integral involving a simple transformation of $\phi(x)$, we can see that $G(x) = (1/\pi) F(\pi x)$. What is the derivative of $G$?
```{julia}
#| echo: false
choices = [
L"\phi(x)",
L"\phi(\pi x)",
L"\pi \phi(\pi x)"
]
radioq(choices, 2; keep_order=true)
```
###### Question
@@ -1144,12 +1401,14 @@ radioq(choices, answ, keep_order=true)
Barrow presented a version of the fundamental theorem of calculus in a 1670 volume edited by Newton, Barrow's student (cf. [Wagner](http://www.maa.org/sites/default/files/0746834234133.di020795.02p0640b.pdf)). His version can be stated as follows (cf. [Jardine](http://www.maa.org/publications/ebooks/mathematical-time-capsules)):
Consider the following figure where $f$ is a strictly increasing function with $f(0) = 0$ and $x > 0$. The function $A(x) = \int_0^x f(u) du$ is also plotted with a dashed red line. The point $Q$ is $f(x)$, and the point $P$ is $A(x)$. The point $T$ is chosen so that the length between $T$ and $x$ times the length between $Q$ and $x$ equals the length from $P$ to $x$. ($\lvert Tx \rvert \cdot \lvert Qx \rvert = \lvert Px \rvert$.) Barrow showed that the line segment $PT$ is tangent to the graph of $A(x)$. This figure illustrates the labeling for some function:
```{julia}
#| hold: true
#| echo: false
let
gr()
f(x) = x^(2/3)
x = 2
A(x) = quadgk(f, 0, x)[1]
@@ -1160,14 +1419,21 @@ P = A(x)
secpt = u -> 0 + P/(x-T) * (u-T)
xs = range(0, stop=x+1/4, length=50)
p = plot(f, 0, x + 1/4, legend=false, line=(:black,2))
plot!(p, A, 0, x + 1/4, line=(:red, 2,:dash))
scatter!(p, [T, x, x, x], [0, 0, Q, P], color=:orange)
annotate!(p, collect(zip([T, x, x+.1, x+.1], [0-.15, 0-.15, Q-.1, P], [L"T", L"x", L"Q", L"P"])))
plot!(p, [T-1/4, x+1/4], map(secpt, [T-1/4, x + 1/4]), color=:orange)
plot!(p, [T, x, x], [0, 0, P], color=:green)
p
end
```
```{julia}
#| echo: false
plotly()
nothing
```
The fact that $\lvert Tx \rvert \cdot \lvert Qx \rvert = \lvert Px \rvert$ says what in terms of $f(x)$, $A(x)$ and $A'(x)$?
@@ -1,4 +1,4 @@
# Improper Integrals
# Improper integrals
{{< include ../_common_code.qmd >}}
@@ -33,20 +33,26 @@ function make_sqrt_x_graph(n)
b = 1
a = 1/2^n
xs = range(1/2^n, stop=b, length=1000)
x1s = range(a, stop=b, length=1000)
@syms x
f(x) = 1/sqrt(x)
val = N(integrate(f(x), (x, 1/2^n, b)))
title = L"area under $f$ over $[2^{-%$n}, %$b]$ is $%$(rpad(round(val, digits=2), 4))$"
plt = plot(f, range(a, stop=b, length=1000);
xlim=(0,b), ylim=(0, 15),
legend=false,
title=title)
plot!(plt, [b, a, x1s...], [0, 0, map(f, x1s)...];
linetype=:polygon, color=:orange)
plt
end
caption = L"""
Area under $1/\sqrt{x}$ over $[a,b]$ increases as $a$ gets closer to $0$. Will it grow unbounded or have a limit?
@@ -133,7 +139,7 @@ The limit is infinite, so does not exist except in an extended sense.
Before showing this, we recall the fundamental theorem of calculus. The limit existing is the same as saying the limit of $F(M) - F(a)$ exists for an antiderivative of $f(x)$.
For this particular problem, it can be shown with integration by parts that for positive, integer values of $n$ that an antiderivative exists of the form $F(x) = p(x)e^{-x}$, where $p(x)$ is a polynomial of degree $n$. But we've seen that for any $n>0$, $\lim_{x \rightarrow \infty} x^n e^{-x} = 0,$ so the same is true for any polynomial. So, $\lim_{M \rightarrow \infty} F(M) - F(1) = -F(1)$.
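For example, with $n=3$, `SymPy` can confirm the claimed form of the antiderivative (a check added here for illustration):

```{julia}
using SymPy
@syms x
F = integrate(x^3 * exp(-x), x)
F  # a degree-3 polynomial times e^{-x}
```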
* The function $e^x$ is integrable over $(-\infty, a]$ but not
@@ -175,6 +181,87 @@ As $M$ goes to $\infty$, this will converge to $1$.
limit(sympy.Si(M), M => oo)
```
##### Example
To formally find the limit as $x\rightarrow \infty$ of
$$
\text{Si}(x) = \int_0^x \frac{\sin(t)}{t} dt
$$
we introduce a trick and rely on some theorems that have not been discussed.
First, we notice that the limit of $\text{Si}(x)$ is the value of $I(\alpha)$ when $\alpha=0$, where
$$
I(\alpha) = \int_0^\infty \exp(-\alpha t) \frac{\sin(t)}{t} dt
$$
We differentiate $I$ in $\alpha$ to get:
$$
\begin{align*}
I'(\alpha) &= \frac{d}{d\alpha} \int_0^\infty \exp(-\alpha t) \frac{\sin(t)}{t} dt \\
&= \int_0^\infty \frac{d}{d\alpha} \exp(-\alpha t) \frac{\sin(t)}{t} dt \\
&= \int_0^\infty (-t) \exp(-\alpha t) \frac{\sin(t)}{t} dt \\
&= -\int_0^\infty \exp(-\alpha t) \sin(t) dt \\
\end{align*}
$$
As illustrated previously, this integral can be integrated by parts, though here we have infinite limits and have adjusted for the minus sign:
$$
\begin{align*}
-I'(\alpha) &= \int_0^\infty \exp(-\alpha t) \sin(t) dt \\
&=\sin(t) \frac{-\exp(-\alpha t)}{\alpha} \Big|_0^\infty -
\int_0^\infty \frac{-\exp(-\alpha t)}{\alpha} \cos(t) dt \\
&= 0 + \frac{1}{\alpha} \cdot \int_0^\infty \exp(-\alpha t) \cos(t) dt \\
&= \frac{1}{\alpha} \cdot \cos(t)\frac{-\exp(-\alpha t)}{\alpha} \Big|_0^\infty -
\frac{1}{\alpha} \cdot \int_0^\infty \frac{-\exp(-\alpha t)}{\alpha} (-\sin(t)) dt \\
&= \frac{1}{\alpha^2} - \frac{1}{\alpha^2} \cdot \int_0^\infty \exp(-\alpha t) \sin(t) dt
\end{align*}
$$
Combining gives:
$$
\left(1 + \frac{1}{\alpha^2}\right) \int_0^\infty \exp(-\alpha t) \sin(t) dt = \frac{1}{\alpha^2}
$$
Solving gives the desired integral as
$$
I'(\alpha) = -\frac{1}{\alpha^2} / (1 + \frac{1}{\alpha^2}) = -\frac{1}{1 + \alpha^2}.
$$
This has a known antiderivative: $I(\alpha) = -\tan^{-1}(\alpha) + C$. As $\alpha \rightarrow \infty$ *if* we can pass the limit *inside* the integral, then $I(\alpha) \rightarrow 0$. So $\lim_{x \rightarrow \infty} -\tan^{-1}(x) + C = 0$ or $C = \pi/2$.
As our question is answered by $I(0)$, we get $I(0) = -\tan^{-1}(0) + C = C = \pi/2$.
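The key identity $I'(\alpha) = -1/(1+\alpha^2)$ can be checked numerically with a difference quotient; the names and step size below are chosen just for this sketch:

```{julia}
using QuadGK
phi(t) = t == 0 ? 1.0 : sin(t)/t
I(a) = quadgk(t -> exp(-a*t) * phi(t), 0, Inf)[1]
a, h = 1.0, 1e-4
dI = (I(a + h) - I(a - h)) / (2h)
dI, -1/(1 + a^2)   # the two should nearly agree
```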
The above argument requires two places where a *limit* is passed inside the integral. The first involved the derivative. The [Leibniz integral rule](https://en.wikipedia.org/wiki/Leibniz_integral_rule) can be used to verify the first use is valid:
:::{.callout-note icon=false}
## Leibniz integral rule
If $f(x,t)$ and the derivative in $x$ for a fixed $t$ is continuous (to be discussed later) in a region containing $a(x) \leq t \leq b(x)$ and $x_0 < x < x_1$ and both $a(x)$ and $b(x)$ are continuously differentiable, then
$$
\frac{d}{dx}\int_{a(x)}^{b(x)} f(x, t) dt =
\int_{a(x)}^{b(x)} \frac{d}{dx}f(x,t) dt +
f(x, b(x)) \frac{d}{dx}b(x) - f(x, a(x)) \frac{d}{dx}a(x).
$$
:::
This extends the fundamental theorem of calculus for cases where the integrand also depends on $x$. In our use, both $a'(x)$ and $b'(x)$ are $0$.
[Uniform convergence](https://en.wikipedia.org/wiki/Uniform_convergence) can be used to establish the other.
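A numeric sanity check of the Leibniz rule with moving limits; the specific integrand $f(x,t)=\sin(xt)$, limits $a(x)=x$, $b(x)=x^2$, and evaluation point are chosen just for illustration:

```{julia}
using QuadGK
# F(x) = ∫_x^{x²} sin(x t) dt, so a(x) = x, b(x) = x², f(x,t) = sin(x t)
F(x) = quadgk(t -> sin(x*t), x, x^2)[1]
x0, h = 1.3, 1e-4
lhs = (F(x0 + h) - F(x0 - h)) / (2h)                 # numeric F'(x₀)
rhs = quadgk(t -> t*cos(x0*t), x0, x0^2)[1] +        # ∫ ∂f/∂x dt
      sin(x0 * x0^2) * 2x0 - sin(x0 * x0) * 1        # f(x,b(x))b'(x) - f(x,a(x))a'(x)
abs(lhs - rhs)  # should be small
```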
### Numeric integration
@@ -1,4 +1,4 @@
# Integration By Parts
# Integration by parts
{{< include ../_common_code.qmd >}}
@@ -39,13 +39,13 @@ Now we turn our attention to the implications of the *product rule*: $[uv]' = u'
By the fundamental theorem of calculus:
$$
[u(x)\cdot v(x)]\Big|_a^b = \int_a^b [u(x) v(x)]' dx = \int_a^b u'(x) \cdot v(x) dx + \int_a^b u(x) \cdot v'(x) dx.
$$
Or,
$$
\int_a^b u(x) v'(x) dx = [u(x)v(x)]\Big|_a^b - \int_a^b v(x) u'(x)dx.
$$
:::
@@ -58,16 +58,16 @@ The following visually illustrates integration by parts:
#| label: fig-integration-by-parts
#| fig-cap: "Integration by parts figure ([original](http://en.wikipedia.org/wiki/Integration_by_parts#Visualization))"
let
## parts picture
gr()
u(x) = sin(x*pi/2)
v(x) = x
xs = range(0, stop=1, length=50)
a,b = 1/4, 3/4
p = plot(u, v, 0, 1; legend=false, axis=([], false), line=(:black,2))
plot!([0, u(1)], [0,0]; line=(:gray, 1), arrow=true, side=:head)
plot!([0, 0], [0, v(1) ]; line=(:gray, 1), arrow=true, side=:head)
xs = range(a, b, length=50)
plot!(Shape(vcat(u.(xs), reverse(u.(xs))),
@@ -81,21 +81,28 @@ plot!(p, [u(a),u(a),0, 0, u(b),u(b),u(a)],
[0, v(a), v(a), v(b), v(b), 0, 0],
linetype=:polygon, fill=(:brown3, 0.25))
annotate!(p, [(0.65, .25, text(L"A")),
(0.4, .55, text(L"B")),
(u(a),v(a), text(L"(u(a),v(a))", :bottom, :right)),
(u(b),v(b), text(L"(u(b),v(b))", :bottom, :right)),
(u(a),0, text(L"u(a)", :top)),
(u(b),0, text(L"u(b)", :top)),
(0, v(a), text(L"v(a)", :right)),
(0, v(b), text(L"v(b)", :right)),
(0,0, text(L"(0,0)", :top))
])
end
```
```{julia}
#| echo: false
plotly()
nothing
```
@fig-integration-by-parts shows a parametric plot of $(u(t),v(t))$ for $a \leq t \leq b$.
The total shaded area, a rectangle, is $u(b)v(b)$, the area of $A$ and $B$ combined is just $u(b)v(b) - u(a)v(a)$ or $[u(x)v(x)]\Big|_a^b$. We will show that $A$ is $\int_a^b v(x)u'(x)dx$ and $B$ is $\int_a^b u(x)v'(x)dx$ giving the formula.
We can compute $A$ by a change of variables with $x=u^{-1}(t)$ (so $u'(x)dx = dt$):
@@ -109,6 +116,7 @@ $$
$B$ is similar with the roles of $u$ and $v$ reversed.
---
Informally, the integration by parts formula is sometimes seen as $\int u dv = uv - \int v du$, which can be somewhat confusingly written as:
@@ -131,10 +139,10 @@ Consider the integral $\int_0^\pi x\sin(x) dx$. If we let $u=x$ and $dv=\sin(x)
$$
\begin{align*}
\int_0^\pi x\sin(x) dx &= \int_0^\pi u dv\\
&= uv\Big|_0^\pi - \int_0^\pi v du\\
&= x \cdot (-\cos(x)) \Big|_0^\pi - \int_0^\pi (-\cos(x)) dx\\
&= \pi (-\cos(\pi)) - 0(-\cos(0)) + \int_0^\pi \cos(x) dx\\
&= \pi + \sin(x)\Big|_0^\pi\\
&= \pi.
\end{align*}
$$
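A numeric confirmation of this computation (just a check, not part of the derivation):

```{julia}
using QuadGK
val, _ = quadgk(x -> x * sin(x), 0, pi)
val  # ≈ π
```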
@@ -166,8 +174,8 @@ Putting together gives:
$$
\begin{align*}
\int_1^2 x \log(x) dx
&= (\log(x) \cdot \frac{x^2}{2}) \Big|_1^2 - \int_1^2 \frac{x^2}{2} \frac{1}{x} dx\\
&= (2\log(2) - 0) - (\frac{x^2}{4})\Big|_1^2\\
&= 2\log(2) - (1 - \frac{1}{4}) \\
&= 2\log(2) - \frac{3}{4}.
\end{align*}
@@ -204,7 +212,7 @@ Were this a definite integral problem, we would have written:
$$
\int_a^b \log(x) dx = (x\log(x))\Big|_a^b - \int_a^b dx = (x\log(x) - x)\Big|_a^b.
$$
##### Example
@@ -214,14 +222,14 @@ Sometimes integration by parts is used two or more times. Here we let $u=x^2$ an
$$
\int_a^b x^2 e^x dx = (x^2 \cdot e^x)\Big|_a^b - \int_a^b 2x e^x dx.
$$
But we can do $\int_a^b x e^xdx$ the same way:
$$
\int_a^b x e^x = (x\cdot e^x)\Big|_a^b - \int_a^b 1 \cdot e^xdx = (xe^x - e^x)\Big|_a^b.
$$
Combining gives the answer:
@@ -229,8 +237,8 @@ Combining gives the answer:
$$
\int_a^b x^2 e^x dx
= (x^2 \cdot e^x)\Big|_a^b - 2( (xe^x - e^x)\Big|_a^b ) =
e^x(x^2 - 2x + 2) \Big|_a^b.
$$
In fact, it isn't hard to see that an integral of $x^m e^x$, $m$ a positive integer, can be handled in this manner. For example, when $m=10$, `SymPy` gives:
@@ -247,14 +255,29 @@ The general answer is $\int x^n e^xdx = p(x) e^x$, where $p(x)$ is a polynomial
##### Example
The same technique is attempted for the integral of $e^x\sin(x)$, but ends differently.
First we let $u=\sin(x)$ and $dv=e^x dx$, then
$$
du = \cos(x)dx \quad \text{and}\quad v = e^x.
$$
So:
$$
\int e^x \sin(x)dx = \sin(x) e^x - \int \cos(x) e^x dx.
$$
Now we let $u = \cos(x)$ and again $dv=e^x dx$, then
$$
du = -\sin(x)dx \quad \text{and}\quad v = e^x.
$$
So:
$$
@@ -301,7 +324,7 @@ $$
This is called a reduction formula as it reduces the problem from an integral with a power of $n$ to one with a power of $n - 2$, so could be repeated until the remaining indefinite integral required knowing either $\int \cos(x) dx$ (which is $-\sin(x)$) or $\int \cos(x)^2 dx$, which by a double angle formula application, is $x/2 + \sin(2x)/4$.
`SymPy` is able and willing to do this repeated bookkeeping. For example with $n=10$:
```{julia}
@@ -350,7 +373,7 @@ Using right triangles to simplify, the last value $\cos(\sin^{-1}(x))$ can other
The [trapezoid](http://en.wikipedia.org/wiki/Trapezoidal_rule) rule is an approximation to the definite integral like a Riemann sum, only instead of approximating the area above $[x_i, x_i + h]$ by a rectangle with height $f(c_i)$ (for some $c_i$), it uses a trapezoid formed by the left and right endpoints. That is, this area is used in the estimation: $(1/2)\cdot (f(x_i) + f(x_i+h)) \cdot h$.
Even though we suggest just using `quadgk` for numeric integration, estimating the error in this approximation is of theoretical interest.
Recall, just using *either* $x_i$ or $x_{i-1}$ for $c_i$ gives an error that is "like" $1/n$, as $n$ gets large, though the exact rate depends on the function and the length of the interval.
@@ -359,18 +382,18 @@ Recall, just using *either* $x_i$ or $x_{i-1}$ for $c_i$ gives an error that is
This [proof](http://www.math.ucsd.edu/~ebender/20B/77_Trap.pdf) for the error estimate is involved, but is reproduced here, as it nicely integrates many of the theoretical concepts of integration discussed so far.
First, for convenience, we consider the interval $x_i$ to $x_i+h$. The actual answer over this is just $\int_{x_i}^{x_i+h}f(x) dx$. By a $u$-substitution with $u=x-x_i$ this becomes $\int_0^h f(t + x_i) dt$. For analyzing this we integrate once by parts using $u=f(t+x_i)$ and $dv=dt$. But instead of letting $v=t$, we choose to add---as is our prerogative---a constant of integration $A$, so $v=t+A$:
$$
\begin{align*}
\int_0^h f(t + x_i) dt &= uv \Big|_0^h - \int_0^h v du\\
&= f(t+x_i)(t+A)\Big|_0^h - \int_0^h (t + A) f'(t + x_i) dt.
\end{align*}
$$
We choose $A$ to be $-h/2$, any constant is possible, for then the term $f(t+x_i)(t+A)\Big|_0^h$ becomes $(1/2)(f(x_i+h) + f(x_i)) \cdot h$, or the trapezoid approximation. This means the error over this interval---actual minus estimate---satisfies:
$$
@@ -392,7 +415,7 @@ Again we added a constant of integration, $B$, to $v$. The error becomes:
$$
\text{error}_i = -\left(\frac{(t+A)^2}{2} + B\right)f'(t+x_i)\Big|_0^h + \int_0^h \left(\frac{(t+A)^2}{2} + B\right) \cdot f''(t+x_i) dt.
$$
With $A=-h/2$, $B$ is chosen so $(t+A)^2/2 + B = 0$ at endpoints, or $B=-h^2/8$. The error becomes
@@ -406,14 +429,14 @@ Now, we assume the $\lvert f''(t)\rvert$ is bounded by $K$ for any $a \leq t \le
$$
\lvert \text{error}_i \rvert \leq K \int_0^h \left\lvert \frac{(t-h/2)^2}{2} - \frac{h^2}{8} \right\rvert dt.
$$
But what is the function in the integrand? Clearly it is a quadratic in $t$. Expanding gives $1/2 \cdot (t^2 - ht)$. This is negative over $[0,h]$ (and $0$ at the endpoints), so the integral above is just:
$$
\frac{1}{2}\int_0^h (ht - t^2)dt = \frac{1}{2} \left(\frac{ht^2}{2} - \frac{t^3}{3}\right)\Big|_0^h = \frac{h^3}{12}
$$
This gives the bound: $\lvert \text{error}_i \rvert \leq K h^3/12$. The *total* error may be less, but is not more than the value found by adding up the error over each of the $n$ intervals. As our bound does not depend on $i$, the sum satisfies:
@@ -15,7 +15,8 @@ files = (
"center_of_mass",
"volumes_slice",
"arc_length",
"surface_area",
"surface_area",
"orthogonal_polynomials",
"twelve-qs",
)

View File

@@ -144,10 +144,10 @@ $$
So in particular $K$ is in $[m, M]$. But $m$ and $M$ correspond to values of $f(x)$, so by the intermediate value theorem, $K=f(c)$ for some $c$ that must lie in between $c_m$ and $c_M$, which means as well that it must be in $[a,b]$.
##### Proof of second part of Fundamental Theorem of Calculus
##### Proof of the second part of the Fundamental Theorem of Calculus
The mean value theorem is exactly what is needed to prove formally the second part of the Fundamental Theorem of Calculus. Again, suppose $f(x)$ is continuous on $[a,b]$ with $a < b$. For any $a < x < b$, we define $F(x) = \int_a^x f(u) du$. Then the derivative of $F$ exists and is $f$.
The mean value theorem is exactly what is needed to formally prove the second part of the Fundamental Theorem of Calculus. Again, suppose $f(x)$ is continuous on $[a,b]$ with $a < b$. For any $a < x < b$, we define $F(x) = \int_a^x f(u) du$. Then the derivative of $F$ exists and is $f$.
Let $h>0$. Then consider the forward difference $(F(x+h) - F(x))/h$. Rewriting gives:

View File

@@ -0,0 +1,724 @@
# Orthogonal polynomials
{{< include ../_common_code.qmd >}}
This section uses these add-on packages:
```{julia}
using SymPy
using QuadGK
using Roots
using ForwardDiff: derivative
```
This section takes a detour to give some background on why the underlying method of `quadgk` is more efficient than those of Riemann sums. Orthogonal polynomials play a key role. There are many families of such polynomials. We highlight two.
## Inner product
Define an operation between two integrable, real-valued functions $f(x)$ and $g(x)$ by:
$$
\langle f, g \rangle = \int_{-1}^1 f(x)g(x) dx
$$
The properties of the integral mean this operation satisfies these three main properties:
* symmetry: $\langle f, g \rangle = \langle g,f \rangle$
* positive definiteness: $\langle f, f \rangle > 0$ *unless* $f(x)=0$.
* linearity: if $a$ and $b$ are scalars, then $\langle af + bg, h \rangle = a\langle f, h \rangle + b \langle g, h \rangle$.
The set of integrable functions forms a *vector space*, which simply means two such functions can be added to yield another integrable function and an integrable function times a scalar is still an integrable function. Many different collections of objects form a vector space. In particular, other sets of functions form a vector space, for example the collection of polynomials of degree $n$ or less or just the set of all polynomials.
For a vector space, an operation like the above satisfying these three properties is called an *inner product*; the combination of an inner product and a vector space is called an *inner product space*. In the following, we assume $f$ and $g$ are from a vector space with a real-valued inner product.
Inner products introduce a sense of size through a *norm*:
$\lVert f \rVert = \sqrt{\langle f, f\rangle }$.
Norms satisfy two main properties:
* scalar: $\lVert af \rVert = |a|\lVert f\rVert$
* triangle inequality: $\lVert f + g \rVert \leq \lVert f \rVert + \lVert g \rVert$
Two elements of an inner product space, $f$ and $g$, are *orthogonal* if $\langle f, g \rangle = 0$. This is a generalization of perpendicular. The Pythagorean theorem for orthogonal elements holds: $\lVert f\rVert^2 + \lVert g\rVert^2 = \lVert f+g\rVert^2$.
As we assume a real-valued inner product, the angle between two elements can be defined by:
$$
\angle(f,g) = \cos^{-1}\left(\frac{\langle f, g\rangle}{\lVert f \rVert \lVert g \rVert}\right).
$$
This says the angle between two orthogonal elements is $90$ degrees.
The Cauchy-Schwarz inequality, $|\langle f, g \rangle| \leq \lVert f \rVert \lVert g\rVert$, for an inner product space, ensures the argument to $\cos^{-1}$ is between $-1$ and $1$.
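These facts are easy to spot check numerically. The following sketch approximates the inner product with a composite Simpson's rule in place of a quadrature routine; the names `simpson`, `ip`, and `nrm` are ours, not from a package:

```{julia}
# inner product ⟨f,g⟩ = ∫₋₁¹ f(x)g(x) dx, approximated by composite Simpson's rule
function simpson(f; n = 1000)            # n must be even
    h = 2 / n
    s = f(-1.0) + f(1.0)
    for i in 1:n-1
        s += (isodd(i) ? 4 : 2) * f(-1.0 + i * h)
    end
    s * h / 3
end

ip(f, g) = simpson(x -> f(x) * g(x))     # ⟨f, g⟩
nrm(f) = sqrt(ip(f, f))                  # ‖f‖ = √⟨f,f⟩

f(x) = x^2
g(x) = 1 - x
abs(ip(f, g)) <= nrm(f) * nrm(g)         # Cauchy-Schwarz holds

# 1 and x are orthogonal, so the angle between them is π/2
acos(ip(one, identity) / (nrm(one) * nrm(identity)))
```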
These properties generalize two-dimensional vectors, with components $\langle x, y\rangle$. Recall, these can be visualized by placing a tail at the origin and a tip at the point $(x,y)$. Such vectors can be added by placing the tail of one at the tip of the other and using the vector from the other tail to the other tip.
With this, a vector anchored at the origin can be viewed as a line segment with slope $y/x$ (rise over run). A perpendicular line segment would have slope $-x/y$ (the negative reciprocal), which is associated with the vector $\langle y, -x \rangle$. The dot product is the sum of the products of the components, or for these two vectors $x\cdot y + y\cdot (-x)$, which is $0$, as the line segments are perpendicular (orthogonal).
Consider now two vectors, say $f$ and $g$. We can make a new vector that is orthogonal to $f$ by combining $g$ with a piece of $f$. But what piece? Consider this computation:
$$
\begin{align*}
\langle f, g - \frac{\langle f,g\rangle}{\langle f, f\rangle} f \rangle
&= \langle f, g \rangle - \langle f, \frac{\langle f,g\rangle}{\langle f, f\rangle} f \rangle \\
&= \langle f, g \rangle - \frac{\langle f,g\rangle}{\langle f, f\rangle}\langle f,f \rangle \\
&= \langle f, g \rangle - \langle f, g \rangle = 0
\end{align*}
$$
Define

$$
\text{proj}_f(g) = \frac{\langle f,g\rangle }{\langle f, f\rangle} f.
$$

Then with $u_1 = f$ and $u_2 = g-\text{proj}_f(g)$, the elements $u_1$ and $u_2$ are orthogonal.
A similar calculation shows that if $h$ is added to the set of elements, then $u_3 = h - \text{proj}_{u_1}(h) - \text{proj}_{u_2}(h)$ will be orthogonal to $u_1$ and $u_2$, and so on. This process, called the [Gram-Schmidt](https://en.wikipedia.org/wiki/Gram%E2%80%93Schmidt_process) process, can turn any set of vectors into a set of orthogonal vectors, assuming they are all nonzero and no non-trivial linear combination of them is zero.
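A sketch of the process for polynomials: represent a polynomial by its coefficient vector, use the fact that $\int_{-1}^1 x^k dx$ is $0$ for odd $k$ and $2/(k+1)$ for even $k$ to compute inner products exactly, and orthogonalize $1, x, x^2, x^3$. (All the names here are ours; the result is the monic Legendre family met next.)

```{julia}
# polynomials as coefficient vectors [c₀, c₁, c₂, ...]
polymul(p, q) = [sum(p[i+1] * q[k-i+1] for i in 0:k if i < length(p) && k - i < length(q); init = 0.0)
                 for k in 0:(length(p) + length(q) - 2)]
# ∫₋₁¹ xᵏ dx is 0 (odd k) or 2/(k+1) (even k), so this inner product is exact
polyint(p) = sum(c * (isodd(k) ? 0.0 : 2 / (k + 1)) for (k, c) in zip(0:length(p)-1, p))
ip(p, q) = polyint(polymul(p, q))

# Gram-Schmidt on 1, x, x², x³
basis = [[1.0], [0.0, 1.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 1.0]]
us = Vector{Float64}[]
for b in basis
    u = copy(b)
    for v in us
        c = ip(v, b) / ip(v, v)          # projection coefficient ⟨v,b⟩/⟨v,v⟩
        for (k, vk) in pairs(v)
            u[k] -= c * vk
        end
    end
    push!(us, u)
end
us    # 1, x, x² - 1/3, x³ - (3/5)x — monic multiples of the Legendre polynomials
```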
## Legendre
Consider now polynomials of degree $n$ or less with the normalization that $p(1) = 1$. We begin with two such polynomials: $u_0(x) = 1$ and $u_1(x) = x$.
These are orthogonal with respect to $\int_{-1}^1 f(x) g(x) dx$, as
$$
\int_{-1}^1 u_0(x) u_1(x) dx =
\int_{-1}^1 1 \cdot x dx =
\frac{x^2}{2} \Big|_{-1}^1 = \frac{1^2}{2} - \frac{(-1)^2}{2} = 0.
$$
Now consider a quadratic polynomial, $u_2(x) = ax^2 + bx + c$. We want $u_2$ to be orthogonal to $u_0$ and $u_1$ with the extra normalization condition that $u_2(1) = 1$. We can do this using Gram-Schmidt, as above, or, as here, through a system of two equations:
```{julia}
@syms a b c d x
u0 = 1
u1 = x
u2 = a*x^2 + b*x + c
eqs = (integrate(u0 * u2, (x, -1, 1)) ~ 0,
integrate(u1 * u2, (x, -1, 1)) ~ 0)
sols = solve(eqs, (a, b, c)) # b => 0, a => -3c
u2 = u2(sols...)
u2 = simplify(u2 / u2(x=>1)) # make u2(1) = 1 and fix c
```
The quadratic polynomial has $3$ unknowns and the orthogonality conditions give two equations. Solving these equations leaves one unknown (`c`), but the normalization condition (that $u_i(1) = 1$) allows `c` to be simplified out.
We can do this again with $u_3$:
```{julia}
u3 = a*x^3 + b*x^2 + c*x + d
eqs = (integrate(u0 * u3, (x, -1, 1)) ~ 0,
integrate(u1 * u3, (x, -1, 1)) ~ 0,
integrate(u2 * u3, (x, -1, 1)) ~ 0)
sols = solve(eqs, (a, b, c, d)) # a => -5c/3, b=>0, d=>0
u3 = u3(sols...)
u3 = simplify(u3/u3(x=>1)) # make u3(1) = 1
```
In theory, this can be continued for any $n$. The resulting
polynomials are called the
[Legendre](https://en.wikipedia.org/wiki/Legendre_polynomials)
polynomials.
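The symbolic work can be double checked numerically. A pure-`Julia` sketch (the closed forms below are the $u_2$ and $u_3$ just derived; the names `v2`, `v3`, and `simpson` are ours):

```{julia}
v2(x) = (3x^2 - 1) / 2        # the derived u₂; v2(1) == 1
v3(x) = (5x^3 - 3x) / 2       # the derived u₃; v3(1) == 1

# composite Simpson's rule over [-1, 1]
function simpson(f; n = 1000)
    h = 2 / n
    s = f(-1.0) + f(1.0)
    for i in 1:n-1
        s += (isodd(i) ? 4 : 2) * f(-1.0 + i * h)
    end
    s * h / 3
end

(v2(1), v3(1))                           # both normalize to 1
simpson(x -> v2(x) * v3(x))              # ≈ 0: orthogonal to each other
simpson(x -> 1 * v3(x)), simpson(x -> x * v2(x))   # ≈ 0: orthogonal to u₀ and u₁
```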
Rather than continue this, we develop easier means to generate these polynomials.
## General weight function
Let $w(x)$ be some non-negative function and consider the new inner product between two polynomials:
$$
\langle p, q\rangle = \int_I p(x) q(x) w(x) dx
$$
where $I$ is an interval and $w(x)$ is called a weight function. In the above discussion $I=[-1,1]$ and $w(x) = 1$.
Suppose we have *orthogonal* polynomials $p_i(x)$, $i=0,1, \dots, n$, where $p_i$ is a polynomial of degree $i$ ($p_i(x) = k_i x^i + \cdots$, where $k_i \neq 0$), and
$$
\langle p_m, p_n \rangle =
\int_I p_m(x) p_n(x) w(x) dx =
\begin{cases}
0 & m \neq n\\
h_m & m = n
\end{cases}
$$
Unique elements can be defined by specifying some additional property. For Legendre, it was $p_n(1)=1$, for other orthogonal families this may be specified by having leading coefficient of $1$ (monic), or a norm of $1$ (orthonormal), etc.
The above is the *absolutely continuous* case; suitable generalizations of the integral allow for more general weights.
Orthogonality can be extended: If $q(x)$ is any polynomial of degree $m < n$, then
$\langle q, p_n \rangle = \int_I q(x) p_n(x) w(x) dx = 0$. (See the questions for more detail.)
Some names used for the characterizing constants are:
* $p_n(x) = k_n x^n + \cdots$ ($k_n$ is the leading term)
* $h_n = \langle p_n, p_n\rangle$
### Three-term recurrence
Orthogonal polynomials, as defined above through a weight function, satisfy a *three-term recurrence*:
$$
p_{n+1}(x) = (A_n x + B_n) p_n(x) - C_n p_{n-1}(x),
$$
where $n \geq 0$ and $p_{-1}(x) = 0$.
(With this and knowledge of $A_n$, $B_n$, and $C_n$, the polynomials can be recursively generated from just specifying a value for the constant $p_0(x)$.)
First, $p_{n+1}$ has leading term $k_{n+1}x^{n+1}$. Looking on the right hand side for the coefficient of $x^{n+1}$ we find $A_n k_n$, so $A_n = k_{n+1}/k_n$.
Next, we look at $u(x) = p_{n+1}(x) - A_n x p_n(x)$, a polynomial of degree $n$ or less.
As this has degree $n$ or less, it can be expressed in terms of $p_0, p_1, \dots, p_n$. Write it as $u(x) = \sum_{j=0}^n d_j p_j(x)$. Now, take any $m < n-1$ and consider $p_m$. We consider the inner product of $u$ and $p_m$ two ways:
$$
\begin{align*}
\int_I p_m(x) u(x) w(x) dx &=
\int_I p_m(x) \sum_{j=0}^n d_j p_j(x) w(x) dx \\
&= \int_I p_m(x) \left(d_m p_m(x) + \textcolor{red}{\sum_{j=0, j\neq m}^{n} d_j p_j(x)}\right) w(x) dx \\
&= d_m \int_I p_m(x) p_m(x) w(x) dx = d_m h_m
\end{align*}
$$
As well
$$
\begin{align*}
\int_I p_m(x) u(x) w(x) dx
&= \int_I p_m(x) (p_{n+1}(x) - A_n x p_n(x)) w(x) dx \\
&= \int_I p_m(x) \textcolor{red}{p_{n+1}(x)} w(x) dx - \int_I p_m(x) A_n x p_n(x) w(x) dx\\
&= 0 - A_n \int_I (\textcolor{red}{x p_m(x)}) p_n(x) w(x) dx\\
&= 0
\end{align*}
$$
The last integral is $0$, as $xp_m(x)$ has degree $n-1$ or less and hence is orthogonal to $p_n$. Comparing the two computations gives $d_m h_m = 0$, so $d_m = 0$ for each $m < n-1$.

That is, $p_{n+1}(x) - A_n x p_n(x) = d_n p_n(x) + d_{n-1} p_{n-1}(x)$. Setting $B_n=d_n$ and $C_n = -d_{n-1}$ shows the three-term recurrence applies.
#### Example: Legendre polynomials
With this notation, the Legendre polynomials have:
$$
\begin{align*}
w(x) &= 1\\
I &= [-1,1]\\
A_n &= \frac{2n+1}{n+1}\\
B_n &= 0\\
C_n & = \frac{n}{n+1}\\
k_{n+1} &= \frac{2n+1}{n+1}k_n, \quad k_0 = 1\\
h_n &= \frac{2}{2n+1}
\end{align*}
$$
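These constants are enough to generate the whole family. A sketch (the function names are ours), iterating $p_{n+1}(x) = (A_n x + B_n)p_n(x) - C_n p_{n-1}(x)$ from $p_0 = 1$ and $p_1 = x$:

```{julia}
An(n) = (2n + 1) / (n + 1)
Bn(n) = 0.0
Cn(n) = n / (n + 1)

# evaluate the Legendre polynomial pₙ at x by running the recurrence
function legendre(n, x)
    n == 0 && return one(x)
    p, q = one(x), x                 # p₀, p₁
    for k in 1:n-1
        p, q = q, (An(k) * x + Bn(k)) * q - Cn(k) * p
    end
    q
end

legendre(2, 0.5)     # (3(1/2)² - 1)/2 = -0.125
legendre(7, 1.0)     # pₙ(1) = 1 for every n
```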
#### Favard theorem
In an efficient review of the subject, [Koornwinder](https://arxiv.org/pdf/1303.2825) states conditions on the recurrence that ensure that if degree-$n$ polynomials $p_n$ satisfy a three-term recurrence, then there is an associated weight function (suitably generalized). The conditions use this form of a three-term recurrence:
$$
\begin{align*}
xp_n(x) &= a_n p_{n+1}(x) + b_n p_n(x) + c_n p_{n-1}(x),\quad (n > 0)\\
xp_0(x) &= a_0 p_1(x) + b_0 p_0(x)
\end{align*}
$$
where the constants are real and $a_n c_{n+1} > 0$. These force $a_n = k_n/k_{n+1}$ and $c_{n+1}/h_{n+1} = a_n/h_n$.
#### Clenshaw algorithm
When introducing polynomials, the synthetic division algorithm was given to compute $p(x) / (x-r)$. This same algorithm also computed $p(r)$ efficiently and is called Horner's method. The `evalpoly` method in `Julia`'s base implements this.
For a set of polynomials $p_0(x), p_1(x), \dots, p_n(x)$ satisfying a three-term recurrence $p_{n+1}(x) = (A_n x + B_n) p_n(x) - C_n p_{n-1}(x)$, the Clenshaw algorithm gives an efficient means to compute an expression of a linear combination of the polynomials, $q(x) = a_0 p_0(x) + a_1 p_1(x) + \cdots + a_n p_n(x)$.
The [method](https://en.wikipedia.org/wiki/Clenshaw_algorithm) uses a reverse recurrence formula starting with
$$
b_{n+1}(x) = b_{n+2}(x) = 0
$$
and then computing for $k = n, n-1, \dots, 1$
$$
b_k(x) = a_k + (A_k x + B_k) b_{k+1}(x) - C_k b_{k+2}(x)
$$
Finally, the computation finishes with $a_0 p_0(x) + b_1(x) p_1(x) - C_1 p_0(x) b_2(x)$.
For example, with the Legendre polynomials, we have
```{julia}
A(n) = (2n+1)//(n+1)
B(n) = 0
C(n) = n // (n+1)
```
Say we want to compute $a_0 u_0(x) + a_1 u_1(x) + a_2 u_2(x) + a_3 u_3(x) + a_4 u_4(x)$. The necessary inputs are the coefficients, the value of $x$, and polynomials $p_0$ and $p_1$.
```{julia}
function clenshaw(x, as, p0, p1)
n = length(as) - 1
bn1, bn2 = 0, 0
a(k) = as[k + 1] # offset
for k in n:-1:1
bn1, bn2 = a(k) + (A(k) * x + B(k)) * bn1 - C(k+1) * bn2, bn1
end
b1, b2 = bn1, bn2
p0(x) * a(0) + p1(x) * b1 - C(1) * p0(x) * b2
end
```
This function can be repurposed to generate additional Legendre polynomials. For example, to compute $u_4$ we pass in a symbolic value for $x$ and mask out all but $a_4$ in our coefficients:
```{julia}
p₀(x) = 1
p₁(x) = x # Legendre
@syms x
clenshaw(x, (0,0,0,0,1), p₀, p₁) |> expand |> simplify
```
:::{.callout-note}
### Differential equations approach
A different description of families of orthogonal polynomials is that they satisfy a differential equation of the type
$$
\sigma(x) y''(x) + \tau(x) y'(x) + \lambda_n y(x) = 0,
$$
where $\sigma(x) = ax^2 + bx + c$, $\tau(x) = dx + e$, and $\lambda_n = -(a\cdot n(n-1) + dn)$.
With this parameterization, values for $A_n$, $B_n$, and $C_n$ can be given in terms of the leading coefficient, $k_n$ (cf. [Koepf and Schmersau](https://arxiv.org/pdf/math/9612224)):
$$
\begin{align*}
A_n &= \frac{k_{n+1}}{k_n}\\
B_n &= \frac{k_{n+1}}{k_n} \frac{2bn(a(n-1)+d) + e(d-2a)}{(2a(n-1) + d)(2an+d)}\\
C_n &= \frac{k_{n+1}}{k_{n-1}}
\frac{n(a(n-1) + d)(a(n-2)+d)(n(an+d))(4ac-b^2)+ae^2+cd^2-bde}{
(a(n-1)+d)(a(2n-1)+d)(a(2n-3)+d)(2a(n-1)+d)^2}
\end{align*}
$$
There are other relations between derivatives and the orthogonal polynomials. For example, another three-term recurrence is:
$$
\sigma(x) p_n'(x) = (\alpha_n x + \beta_n)p_n(x) + \gamma_n p_{n-1}(x)
$$
The same reference has formulas for $\alpha$, $\beta$, and $\gamma$ in terms of $a,b,c,d$, and $e$ along with many others.
:::
## Chebyshev
The Chebyshev polynomials (of the first kind) satisfy the three-term recurrence
$$
T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x)
$$
with $T_0(x)= 1$ and $T_1(x)=x$.
These polynomials have domain $(-1,1)$ and weight function $(1-x^2)^{-1/2}$.
(The Chebyshev polynomials of the second kind satisfy the same three-term recurrence but have $U_0(x)=1$ and $U_1(x)=2x$.)
These polynomials are related to trigonometry through
$$
T_n(\cos(\theta)) = \cos(n\theta)
$$
This characterization makes it easy to find the zeros of the
polynomial $T_n$, as they happen when $\cos(n\theta)$ is $0$, or when
$n\theta = \pi/2 + k\pi$ for $k$ in $0$ to $n-1$. Solving for $\theta$
and taking the cosine, we get the zeros of the $n$th degree polynomial
$T_n$ are $\cos(\pi(k + 1/2)/n)$ for $k$ in $0$ to $n-1$.
These evenly spaced angles lead to roots more concentrated at the edges of the interval $(-1,1)$.
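Both the zeros and the trigonometric identity can be checked directly; a sketch using the recurrence above (the function name `cheb` is ours):

```{julia}
# Tₙ(x) by the recurrence T₀ = 1, T₁ = x, Tₙ₊₁ = 2x Tₙ - Tₙ₋₁
function cheb(n, x)
    n == 0 && return one(x)
    p, q = one(x), x
    for _ in 1:n-1
        p, q = q, 2x * q - p
    end
    q
end

n = 5
zs = [cos(pi * (k + 1/2) / n) for k in 0:n-1]
maximum(abs, cheb.(n, zs))        # ≈ 0: these are the zeros of T₅

# the identity Tₙ(cos θ) = cos(nθ) at one value of θ
θ = 0.7
cheb(n, cos(θ)) - cos(n * θ)      # ≈ 0
```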
##### Example
Chebyshev polynomials have a minimal property that makes them fundamental for use with interpolation.
Define the *infinity* norm over $[-1,1]$ to be the maximum value of the absolute value of the function over these values.
Let $f(x) = 2^{-n+1}T_n(x)$ be a monic version of the Chebyshev polynomial.
> If $q(x)$ is any monic polynomial of degree $n$, then the infinity norm of $q(x)$ is greater than or equal to that of $f$.
Using the trigonometric representation of $T_n$, we have
* $f(x)$ has infinity norm of $2^{-n+1}$ and these maxima occur at $x=\cos((k\pi)/n)$, where $0 \leq k \leq n$. (There is a cosine curve with known peaks, oscillating between $-1$ and $1$.)
* $f(x) > 0$ at $x = \cos((2k\pi)/n)$ for $0 \leq 2k \leq n$
* $f(x) < 0$ at $x = \cos(((2k+1)\pi)/n)$ for $0 \leq 2k+1 \leq n$
Suppose $w(x)$ is a monic polynomial of degree $n$ and suppose it has smaller infinity norm. Consider $u(x) = f(x) - w(x)$. At the extreme points of $f(x)$ we must have $|f(x)| > |w(x)|$, so $u$ takes the sign of $f$ there. But this means
* $u(x) > 0$ at $x = \cos((2k\pi)/n)$ for $0 \leq 2k \leq n$
* $u(x) < 0$ at $x = \cos(((2k+1)\pi)/n)$ for $0 \leq 2k+1 \leq n$
As $u$ is continuous, this means there are at least $n$ sign changes, hence $n$ or more zeros. But as both $f$ and $w$ are monic, $u$ is of degree $n-1$, at most. This is a contradiction unless $u(x)$ is the zero polynomial, which it can't be by assumption.
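Numerically, the claim can be spot checked by comparing the scaled Chebyshev polynomial against a few other monic choices over a fine grid (a sketch, not a proof; all names here are ours):

```{julia}
# Tₙ by its recurrence; its leading coefficient is 2ⁿ⁻¹, so 2^(1-n) Tₙ is monic
function cheb(n, x)
    n == 0 && return one(x)
    p, q = one(x), x
    for _ in 1:n-1
        p, q = q, 2x * q - p
    end
    q
end
monic_cheb(n, x) = cheb(n, x) / 2.0^(n - 1)

# grid approximation to the infinity norm over [-1, 1]
infnorm(h) = maximum(abs(h(x)) for x in range(-1, 1, length = 10_001))

n = 4
infnorm(x -> monic_cheb(n, x))     # 2^(1-n) = 0.125
infnorm(x -> x^n)                  # 1.0 — another monic degree-4 polynomial
infnorm(x -> x^n - x^2 / 2)        # 0.5 — still at least 0.125, as claimed
```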
### Integration
Recall, a Riemann sum can be thought of in terms of weights, $w_i$ and nodes $x_i$ for which $\int_I f(x) dx \approx \sum_{i=0}^{n-1} w_i f(x_i)$.
For a right-Riemann sum with partition given by $a_0 < a_1 < \cdots < a_n$ the nodes are $x_i = a_i$ and the weights are $w_i = a_i - a_{i-1}$ (or, in the evenly spaced case, $w_i = (a_n - a_0)/n$).
More generally, this type of expression can represent integrals of the type $\int_I f(x) w(x) dx$, with $w(x)$ as in an inner product. Call such a sum a Gaussian quadrature.
We will see that the zeros of orthogonal polynomials have special properties as nodes.
> For orthogonal polynomials over the interval $I$ with weight function $w(x)$, each $p_n$ has $n$ distinct real zeros in $I$.
Suppose that $p_n$ had only $k<n$ sign changes, at $x_1, x_2, \dots, x_k$. Then for some choice of $\delta$, $(-1)^\delta p_n(x) (x-x_1)(x-x_2)\cdots(x-x_k) \geq 0$. Since this is not identically zero, it must be that

$$
(-1)^\delta \int_I p_n(x) \left( (x-x_1)(x-x_2)\cdots(x-x_k)\right) w(x) dx > 0
$$

But the product $(x-x_1)\cdots(x-x_k)$ is of degree $k < n$, so by orthogonality the integral must be $0$. Hence it can't be that $k < n$, so $p_n$ must have $n$ sign changes in $I$. Each corresponds to a zero, as $p_n$ is continuous.
This next statement says that using the zeros of $p_n$ as the nodes of Gaussian quadrature, with appropriate weights, makes the quadrature exact for polynomials of higher degree.

> For a fixed $n$, suppose $p_0, p_1, \dots, p_n$ are orthogonal polynomials over $I$ with weight function $w(x)$. If the zeros of $p_n$ are taken as the nodes $x_i$, then there exist $n$ weights so that for any polynomial of degree $2n-1$ or less, the Gaussian quadrature is exact.
That is if $q(x)$ is a polynomial with degree $2n-1$ or less, we have for some choice of $w_i$:
$$
\int_I q(x) w(x) dx = \sum_{i=1}^n w_i q(x_i)
$$
To compare, recall that Riemann sums ($1$ node per subinterval) are exact for constant functions (degree $0$), the trapezoid rule ($2$ nodes) is exact for linear polynomials (degree $1$), and Simpson's rule ($3$ nodes) is exact for cubic polynomials (degree $3$).
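For a concrete instance: the $2$-node Gauss rule for $w(x)=1$ on $[-1,1]$ uses the zeros of $u_2(x) = (3x^2-1)/2$, namely $\pm 1/\sqrt{3}$, each with weight $1$, and integrates cubics exactly with just two nodes (a sketch; the name `gauss2` is ours):

```{julia}
# 2-point Gauss-Legendre rule: nodes are the zeros of (3x² - 1)/2, weights both 1
nodes = (-1 / sqrt(3), 1 / sqrt(3))
weights = (1.0, 1.0)
gauss2(f) = sum(w * f(x) for (x, w) in zip(nodes, weights))

gauss2(x -> x^3 + 2x^2 + 1)    # 10/3, exactly ∫₋₁¹ (x³ + 2x² + 1) dx
gauss2(x -> x^4)               # 2/9 ≈ 0.22 — degree 4 is no longer exact (true value 2/5)
```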
We follow [Wikipedia](https://en.wikipedia.org/wiki/Gaussian_quadrature#Fundamental_theorem) to see this key fact.
Take $h(x)$ of degree $2n-1$ or less. Then by polynomial long division, there are polynomials $q(x)$ and $r(x)$ where
$$
h(x) = q(x) p_n(x) + r(x)
$$
and the degree of $r(x)$ is less than $n$, the degree of $p_n(x)$. Further, the degree of $q(x)$ is also less than $n$, as were it $n$ or more, the degree of $q(x)p_n(x)$ would be more than $2n-1$. Let's note that if $x_i$ is a zero of $p_n(x)$ then $h(x_i)= r(x_i)$.
So
$$
\begin{align*}
\int_I h(x) w(x) dx &= \int_I \textcolor{red}{q(x)} p_n(x) w(x) dx + \int_I r(x) w(x)dx\\
&= 0 + \int_I r(x) w(x) dx.
\end{align*}
$$
Now consider the polynomials made from the zeros of $p_n(x)$
$$
l_i(x) = \prod_{j \ne i} \frac{x - x_j}{x_i - x_j}
$$
These are called Lagrange interpolating polynomials and have the property that $l_i(x_i) = 1$ and $l_i(x_j) = 0$ if $i \neq j$.
These allow the expression of
$$
\begin{align*}
r(x) &= l_1(x)r(x_1) + l_2(x) r(x_2) + \cdots + l_n(x) r(x_n) \\
&= \sum_{i=1}^n l_i(x) r(x_i)
\end{align*}
$$
(This isn't obviously true, but this expression agrees with an at-most degree $n-1$ polynomial ($r(x)$) at $n$ points, hence it must be the same polynomial.)
With this representation, the integral becomes
$$
\begin{align*}
\int_I h(x) w(x) dx &= \int_I r(x) w(x)dx \\
&= \int_I \sum_{i=1}^n l_i(x) r(x_i) w(x) dx\\
&= \sum_{i=1}^n r(x_i) \int_I l_i(x) w(x) dx \\
&= \sum_{i=1}^n r(x_i) w_i\\
&= \sum_{i=1}^n w_i h(x_i)
\end{align*}
$$
That is, there are weights, $w_i = \int_I l_i(x) w(x) dx$, for which the integration is exactly given by Gaussian quadrature using the roots of $p_n$ as the nodes.
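For instance, with $n=3$ and $w(x)=1$ the nodes are the zeros of $u_3(x) = (5x^3-3x)/2$, i.e. $0$ and $\pm\sqrt{3/5}$, and computing $w_i = \int_{-1}^1 l_i(x) dx$ gives the weights $5/9, 8/9, 5/9$. A sketch (names are ours; composite Simpson is exact here since each $l_i$ is quadratic):

```{julia}
xs = (-sqrt(3 / 5), 0.0, sqrt(3 / 5))    # zeros of (5x³ - 3x)/2

# Lagrange basis lᵢ(x) = ∏_{j≠i} (x - xⱼ)/(xᵢ - xⱼ)
l(i, x) = prod((x - xs[j]) / (xs[i] - xs[j]) for j in eachindex(xs) if j != i)

function simpson(f; n = 100)
    h = 2 / n
    s = f(-1.0) + f(1.0)
    for k in 1:n-1
        s += (isodd(k) ? 4 : 2) * f(-1.0 + k * h)
    end
    s * h / 3
end

ws = [simpson(x -> l(i, x)) for i in eachindex(xs)]   # ≈ [5/9, 8/9, 5/9]

quad(f) = sum(w * f(x) for (w, x) in zip(ws, xs))
quad(x -> x^5 + x^4 + 1)    # 12/5, exact for degree ≤ 2·3 - 1 = 5
```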
The general formula for the weights can be written in terms of the polynomials $p_i = k_ix^i + \cdots$:
$$
\begin{align*}
w_i &= \int_I l_i(x) w(x) dx \\
&= \frac{k_n}{k_{n-1}}
\frac{\int_I p_{n-1}(x)^2 w(x) dx}{p'_n(x_i) p_{n-1}(x_i)}.
\end{align*}
$$
To see this, consider:
$$
\begin{align*}
\prod_{j \neq i} (x - x_j) &=
\frac{\prod_j (x-x_j)}{x-x_i} \\
&= \frac{1}{k_n}\frac{k_n \prod_j (x - x_j)}{x - x_i} \\
&= \frac{1}{k_n} \frac{p_n(x)}{x-x_i}\\
&= \frac{1}{k_n} \frac{p_n(x) - p_n(x_i)}{x-x_i}\\
&\rightarrow \frac{p'_n(x_i)}{k_n}, \text{ as } x \rightarrow x_i.
\end{align*}
$$
Thus
$$
\prod_{j \neq i} (x_i - x_j) = \frac{p'_n(x_i)}{k_n}.
$$
This gives
$$
\begin{align*}
w_i &= \int_I \frac{k_n \prod_{j \neq i} (x-x_j)}{p'_n(x_i)} w(x) dx\\
&= \frac{1}{p'_n(x_i)} \int_I \frac{p_n(x)}{x-x_i} w(x) dx
\end{align*}
$$
To work on the last term, a trick (see the questions for detail) can show that for any $k \leq n$ that
$$
\int_I \frac{x^k p_n(x)}{x - x_i} w(x) dx
= x_i^k \int_I \frac{p_n(x)}{x - x_i} w(x) dx
$$
Hence, for any polynomial $q(x)$ of degree $n$ or less, we have

$$
q(x_i) \int_I \frac{p_n(x)}{x - x_i} w(x) dx =
\int_I \frac{q(x) p_n(x)}{x - x_i} w(x) dx.
$$
We will use this for $p_{n-1}$. First, as $x_i$ is a zero of $p_n(x)$ we have
$$
\frac{p_n(x)}{x-x_i} = k_n x^{n-1}+ r(x),
$$
where $r(x)$ has degree $n-2$ at most. This is due to $p_n$ being divided by a monic polynomial, hence leaving a degree $n-1$ polynomial with leading coefficient $k_n$.
But then
$$
\begin{align*}
w_i &= \frac{1}{p'_n(x_i)} \int_I \frac{p_n(x)}{x-x_i} w(x) dx \\
&= \frac{1}{p'_n(x_i)} \frac{1}{p_{n-1}(x_i)} \int_I \frac{p_{n-1}(x) p_n(x)}{x - x_i} w(x) dx\\
&= \frac{1}{p'_n(x_i)p_{n-1}(x_i)} \int_I p_{n-1}(x)
(k_n x^{n-1} + \textcolor{red}{r(x)}) w(x) dx\\
&= \frac{k_n}{p'_n(x_i)p_{n-1}(x_i)} \int_I p_{n-1}(x) x^{n-1} w(x) dx\\
&= \frac{k_n}{p'_n(x_i)p_{n-1}(x_i)} \int_I p_{n-1}(x)
\left(
\textcolor{red}{\left(x^{n-1} - \frac{p_{n-1}(x)}{k_{n-1}}\right) }
+ \frac{p_{n-1}(x)}{k_{n-1}}\right) w(x) dx\\
&= \frac{k_n}{p'_n(x_i)p_{n-1}(x_i)} \int_I p_{n-1}(x)\frac{p_{n-1}(x)}{k_{n-1}} w(x) dx\\
&= \frac{k_n}{k_{n-1}} \frac{1}{p'_n(x_i)p_{n-1}(x_i)} \int_I p_{n-1}(x)^2 w(x) dx.
\end{align*}
$$
### Examples of quadrature formula
The `QuadGK` package uses a modification to Gauss quadrature to estimate numeric integrals. Let's see how. Behind the scenes, `quadgk` calls `kronrod` to compute nodes and weights.
From the earlier derivation, extended one more step to get $u_4$, we have
```{julia}
u₃(x) = x*(5x^2 - 3)/2
u₄(x) = 35x^4 / 8 - 15x^2 / 4 + 3/8
```
```{julia}
xs = find_zeros(u₄, -1, 1)
```
From this we can compute the weights from the derived general formula:
```{julia}
k₃, k₄ = 5/2, 35/8
w(x) = 1
I = first(quadgk(x -> u₃(x)^2 * w(x), -1, 1))
ws = [k₄/k₃ * 1/(derivative(u₄,xᵢ) * u₃(xᵢ)) * I for xᵢ ∈ xs]
(xs, ws)
```
We compare now to the values returned by `kronrod` in `QuadGK`
```{julia}
kxs, kwts, wts = kronrod(4, -1, 1)
[ws wts xs kxs[2:2:end]]
```
(The `kronrod` function computes the $2n+1$ nodes and weights of the Gauss-Kronrod rule. The $n$ Gauss-Legendre nodes are among those, and are extracted by taking the 2nd, 4th, etc.)
To compare integrations of some smooth function we have
```{julia}
u(x) = exp(x)
GL = sum(wᵢ * u(xᵢ) for (xᵢ, wᵢ) ∈ zip(xs, ws))
KL = sum(wᵢ * u(xᵢ) for (xᵢ, wᵢ) ∈ zip(kxs, kwts))
QL, esterror = quadgk(u, -1, 1)
(; GL, KL, QL, esterror)
```
The first two are expected to not be as accurate, as they utilize a fixed number of nodes.
## Questions
###### Question
Let $p_i$ for $i$ in $0$ to $n$ be polynomials of degree $i$. It is true that for any polynomial $q(x)$ of degree $k \leq n$ there is a linear combination such that $q(x) = a_0 p_0(x) + \cdots + a_k p_k(x)$.
First, it is enough to do this for each monomial $x^k$. Why?
```{julia}
#| echo: false
choices = [raw"If you can do it for each $x^i$ then if $q(x) = b_0 + b_1x + b_2x^2 + \cdots + b_k x^k$ we just multiply the coefficients for each $x^i$ by $b_i$ and add.",
raw"It isn't true"]
radioq(choices, 1)
```
Suppose $p_0 = k_0$ and $p_1 = k_1x + a$. How would you produce $x=x^1$?
```{julia}
#| echo: false
choices = [raw"$(p1 - (a/k_0) p_0)/k_1$",
raw"$p1 - p0$"]
radioq(choices, 1)
```
Let $p_i = k_i x^i + \cdots$ ($k_i$ is the leading term)
To reduce $p_3 = k_3x^3 + a_2x^2 + a_1x^1 + a_0$ to $k_3x^3$ we could try:
* form $q_3 = p_3 - (a_2/k_2) p_2$. As $p_2$ has degree $2$, this leaves $k_3x^3$ alone, but what does it do to the $x^2$ term?
```{julia}
#| echo: false
choices = [raw"It leaves $0$ as the coefficient of $x^2$",
raw"It leaves all the other terms as $0$"]
radioq(choices, 1)
```
* We then use $p_1$ times some multiple $a/k_1$ to remove the $x$ term
* we then use $p_0$ times some multiple $a/k_0$ to remove the constant term
Would this strategy work to reduce $p_n$ to $k_n x^n$?
```{julia}
#| echo: false
radioq(["Yes", "No"], 1)
```
###### Question
Suppose $p(x)$ and $q(x)$ are polynomials of degree $n$ and there are $n+1$ points for which $p(x_i) = q(x_i)$.
First, is it true or false that a polynomial of degree $n$ has *at most* $n$ zeros?
```{julia}
#| echo: false
radioq(["true, unless it is the zero polynomial", "false"], 1)
```
What is the degree of $h(x) = p(x) - q(x)$?
```{julia}
#| echo: false
radioq([raw"At least $n+1$", raw"At most $n$"], 2)
```
At least how many zeros does the polynomial $h(x)$ have?
```{julia}
#| echo: false
radioq([raw"At least $n+1$", raw"At most $n$"], 1)
```
Is $p(x) = q(x)$ with these assumptions?
```{julia}
#| echo: false
radioq(["yes", "no"], 1)
```
###### Question
We wish to show, for any $k \leq n$, that
$$
\int_I \frac{x^k p_n(x)}{x - x_i} w(x) dx
= x_i^k \int_I \frac{p_n(x)}{x - x_i} w(x) dx
$$
We have for $u=x/x_i$ that
$$
\frac{1}{x - x_i} = \frac{1 - u^k}{x - x_i} + \frac{u^k}{x - x_i}
$$
But the first term, $(1-u^k)/(x-x_i)$ is a polynomial of degree $k-1$. Why?
```{julia}
#| echo: false
choices = [raw"""
Because we can express this as $x_i^k - x^k$ which factors as $(x_i - x) \cdot u(x)$ where $u(x)$ has degree $k-1$, at most.
""",
raw"""
It isn't true, it clearly has degree $k$
"""]
radioq(choices, 1)
```
This gives if $k \leq n$ and with $u=x/x_i$:
$$
\begin{align*}
\int_I \frac{p_n(x)}{x - x_i} w(x) dx
&= \int_I p_n(x) \left( \textcolor{red}{\frac{1 - u^k}{x - x_i}} + \frac{u^k}{x - x_i} \right) w(x) dx\\
&= \int_I p_n(x) \frac{\frac{x^k}{x_i^k}}{x - x_i} w(x) dx\\
&= \frac{1}{x_i^k} \int_I \frac{x^k p_n(x)}{x - x_i} w(x) dx
\end{align*}
$$

View File

@@ -1,4 +1,4 @@
# Partial Fractions
# Partial fractions
{{< include ../_common_code.qmd >}}
@@ -14,7 +14,9 @@ using SymPy
Integration is facilitated when an antiderivative for $f$ can be found, as then definite integrals can be evaluated through the fundamental theorem of calculus.
However, despite differentiation being an algorithmic procedure, integration is not. There are "tricks" to try, such as substitution and integration by parts. These work in some cases. However, there are classes of functions for which algorithms exist. For example, the `SymPy` `integrate` function mostly implements an algorithm that decides if an elementary function has an antiderivative. The [elementary](http://en.wikipedia.org/wiki/Elementary_function) functions include exponentials, their inverses (logarithms), trigonometric functions, their inverses, and powers, including $n$th roots. Not every elementary function will have an antiderivative comprised of (finite) combinations of elementary functions. The typical example is $e^{x^2}$, which has no simple antiderivative, despite its ubiquitousness.
However, despite differentiation being an algorithmic procedure, integration is not. There are "tricks" to try, such as substitution and integration by parts. These work in some cases---but not all!
However, there are classes of functions for which algorithms exist. For example, the `SymPy` `integrate` function mostly implements an algorithm that decides if an elementary function has an antiderivative. The [elementary](http://en.wikipedia.org/wiki/Elementary_function) functions include exponentials, their inverses (logarithms), trigonometric functions, their inverses, and powers, including $n$th roots. Not every elementary function will have an antiderivative comprised of (finite) combinations of elementary functions. The typical example is $e^{x^2}$, which has no simple antiderivative, despite its ubiquitousness.
There are classes of functions where an (elementary) antiderivative can always be found. Polynomials provide a case. More surprisingly, so do their ratios, *rational functions*.
@@ -238,7 +240,11 @@ $$
#### Examples
Find an antiderivative for $1/(x\cdot(x^2+1)^2)$.
Find an antiderivative for
$$
\frac{1}{x\cdot(x^2+1)^2}.
$$
The partial fraction decomposition is:
@@ -259,7 +265,11 @@ integrate(1/q, x)
---
Find an antiderivative of $1/(x^2 - 2x-3)$.
Find an antiderivative of
$$
\frac{1}{x^2 - 2x-3}.
$$
We again just let `SymPy` do the work. A partial fraction decomposition is given by:

View File

@@ -289,11 +289,11 @@ where $u = (x-\mu)/\sigma$, so $du = (1/\sigma) dx$.
This shows that integrals involving a normal density with parameters $\mu$ and $\sigma$ can be computed using the *standard* normal density with $\mu=0$ and $\sigma=1$. Unfortunately, there is no elementary antiderivative for $\exp(-u^2/2)$, so integrals for the standard normal must be numerically approximated.
There is a function `erf` in the `SpecialFunctions` package (which is loaded by `CalculusWithJulia`) that computes:
There is a function `erf` in the `SpecialFunctions` package (which is loaded by `CalculusWithJulia`) defined by:
$$
\int_0^x \frac{2}{\sqrt{\pi}} \exp(-t^2) dt
\text{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x \exp(-t^2) dt
$$
A further change of variables by $t = u/\sqrt{2}$ (with $\sqrt{2}dt = du$) gives:

View File

@@ -1,4 +1,4 @@
# Surface Area
# Surface area
{{< include ../_common_code.qmd >}}
@@ -63,7 +63,7 @@ $$
If the curve is parameterized by $(g(t), f(t))$ between $a$ and $b$ then the surface area is
$$
\int_a^b 2\pi f(t) \cdot \sqrt{g'(t)^2 + f'(t)^2} dx.
\int_a^b 2\pi f(t) \cdot \sqrt{g'(t)^2 + f'(t)^2} dt.
$$
These formulas do not add in the surface area of either of the ends.
@@ -90,11 +90,343 @@ To see why this formula is as it is, we look at the parameterized case, the firs
Let a partition of $[a,b]$ be given by $a = t_0 < t_1 < t_2 < \cdots < t_n =b$. This breaks the curve into a collection of line segments. Consider the line segment connecting $(g(t_{i-1}), f(t_{i-1}))$ to $(g(t_i), f(t_i))$. Rotating this around the $x$ axis will generate something approximating a disc, but in reality will be the frustum of a cone. What will be the surface area?
::: {#fig-surface-area}
```{julia}
#| echo: false
let
gr()
function projection_plane(v)
vx, vy, vz = v
a = [-vy, vx, 0] # v ⋅ a = 0
b = v × a # so v ⋅ b = 0
return (a/norm(a), b/norm(b))
end
function project(x, v)
â, b̂ = projection_plane(v)
(x ⋅ â, x ⋅ b̂) # (x ⋅ â) â + (x ⋅ b̂) b̂
end
radius(t) = 1 / (1 + exp(t))
t₀, tₙ = 0, 3
surf(t, θ) = [t, radius(t)*cos(θ), radius(t)*sin(θ)]
Consider a right-circular cone parameterized by an angle $\theta$ and the largest radius $r$ (so that the height satisfies $r/h=\tan(\theta)$). If this cone were made of paper, cut up a side, and laid out flat, it would form a sector of a circle, whose area would be $R^2\gamma/2$ where $R$ is the radius of the circle (also the side length of our cone), and $\gamma$ an angle that we can figure out from $r$ and $\theta$. To do this, we note that the arc length of the circle's edge is $R\gamma$ and also the circumference of the bottom of the cone so $R\gamma = 2\pi r$. With all this, we can solve to get $A = \pi r^2/\sin(\theta)$. But we have a frustum of a cone with radii $r_0$ and $r_1$, so the surface area is a difference: $A = \pi (r_1^2 - r_0^2) /\sin(\theta)$.
v = [2, -2, 1]
function plot_axes()
empty_style = (xaxis = ([], false),
yaxis = ([], false),
legend=false)
plt = plot(; empty_style...)
axis_values = [[(0,0,0), (3.5,0,0)], # x axis
[(0,0,0), (0, 2.0 * radius(0), 0)], # yaxis
[(0,0,0), (0, 0, 1.5 * radius(0))]] # z axis
for (ps, ax) ∈ zip(axis_values, ("x", "y", "z"))
p0, p1 = ps
a, b = project(p0, v), project(p1, v)
annotate!([(b...,text(ax, :bottom))])
plot!([a, b]; arrow=true, head=:tip, line=(:gray, 1)) # gr() allows arrows
end
plt
end
function psurf(v)
(t,θ) -> begin
v1, v2 = project(surf(t, θ), v)
[v1, v2] # or call collect to make a tuple into a vector
end
end
function detJ(F, t, θ)
∂θ = ForwardDiff.derivative(θ -> F(t, θ), θ)
∂t = ForwardDiff.derivative(t -> F(t, θ), t)
(ax, ay), (bx, by) = ∂θ, ∂t
ax * by - ay * bx
end
function cap!(t, v; kwargs...)
θs = range(0, 2pi, 100)
S = Shape(project.(surf.(t, θs), (v,)))
plot!(S; kwargs...)
end
## ----
G = psurf(v)
fold(F, t, θmin, θmax) = find_zero(θ -> detJ(F, t, θ), (θmin, θmax))
plt = plot_axes()
ts = range(t₀, tₙ, 100)
back_edge = fold.(G, ts, 0, pi)
front_edge = fold.(G, ts, pi, 2pi)
db = Dict(t => v for (t,v) in zip(ts, back_edge))
df = Dict(t => v for (t,v) in zip(ts, front_edge))
# basic shape
plt = plot_axes()
plot!(project.(surf.(ts, back_edge), (v,)); line=(:black, 1))
plot!(project.(surf.(ts, front_edge), (v,)); line=(:black, 1))
# add caps
cap!(t₀, v; fill=(:gray, 0.33))
cap!(tₙ, v; fill=(:gray, 0.33))
# add rotated surface segment
i,j = 33,38
a = ts[i]
θs = range(db[ts[i]], df[ts[i]], 100)
θs = reverse(range(db[ts[j]], df[ts[j]], 100))
function 𝐺(t,θ)
v1, v2 = G(t, θ)
(v1, v2)
end
S = Shape(vcat(𝐺.(ts[i], θs), 𝐺.(ts[j], θs)))
plot!(S)
θs = range(df[ts[i]], 2pi + db[ts[i]], 100)
plot!([𝐺(ts[i], θ) for θ in θs]; line=(:black, 1, :dash))
θs = range(df[ts[j]], 2pi + db[ts[j]], 100)
plot!([𝐺(ts[j], θ) for θ in θs]; line=(:black, 1))
plot!([project((ts[i], 0,0),v), 𝐺(ts[i],db[ts[i]])]; line=(:black, 1, :dot), arrow=true)
plot!([project((ts[j], 0,0),v), 𝐺(ts[j],db[ts[j]])]; line=(:black, 1, :dot), arrow=true)
# add shading
lightpt = [2, -2, 5] # from further above
H = psurf(lightpt)
light_edge = fold.(H, ts, pi, 2pi);
for (i, (t, top, bottom)) in enumerate(zip(ts, light_edge, front_edge))
λ = iseven(i) ? 1.0 : 0.8
top = bottom + λ*(top - bottom)
curve = [project(surf(t, θ), v) for θ in range(bottom, top, 20)]
plot!(curve, line=(:black, 1))
end
# annotations: label the two radii at distinct points
_x, _y, _z = surf(ts[i], db[ts[i]])
x₁, y₁ = project((_x, _y/2, _z/2), v)
_x, _y, _z = surf(ts[j], db[ts[j]])
x₂, y₂ = project((_x, _y/2, _z/2), v)
annotate!([
(x₁, y₁, text(L"r_i", :left, :top)),
(x₂, y₂, text(L"r_{i+1}", :left, :top)),
])
current()
end
```
```{julia}
#| echo: false
plotly()
nothing
```
Illustration of the curve $(g(t), f(t))$ rotated about the $x$ axis with a section shaded.
:::
Consider a right-circular cone parameterized by an angle $\theta$ which at a given height has radius $r$ and slant height $l$ (so that the height satisfies $r/l=\sin(\theta)$). If this cone were made of paper, cut up a side, and laid out flat, it would form a sector of a circle, as illustrated below:
::: {#fig-frustum-cone-area}
```{julia}
#| echo: false
p1 = let
gr()
function projection_plane(v)
vx, vy, vz = v
a = [-vy, vx, 0] # v ⋅ a = 0
b = v × a # so v ⋅ b = 0
return (a/norm(a), b/norm(b))
end
function project(x, v)
â, b̂ = projection_plane(v)
(x ⋅ â, x ⋅ b̂) # (x ⋅ â) â + (x ⋅ b̂) b̂
end
function plot_axes(v)
empty_style = (xaxis = ([], false),
yaxis = ([], false),
legend=false)
plt = plot(; empty_style..., aspect_ratio=:equal)
a,b,c,d,e = project.([(0,0,2), (0,0,3), surf(3, 3pi/2), surf(2, 3pi/2),(0,0,0)], (v,))
pts = [a,b,c,d,a]#project.([a,b,c,d,a], (v,))
plot!(pts; line=(:gray, 1))
plot!([c,d]; line=(:black, 2))
plot!([d, e,a]; line=(:gray, 1,1))
#plot!(project.([e,a,d,e],(v,)); line=(:gray, 1))
plt
end
function psurf(v)
(t,θ) -> begin
v1, v2 = project(surf(t, θ), v)
[v1, v2] # or call collect to make a tuple into a vector
end
end
function detJ(F, t, θ)
∂θ = ForwardDiff.derivative(θ -> F(t, θ), θ)
∂t = ForwardDiff.derivative(t -> F(t, θ), t)
(ax, ay), (bx, by) = ∂θ, ∂t
ax * by - ay * bx
end
function cap!(t, v; kwargs...)
θs = range(0, 2pi, 100)
S = Shape(project.(surf.(t, θs), (v,)))
plot!(S; kwargs...)
end
function fold(F, t, θmin, θmax)
𝐹(θ) = detJ(F, t, θ)
𝐹(θmin) * 𝐹(θmax) <= 0 || return NaN
find_zero(𝐹, (θmin, θmax))
end
radius(t) = t/2
t₀, tₙ = 0, 3
surf(t, θ) = [radius(t)*cos(θ), radius(t)*sin(θ), t] # z axis
v = [2, -2, 1]
G = psurf(v)
ts = range(t₀, tₙ, 100)
back_edge = fold.(G, ts, 0, pi)
front_edge = fold.(G, ts, pi, 2pi)
db = Dict(t => v for (t,v) in zip(ts, back_edge))
df = Dict(t => v for (t,v) in zip(ts, front_edge))
plt = plot_axes(v)
plot!(project.(surf.(ts, back_edge), (v,)); line=(:black, 1))
plot!(project.(surf.(ts, front_edge), (v,)); line=(:black, 1))
cap!(tₙ, v; fill=(:gray80, 0.33))
i = 67
tᵢ = ts[i] # tᵢ = 2.0
plot!(project.([surf.(tᵢ, θ) for θ in range(df[tᵢ], 2pi + db[tᵢ], 100)], (v,)))
# add surface to rotate
## add light
lightpt = [2, -2, 5] # from further above
H = psurf(lightpt)
light_edge = fold.(H, ts, pi, 2pi);
for (i, (t, top, bottom)) in enumerate(zip(ts, light_edge, front_edge))
λ = iseven(i) ? 1.0 : 0.8
(isnan(top) || isnan(bottom)) && continue
top = bottom + λ*(top - bottom)
curve = [project(surf(t, θ), v) for θ in range(bottom, top, 20)]
#plot!(curve, line=(:black, 1))
end
a,b,c = project(surf(tₙ, 3pi/2), v), project(surf(2, 3pi/2),v), project((0,0,0), v)
#plot!([a,b], line=(:black, 3))
#plot!([b,c]; line=(:black,2))
# annotations
_x,_y,_z = surf(tₙ, 3pi/2)
r1 = project((_x/2, _y/2, _z), v)
_x,_y,_z = surf(2, 3pi/2)
r2 = project((_x/2, _y/2, _z), v)
_x, _y, _z = surf(1/2, 3pi/2)
theta = project((_x/2, _y/2, _z), v)
a, b = project.((surf(3, 3pi/2), surf(2, 3pi/2)), (v,))
annotate!([
(r1..., text(L"r_2",:bottom)),
(r2..., text(L"r_1",:bottom)),
(theta..., text(L"\theta")),
(a..., text(L"l_2",:right, :top)),
(b..., text(L"l_1", :right, :top))
])
current()
end
p2 = let
θ = 2pi - pi/3
θs = range(2pi-θ, 2pi, 100)
r1, r2 = 2, 3
empty_style = (xaxis = ([], false),
yaxis = ([], false),
legend=false,
aspect_ratio=:equal)
plt = plot(; empty_style...)
plot!(r1.*cos.(θs), r1 .* sin.(θs); line=(:black, 1))
plot!(r2.*cos.(θs), r2 .* sin.(θs); line=(:black, 1))
plot!([(0,0),(r1,0)]; line=(:gray, 1, :dash))
plot!([(r1,0),(r2,0)]; line=(:black, 1))
s, c = sincos(2pi-θ)
plot!([(0,0),(r1,0)]; line=(:gray, 1, :dash))
plot!([(0,0), (r1*c, r1*s)]; line=(:gray, 1, :dash))
plot!([(r1,0),(r2,0)]; line=(:black, 1))
plot!([(r1*c, r1*s), (r2*c, r2*s)]; line=(:black, 2))
s,c = sincos((2pi - θ)/2)
annotate!([
(1/2*c, 1/2*s, text(L"\gamma")),
(r1*c, r1*s, text(L"l_1",:left, :top)),
(r2*c, r2*s, text(L"l_2", :left, :top)),
])
#=
δ = pi/8
scs = reverse(sincos.(range(2pi-θ, 2pi - θ + pi - δ,100)))
plot!([1/2 .* (c,s) for (s,c) in scs]; line=(:gray, 1,:dash), arrow=true, side=:head)
scs = sincos.(range(2pi - θ + pi + δ, 2pi,100))
plot!([1/2 .* (c,s) for (s,c) in scs]; line=(:gray, 1,:dash), arrow=true, side=:head)
=#
end
plot(p1, p2)
```
```{julia}
#| echo: false
plotly()
nothing
```
The surface of a frustum of a cone and the same area spread out flat. Angle $\gamma = 2\pi(1 - \sin(\theta))$.
:::
By comparing circumferences, it is seen that the angles $\theta$ and $\gamma$ are related by $\gamma = 2\pi(1 - \sin(\theta))$ (as $2\pi r_2 = 2\pi l_2\sin(\theta) = (2\pi-\gamma)/(2\pi) \cdot 2\pi l_2$). The values $l_i$ and $r_i$ are related by $r_i = l_i \sin(\theta)$. The area in both pictures is $(\pi l_2^2 - \pi l_1^2) \cdot (2\pi-\gamma)/(2\pi)$, which simplifies to $\pi (l_2 + l_1) \cdot \sin(\theta) \cdot (l_2 - l_1)$ or $2\pi \cdot (r_2 + r_1)/2 \cdot \text{slant height}$.
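A quick numeric spot check of this area identity; the particular values of $l_1$, $l_2$, and $\theta$ below are arbitrary, chosen only for illustration:

```{julia}
l₁, l₂, θ = 2.0, 5.0, pi/7
r₁, r₂ = l₁*sin(θ), l₂*sin(θ)            # rᵢ = lᵢ sin(θ)
γ = 2pi * (1 - sin(θ))

A_sector  = (pi*l₂^2 - pi*l₁^2) * (2pi - γ) / (2pi)   # flattened-out sector difference
A_frustum = 2pi * (r₂ + r₁)/2 * (l₂ - l₁)             # 2π ⋅ average radius ⋅ slant height
A_sector ≈ A_frustum    # the two agree
```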
Relating this to our values in terms of $f$ and $g$, we have $r_2 = f(t_i)$, $r_1 = f(t_{i-1})$, and the slant height $l_2 - l_1$ satisfies $(l_2-l_1)^2 = (g(t_i)-g(t_{i-1}))^2 + (f(t_i) - f(t_{i-1}))^2$.
Putting this all together, we get that the surface area generated by rotating the line segment around the $x$ axis is
@@ -102,7 +434,7 @@ Putting this altogether we get that the surface area generarated by rotating the
$$
\text{sa}_i = \pi (f(t_i)^2 - f(t_{i-1})^2) \cdot \sqrt{(\Delta g)^2 + (\Delta f)^2} / \Delta f =
2\pi \frac{f(t_i) + f(t_{i-1})}{2} \cdot \sqrt{(\Delta g)^2 + (\Delta f)^2}.
$$
(This is $2 \pi$ times the average radius times the slant height.)
@@ -122,7 +454,9 @@ $$
\text{SA} = \int_a^b 2\pi f(t) \sqrt{g'(t)^2 + f'(t)^2} dt.
$$
If we assume integrability of the integrand, then as our partition size goes to zero, this approximate surface area converges to the value given by the limit. (As with arc length, this needs a technical adjustment to the Riemann integral theorem, as here we are evaluating the integrand function at four points ($t_i$, $t_{i-1}$, $\xi$ and $\psi$) and not just at some $c_i$.)
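The convergence can be illustrated numerically: a Riemann-style sum of the frustum areas over a fine partition should approach the integral. A sketch with an arbitrary sample curve, assuming `QuadGK` is available:

```{julia}
using QuadGK

g(t) = t; f(t) = sin(t) + 2        # a sample curve rotated about the x axis
a, b, n = 0, pi, 10_000
ts = range(a, b, n + 1)

# sum of 2π ⋅ average radius ⋅ slant height over each subinterval
approx = sum(2pi * (f(ts[i+1]) + f(ts[i]))/2 *
             sqrt((g(ts[i+1]) - g(ts[i]))^2 + (f(ts[i+1]) - f(ts[i]))^2)
             for i in 1:n)
exact, _ = quadgk(t -> 2pi * f(t) * sqrt(1 + cos(t)^2), a, b)
abs(approx - exact)   # quite small
```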
#### Examples
@@ -176,7 +510,7 @@ F(1) - F(0)
### Plotting surfaces of revolution
The commands to plot a surface of revolution will be described more clearly later; for now we present them as simply a pattern to be followed in case plots are desired. Suppose the curve in the $x-z$ plane is given parametrically by $(g(u), f(u))$ for $a \leq u \leq b$.
To be concrete, we parameterize the circle centered at $(6,0)$ with radius $2$ by:
@@ -195,14 +529,14 @@ The plot of this curve is:
#| hold: true
us = range(a, b, length=100)
plot(g.(us), f.(us), xlims=(-0.5, 9), aspect_ratio=:equal, legend=false)
plot!([(0, -3), (0, 3)], line=(:red, 5)) # z axis emphasis
plot!([(3, 0), (9, 0)], line=(:green, 5)) # x axis emphasis
```
Though parametric plots have a convenience constructor, `plot(g, f, a, b)`, we constructed the points with `Julia`'s broadcasting notation, as we will need to do for a surface of revolution. The `xlims` are adjusted to show the $y$ axis, which is emphasized with a layered line. The line is drawn by specifying two points, $(x_0, y_0)$ and $(x_1, y_1)$ using tuples and wrapping in a vector.
Now, to rotate this about the $z$ axis, creating a surface plot, we have the following pattern:
```{julia}
S(u,v) = [g(u)*cos(v), g(u)*sin(v), f(u)]
@@ -210,23 +544,22 @@ us = range(a, b, length=100)
vs = range(0, 2pi, length=100)
ws = unzip(S.(us, vs')) # reorganize data
surface(ws..., zlims=(-6,6), legend=false)
plot!([(0,0,-3), (0,0,3)], line=(:red, 5)) # z axis emphasis
```
The `unzip` function is not part of base `Julia`, rather part of `CalculusWithJulia` (it is really `SplitApplyCombine`'s `invert` function). This function rearranges data into a form consumable by the plotting methods like `surface`. In this case, the result of `S.(us,vs')` is a grid (matrix) of points, the result of `unzip` is three grids of values, one for the $x$ values, one for the $y$ values, and one for the $z$ values. A manual adjustment to the `zlims` is used, as `aspect_ratio` does not have an effect with the `plotly()` backend.
To rotate this about the $x$ axis, we have this pattern:
```{julia}
#| hold: true
S(u,v) = [g(u), f(u)*cos(v), f(u)*sin(v)]
us = range(a, b, length=100)
vs = range(0, 2pi, length=100)
ws = unzip(S.(us,vs'))
plot([(3,0,0), (9,0,0)], line=(:green,5)) # x axis emphasis
surface!(ws..., legend=false)
```
The above pattern covers the case of rotating the graph of a function $f(x)$ of $a,b$ by taking $g(t)=t$.
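For instance, to rotate the graph of $f(x) = x^2$ about the $x$ axis, take $g(t) = t$ in the pattern; a minimal sketch (the choice of $f$ here is only for illustration):

```{julia}
f(u) = u^2
g(u) = u                                    # g(t) = t recovers the graph-of-a-function case
S(u, v) = [g(u), f(u)*cos(v), f(u)*sin(v)]  # sweeps the graph of f around the x axis
S(1.0, 0.0)                                 # a point on the surface: [1.0, 1.0, 0.0]
```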
@@ -551,46 +884,3 @@ a, b = 0, pi
val, _ = quadgk(t -> 2pi* f(t) * sqrt(g'(t)^2 + f'(t)^2), a, b)
numericq(val)
```
# Appendix
```{julia}
#| hold: true
#| echo: false
gr()
## For **some reason** having this in the natural place messes up the plots.
## {{{approximate_surface_area}}}
xs,ys = range(-1, stop=1, length=50), range(-1, stop=1, length=50)
f(x,y)= 2 - (x^2 + y^2)
dr = [1/2, 3/4]
df = [f(dr[1],0), f(dr[2],0)]
function sa_approx_graph(i)
p = plot(xs, ys, f, st=[:surface], legend=false)
for theta in range(0, stop=i/10*2pi, length=10*i )
path3d!(p,sin(theta)*dr, cos(theta)*dr, df)
end
p
end
n = 10
anim = @animate for i=1:n
sa_approx_graph(i)
end
imgfile = tempname() * ".gif"
gif(anim, imgfile, fps = 1)
caption = L"""
Surface of revolution of $f(x) = 2 - x^2$ about the $y$ axis. The line segments are the images of rotating the secant line connecting $(1/2, f(1/2))$ and $(3/4, f(3/4))$. These trace out the frustum of a cone which approximates the corresponding surface area of the surface of revolution. In the limit, this approximation becomes exact and a formula for the surface area of surfaces of revolution can be used to compute the value.
"""
plotly()
ImageFile(imgfile, caption)
```

View File

@@ -6,30 +6,45 @@ This section uses these packages:
using SymPy
using Plots
using Roots
plotly()
```
```{julia}
#| echo: false
using LaTeXStrings
gr();
```
---
In the March 2003 issue of the College Mathematics Journal, Leon M Hall posed 12 questions related to the following figure:
```{julia}
#| echo: false
a₀ = 7/8
q₀ = -a₀ - 1/(2a₀)
f(x) = x^2
fp(x) = 2x
function make_plot()
empty_style = (xaxis=([], false),
yaxis=([], false),
framestyle=:origin,
legend=false)
axis_style = (arrow=true, side=:head, line=(:gray, 1))
tangent(x) = f(a₀) + fp(a₀) * (x - a₀)
normal(x) = f(a₀) - (1 / fp(a₀)) * (x - a₀)
plt = plot(; empty_style...,
xlims=(-2,2), ylims=(-1, (1.5)^2))
f(x) = x^2
fp(x) = 2x
plot!(f, -1.5, 1.5, line=(2, :black))
plot!([-1.6, 1.6], [0,0]; axis_style...)
tl = x -> f(a₀) + fp(a₀) * (x-a₀)
nl = x -> f(a₀) - 1/(fp(a₀)) * (x-a₀)
@@ -40,9 +55,10 @@ function make_plot(a₀=7/8, q₀=-a₀ - 1/2a₀)
# add in right triangle
scatter!([a₀, q₀], f.([a₀, q₀]), markersize=5)
Δ = 0.01
annotate!([(a₀ + Δ, nl(a₀+Δ), text(L"P", :top)),
(q₀ - Δ, nl(q₀-Δ), text(L"Q", :bottom, :left))
])
current()
end
make_plot()
```
@@ -64,7 +80,7 @@ zs = solve(f(x) ~ nl, x)
q = only(filter(!=(a), zs))
```
---
The first question is simply:
@@ -99,7 +115,7 @@ In the remaining examples we don't show the code by default.
:::
---
> 1b. The length of the line segment $PQ$
@@ -117,7 +133,7 @@ lseg = sqrt((f(a) - f(q))^2 + (a - q)^2);
```
---
> 2a. The horizontal distance between $P$ and $Q$
@@ -135,7 +151,7 @@ plot!([q₀, a₀], [f(a₀), f(a₀)], linewidth=5)
hd = a - q;
```
---
> 2b. The area of the parabolic segment
@@ -156,7 +172,7 @@ plot!(xs, ys, fill=(:green, 0.25, 0))
A = simplify(integrate(nl - f(x), (x, q, a)));
```
---
> 2c. The volume of the rotated solid formed by revolving the parabolic segment around the vertical line $k$ units to the right of $P$ or to the left of $Q$ where $k > 0$.
@@ -169,7 +185,7 @@ A = simplify(integrate(nl - f(x), (x, q, a)));
V = simplify(integrate(2PI*(nl-f(x))*(a - x + k),(x, q, a)));
```
---
> 3. The $y$ coordinate of the centroid of the parabolic segment
@@ -198,7 +214,7 @@ yₘ = integrate( (1//2) * (nl^2 - f(x)^2), (x, q, a)) / A
yₘ = simplify(yₘ);
```
---
> 4. The length of the arc of the parabola between $P$ and $Q$
@@ -217,7 +233,7 @@ p
L = integrate(sqrt(1 + fp(x)^2), (x, q, a));
```
---
> 5. The $y$ coordinate of the midpoint of the line segment $PQ$
@@ -238,7 +254,7 @@ p
mp = nl(x => (a + q)/2);
```
---
> 6. The area of the trapezoid bound by the normal line, the $x$-axis, and the vertical lines through $P$ and $Q$.
@@ -257,7 +273,7 @@ p
trap = 1//2 * (f(q) + f(a)) * (a - q);
```
---
> 7. The area bounded by the parabola and the $x$ axis and the vertical lines through $P$ and $Q$
@@ -279,7 +295,7 @@ p
pa = integrate(x^2, (x, q, a));
```
---
> 8. The area of the surface formed by revolving the arc of the parabola between $P$ and $Q$ around the vertical line through $P$
@@ -305,7 +321,7 @@ vv(x) = f(a - uu(x))
SA = 2PI * integrate(uu(x) * sqrt(diff(uu(x),x)^2 + diff(vv(x),x)^2), (x, q, a));
```
---
> 9. The height of the parabolic segment (i.e. the distance between the normal line and the tangent line to the parabola that is parallel to the normal line)
@@ -334,7 +350,7 @@ segment_height = sqrt((b-b)^2 + (f(b) - nl(x=>b))^2);
```
---
> 10. The volume of the solid formed by revolving the parabolic segment around the $x$-axis
@@ -355,7 +371,7 @@ end
Vₓ = integrate(pi * (nl^2 - f(x)^2), (x, q, a));
```
---
> 11. The area of the triangle bound by the normal line, the vertical line through $Q$ and the $x$-axis
@@ -376,7 +392,7 @@ plot!([p₀,q₀,q₀,p₀], [0,f(q₀),0,0];
triangle = 1/2 * f(q) * (a - f(a)/(-1/fp(a)) - q);
```
---
> 12. The area of the quadrilateral bound by the normal line, the tangent line, the vertical line through $Q$ and the $x$-axis
@@ -401,7 +417,7 @@ x₁,x₂,x₃,x₄ = (a,q,q,tl₀)
y₁, y₂, y₃, y₄ = (f(a), f(q), 0, 0)
quadrilateral = (x₁ - x₂)*(y₁ - y₃)/2 - (x₁ - x₃)*(y₁ - y₂)/2 + (x₁ - x₃)*(y₁ - y₄)/2 - (x₁ - x₄)*(y₁ - y₃)/2;
```
---
The answers appear here in sorted order, some given as approximate floating point values:

View File

@@ -19,7 +19,44 @@ using SymPy
```{julia}
#| echo: false
#| results: "hidden"
import LinearAlgebra: norm, cross
using SplitApplyCombine
nothing
```
```{julia}
#| echo: false
# commands used for plotting from https://github.com/SigurdAngenent/WisconsinCalculus/blob/master/figures/221/09surf_of_rotation2.py
#linear projection of R^3 onto R^2
function _proj(X, v)
# a is ⟂ to v and b is v × a
vx, vy, vz = v
a = [-vy, vx, 0]
b = cross([vx,vy,vz], a)
a, b = a/norm(a), b/norm(b)
return (a ⋅ X, b ⋅ X)
end
# project a curve in R3 onto R2
pline(viewp, ps...) = [_proj(p, viewp) for p in ps]
# determinant of Jacobian; area multiplier
# det(J); used to identify folds
function jac(X, u, v)
return det(ForwardDiff.jacobian(xs -> collect(X(xs...)), [u,v]))
end
function _fold(F, t, θmin, θmax)
λ = θ -> jac(F, t, θ) # F is projected surface, psurf
iszero(λ(θmin)) && return θmin
iszero(λ(θmax)) && return θmax
return solve(ZeroProblem(λ, (θmin, θmax)))
end
nothing
```
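To see what the fold-finding helper is after, consider projecting a unit cylinder straight onto the $y$-$z$ plane: the projected surface is $(t,\theta) \mapsto (\sin(\theta), t)$, whose Jacobian determinant is $-\cos(\theta)$, vanishing along the silhouette edges $\theta = \pi/2$ and $3\pi/2$. A small self-contained sketch of the same idea:

```{julia}
import ForwardDiff
using Roots
import LinearAlgebra: det

F(t, θ) = (sin(θ), t)                     # cylinder projected onto the y-z plane
jac(X, u, v) = det(ForwardDiff.jacobian(xs -> collect(X(xs...)), [u, v]))
# det J changes sign across a fold; bracket and solve for the zero
θ⋆ = solve(ZeroProblem(θ -> jac(F, 1.0, θ), (0, pi)))
θ⋆ ≈ pi/2    # the silhouette (fold) angle
```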
@@ -122,6 +159,103 @@ The formula is for a rotation around the $x$-axis, but can easily be generalized
:::
::: {#fig-solid-of-revolution}
```{julia}
#| echo: false
plt = let
gr()
# Follow lead of # https://github.com/SigurdAngenent/WisconsinCalculus/blob/master/figures/221/09surf_of_rotation2.py
# plot surface of revolution around x axis between [0, 3]
# best if r(t) decreases
rad(x) = 2/(1 + exp(x))
trange = (0,3)
θrange = (0, 2pi)
viewp = [2,-2, 1]
##
proj(X) = _proj(X, viewp)
# surface of revolution
surf(t, z) = [t, rad(t)*cos(z), rad(t)*sin(z)]
# project the surface at (t, a=theta)
psurf(t,z) = proj(surf(t,z))
# create shape holding project disc
drawdiscF(t) = Shape(invert([psurf(t, 2*i*pi/100) for i in 1:101])...)
α = 1.0 # opacity
line_style = (; line=(:black, 1))
plot(; empty_style..., aspect_ratio=:equal)
# by layering, we get x-axis as desired
plot!(pline(viewp, [-1,0,0], [0,0,0]); line_style...)
plot!(drawdiscF(0); fill =(:lightgray, α))
plot!(pline(viewp, [0,0,0], [1,0,0]); line_style...)
plot!(drawdiscF(1); fill =(:black, α)) # black to lightgray gives thickness
plot!(drawdiscF(1.1); fill=(:lightgray, α))
plot!(pline(viewp, [1.1,0,0], [2,0,0]); line_style...)
plot!(drawdiscF(2); fill=(:lightgray, α))
plot!(pline(viewp, [2,0,0], [3,0,0]); line_style...)
plot!(drawdiscF(3); fill=(:lightgray, α))
plot!(pline(viewp, [3,0,0], [4,0,0]); line_style..., arrow=true, side=:head)
plot!(pline(viewp, [0,0,0], [0,0,1.25]); line_style..., arrow=true, side=:head)
tt = range(trange..., 30)
curve = psurf.(tt, pi/2)
plot!(curve; line=(:black, 2))
f1 = [(t, _fold(psurf, t, 0, pi)) for t in tt]
curve = [psurf(f[1], f[2]) for f in f1]
plot!(curve; line=(:black,1))
f2 = [(t, _fold(psurf, t, pi, 2*pi)) for t in tt]
curve = [psurf(f[1], f[2]) for f in f2]
plot!(curve; line=(:black,1))
## find bottom edge (t,θ) again
tt = range(0, 3, 120)
f1 = [(t, _fold(psurf, t, pi, 2*pi)) for t in range(trange..., 100)]
# shade bottom by adding bigger density of lines near bottom
for (i,f) ∈ enumerate(f1)
λ = iseven(i) ? 6 : 4 # adjust density by having some lines extend only to 6
(isnan(f[1]) || isnan(f[2])) && continue  # parenthesize: && binds tighter than ||
curve = [psurf(f[1], θ) for θ in range(f[2] - 0.2*(λ - f[1]), f[2], 20)]
plot!(curve; line=(:black, 1))
end
current()
end
plt
```
```{julia}
#| echo: false
plotly()
nothing
```
Illustration of a figure being rotated around the $x$-axis. The discs have approximate volume given by the area of the base times the height or $\pi r(x)^2 \Delta x$. (Figure ported from @Angenent.)
:::
For a numeric example, we consider the original Red [Solo](http://en.wikipedia.org/wiki/Red_Solo_Cup) Cup. The dimensions of the cup were basically: a top diameter of $d_1 = 3~ \frac{3}{4}$ inches, a bottom diameter of $d_0 = 2~ \frac{1}{2}$ inches and a height of $h = 4~ \frac{3}{4}$ inches.
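Taking the cup as a frustum of a cone, the radius grows linearly from $d_0/2$ to $d_1/2$ over the height, and the disc method gives its volume. A sketch with the stated dimensions, assuming `QuadGK`; the closed-form frustum volume formula provides a check:

```{julia}
using QuadGK

d₀, d₁, h = 2.5, 3.75, 4.75                  # inches: bottom/top diameters, height
r(x) = d₀/2 + (d₁ - d₀)/2 * (x / h)          # radius at height x, linear interpolation
V, _ = quadgk(x -> pi * r(x)^2, 0, h)
V ≈ pi*h/12 * (d₀^2 + d₀*d₁ + d₁^2)          # frustum volume formula agrees
```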
@@ -352,6 +486,109 @@ $$
V = \int_a^b \pi \cdot (R(x)^2 - r(x)^2) dx.
$$
::: {#fig-washer-illustration}
```{julia}
#| echo: false
plt = let
gr()
# Follow lead of # https://github.com/SigurdAngenent/WisconsinCalculus/blob/master/figures/221/09surf_of_rotation2.py
# plot surface of revolution around x axis between [0, 3]
# best if r(t) decreases
rad(x) = 2/(1 + exp(x))
trange = (0, 3)
θrange = (0, 2pi)
viewp = [2,-2,1]
##
proj(X) = _proj(X, viewp)
# surface of revolution
surf(t, z) = [t, rad(t)*cos(z), rad(t)*sin(z)]
surf2(t, z) = (t, rad(t)*cos(z)/2, rad(t)*sin(z)/2)
# project the surface at (t, a=theta)
psurf(t,z) = proj(surf(t,z))
psurf2(t, z) = proj(surf2(t,z))
# create shape holding project disc
drawdiscF(t) = Shape(invert([psurf(t, 2*i*pi/100) for i in 1:101])...)
drawdiscI(t) = Shape([psurf2(t, 2*i*pi/100) for i in 1:101])
α = 1.0
line_style = (; line=(:black, 1))
plot(; empty_style..., aspect_ratio=:equal)
# by layering, we get x-axis as desired
plot!(pline(viewp, [-1,0,0], [0,0,0]); line_style...)
plot!(drawdiscF(0); fill =(:lightgray, α))
plot!(drawdiscI(0); fill=(:white, .5))
plot!(pline(viewp, [0,0,0], [1,0,0]); line_style...)
plot!(drawdiscF(1); fill =(:black, α)) # black to lightgray gives thickness
plot!(drawdiscI(1); fill=(:white, .5))
plot!(drawdiscF(1.1); fill=(:lightgray, α))
plot!(drawdiscI(1.1); fill=(:white, .5))
plot!(pline(viewp, [1.1,0,0], [2,0,0]); line_style...)
plot!(drawdiscF(2); fill=(:lightgray, α))
plot!(drawdiscI(2); fill=(:white, .5))
plot!(pline(viewp, [2,0,0], [3,0,0]); line_style...)
plot!(drawdiscF(3); fill=(:lightgray, α))
plot!(drawdiscI(3); fill=(:white, .5))
plot!(pline(viewp, [3,0,0], [4,0,0]); line_style..., arrow=true, side=:head)
plot!(pline(viewp, [0,0,0], [0,0,1.25]); line_style..., arrow=true, side=:head)
## bounding curves
### main spine
tt = range(trange..., 30)
curve = [psurf(t, pi/2) for t in tt]
plot!(curve; line=(:black, 2))
### the folds
f1 = [(t, _fold(psurf, t, 0, pi)) for t in tt]
curve = [psurf(f[1], f[2]) for f in f1]
plot!(curve; line=(:black,))
f2 = [(t, _fold(psurf, t, pi, 2*pi)) for t in tt]
curve = [psurf(f[1], f[2]) for f in f2]
plot!(curve; line=(:black,))
## add shading
### find bottom edge (t,θ) again
f1 = [[t, _fold(psurf, t, pi, 2*pi)] for t in range(trange..., 120)]
### shade bottom by adding bigger density of lines near bottom
for (i,f) ∈ enumerate(f1)
λ = iseven(i) ? 6 : 4 # adjust density by having some lines extend only to 6
(isnan(f[1]) || isnan(f[2])) && continue  # parenthesize: && binds tighter than ||
curve = [psurf(f[1], θ) for θ in range(f[2] - 0.2*(λ - f[1]), f[2], 20)]
plot!(curve; line=(:black, 1))
end
current()
end
plt
```
```{julia}
#| echo: false
plotly()
nothing
```
Modification of the earlier figure to show the washer method. The interior volume would be given by $\int_a^b \pi r(x)^2 dx$, the entire volume by $\int_a^b \pi R(x)^2 dx$. The difference then is the volume computed by the washer method.
:::
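A minimal numeric check of the washer formula, assuming `QuadGK`: the region between $y = \sqrt{x}$ and $y = x$ over $[0,1]$, rotated about the $x$ axis, has volume $\pi(1/2 - 1/3) = \pi/6$.

```{julia}
using QuadGK

R(x) = sqrt(x)                        # outer radius
r(x) = x                              # inner radius
V, _ = quadgk(x -> pi * (R(x)^2 - r(x)^2), 0, 1)
V ≈ pi/6    # matches the by-hand computation
```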
##### Example
@@ -410,21 +647,86 @@ For a general cone, we use this [definition](http://en.wikipedia.org/wiki/Cone):
Let $h$ be the distance from the apex to the base. Consider cones with the property that all planes parallel to the base intersect the cone with the same shape, though perhaps a different scale. This figure shows an example, with the rays coming from the apex defining the volume.
::: {#fig-generic-cone}
```{julia}
#| echo: false
plt = let
gr()
rad(t) = 3/2 - t
trange = (0, 3/2)
θrange = (0, 2pi)
viewp = [2,-1/1.5,1/2+.2]
##
proj(X) = _proj(X, viewp)
# our surface
R, r, rho = 1, 1/4, 1/4
f(t) = (R-r) * cos(t) + rho * cos((R-r)/r * t)
g(t) = (R-r) * sin(t) - rho * sin((R-r)/r * t)
surf(t, θ) = (rad(t)*f(θ), rad(t)*g(θ), t)
psurf(t,θ) = proj(surf(t,θ))
empty_style = (xaxis=([], false),
yaxis=([], false),
framestyle=:origin,
legend=false)
axis_style = (arrow=true, side=:head, line=(:gray, 1))
drawdiscF(t) = Shape([psurf(t, 2*i*pi/100) for i in 1:101])
plot(; empty_style..., aspect_ratio=:equal)
for (i,t) in enumerate(range(0, 3/2, 30))
plot!(drawdiscF(t); fill=(:gray,1), line=(:black,1))
end
θ = 0; plot!([psurf(0, θ), psurf(3/2, θ)]; line=(:black, 2))
θ = pi/2; plot!([psurf(0, θ), psurf(3/2, θ)]; line=(:black, 1))
θ = 3pi/2; plot!([psurf(0, θ), psurf(3/2, θ)]; line=(:black, 1))
current()
end
plt
```
```{julia}
#| hold: true
#| echo: false
h = 5
R, r, rho = 1, 1/4, 1/4
f(t) = (R-r) * cos(t) + rho * cos((R-r)/r * t)
g(t) = (R-r) * sin(t) - rho * sin((R-r)/r * t)
ts = range(0, 2pi, length=100)
plotly()
nothing
```
A "cone" formed from the parameterized curve
$r(t) = \langle
(R-r) \cdot \cos(t) + \rho \cdot \cos((R-r)/r \cdot t),
(R-r) \cdot \sin(t) - \rho \cdot \sin((R-r)/r \cdot t)
\rangle$ with apex at the point $(0,0,3/2)$; rays extend down from the apex, the cross section at height $z$ being the base curve scaled by $3/2-z$.
:::
```{julia}
#| echo: false
#| eval: false
plt = let
h = 5
R, r, rho = 1, 1/4, 1/4
f(t) = (R-r) * cos(t) + rho * cos((R-r)/r * t)
g(t) = (R-r) * sin(t) - rho * sin((R-r)/r * t)
ts = range(0, 2pi, length=100)
plot(f.(ts), g.(ts), zero.(ts), legend=false)
for t ∈ range(0, 2pi, length=25)
plot!([0,f(t)], [0,g(t)], [h, 0], linecolor=:red)
end
current()
end
plt
```
A right circular cone is one where this shape is a circle. This definition can be more general, as a square-based right pyramid is also such a cone. After possibly reorienting the cone in space so the base is at $u=0$ and the apex at $u=h$ the volume of the cone can be found from:
@@ -450,6 +752,78 @@ $$
This gives a general formula for the volume of such cones.
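The scaling argument says the cross-sectional area at height $u$ is the base area times $(1 - u/h)^2$; integrating recovers the familiar one-third rule. A sketch with arbitrary numbers, assuming `QuadGK`:

```{julia}
using QuadGK

A₀, h = 2.5, 4.0                      # base area and height (arbitrary values)
V, _ = quadgk(u -> A₀ * (1 - u/h)^2, 0, h)
V ≈ A₀ * h / 3    # one third the base area times the height
```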
::: {#fig-cross-sections}
```{julia}
#| echo: false
plt = let
gr()
# sections
# https://github.com/SigurdAngenent/WisconsinCalculus/blob/master/figures/221/09Xsections.py
x(h,z) = 0.3*h^2+(0.6-0.2*h)*cos(z)
y(h,z) = h+(0.3-0.2*h)*sin(z)+0.05*sin(4*z)
r(h,z) = (x(h,z), y(h,z))
r1(h,z) = (2,0) .+ r(h,z)
empty_style = (xaxis=([], false),
yaxis=([], false),
framestyle=:origin,
legend=false)
Nh=30
heights = range(-1/2, 1/2, Nh)
h0=heights[Nh ÷ 2]
h1=heights[Nh ÷ 2 + 1]
hs = [heights[1], h0, h1, heights[end]]
ts = range(0, 2pi, 300)
plot(; empty_style..., aspect_ratio=:equal)
# stack the curves
for h in heights
curve = r.(h, ts)
plot!(Shape(curve); fill=(:white, 1.0), line=(:black, 1))
end
# shape pull outs; use black to give thickness
for (h, color) in zip(hs, (:white, :black, :white, :white))
curve = r1.(h,ts)
plot!(Shape(curve); fill=(color,1.0), line=(:black, 1,))
end
# axis with marked points
plot!([(-1,-1), (-1, 1)]; axis_style...)
pts = [(-1, y(h, pi)) for h in hs]
scatter!(pts, marker=(5, :circle))
# connect with dashes
for h in hs
plot!([(-1, y(h, pi)), r(h,pi)]; line=(:black, 1, :dash))
plot!([r(h,0), r1(h,pi)]; line=(:black, 1, :dash))
end
current()
end
plt
```
This figure shows the volume of a solid built up from slices. A discrete approximation is found by estimating the volume of each slice by the cross-sectional area times a small $\Delta h$. This leads to the formula
$V = \int_a^b A(h)dh$, where $A$ computes the cross-sectional area.
(This figure was ported from @Angenent.)
:::
```{julia}
#| echo: false
plotly()
nothing
```
### Cavalieri's method
@@ -457,39 +831,236 @@ This gives a general formula for the volume of such cones.
[Cavalieri's](http://tinyurl.com/oda9xd9) Principle is "Suppose two regions in three-space (solids) are included between two parallel planes. If every plane parallel to these two planes intersects both regions in cross-sections of equal area, then the two regions have equal volumes." (Wikipedia).
::: {#fig-Cavalieris-first}
```{julia}
#| echo: false
plt = let
gr()
x(h,z) = (0.6-0.2*h) * cos(z)
y(h,z) = h + (0.2-0.15*h) * sin(z) + 0.01 * sin(4*z)
xa(h,z) = 2 + 0.1 * cos(7*pi*h) + (0.6-0.2*h)*cos(z)
heights = range(-1/2, 1/2, 50)
ts = range(0, 2pi, 300)
h0 = heights[25]
h1 = heights[26]
plot(; empty_style..., aspect_ratio=:equal)
for h in heights
curve=[(x(h, t), y(h, t)) for t in ts]
plot!(Shape(curve); fill=(:white,), line=(:black,1))
curve=[(xa(h, t), y(h, t)) for t in ts]
plot!(Shape(curve); fill=(:white,), line=(:black,1))
end
current()
end
plt
```
```{julia}
#| echo: false
plotly()
nothing
```
Illustration of Cavalieri's first principle. The discs on the left are moved around to form the volume on the right, but as the volume of each cross-sectional disc remains the same, the two volumes are equally approximated. (This figure ported from @Angenent.)
:::
With the formula for the volume of solids based on cross sections, this is a trivial observation, as the functions giving the cross-sectional area are identical. Still, it can be surprising.
Consider a sphere with an interior cylinder bored out of it. (The [Napkin](http://tinyurl.com/o237v83) ring problem.) The bore has height $h$ - for larger radius spheres this means very wide bores.
::: {#fig-napkin-ring-1}
```{julia}
#| echo: false
plt = let
# Follow lead of # https://github.com/SigurdAngenent/WisconsinCalculus/blob/master/figures/221/09surf_of_rotation2.py
# plot surface of revolution around x axis between [0, 3]
# best if r(t) decreases
rad(t) = (t = clamp(t, -1, 1); sqrt(1 - t^2))
rad2(t) = 1/2
viewp = [2,-2,1]
##
function _proj(X, v)
# a is ⟂ to v and b is v × a
vx, vy, vz = v
a = [-vy, vx, 0]
b = cross([vx,vy,vz], a)
a, b = a/norm(a), b/norm(b)
return (a ⋅ X, b ⋅ X)
end
# project a curve in R3 onto R2
pline(viewp, ps...) = [_proj(p, viewp) for p in ps]
# determinant of Jacobian; area multiplier
# det(J); used to identify folds
function jac(X, u, v)
return det(ForwardDiff.jacobian(xs -> collect(X(xs...)), [u,v]))
end
function _fold(F, t, θmin, θmax)
λ = θ -> jac(F, t, θ) # F is projected surface, psurf
iszero(λ(θmin)) && return θmin
iszero(λ(θmax)) && return θmax
return solve(ZeroProblem(λ, (θmin, θmax)))
end
##
proj(X) = _proj(X, viewp)
# surface of revolution about the z axis
surf(t, z) = (rad(t)*cos(z), rad(t)*sin(z), t)
surf2(t, z) = (rad2(t)*cos(z), rad2(t)*sin(z), t)
# project the surface at (t, a=theta)
psurf(t,z) = proj(surf(t,z))
psurf2(t, z) = proj(surf2(t,z))
bisect(f, a, b) = find_zero(f, (a,b), Bisection())
# create shape holding project disc
drawdiscF(t) = Shape([psurf(t, 2*i*pi/100) for i in 1:101])
drawdiscI(t) = Shape([psurf2(t, 2*i*pi/100) for i in 1:101])
α = 1.0
line_style = (; line=(:black, 1))
plot(; empty_style..., aspect_ratio=:equal)
# washer
t0 = sqrt(3/4)
Δ = .03
δ = 0.785398 + 0.05
x₀ = -.25
plot!(drawdiscF(x₀-Δ); fill=(:black,), line=(:black,1))
plot!(drawdiscF(x₀); fill=(:orange,), line=(:black,1))
plot!(drawdiscI(x₀); fill=(:white,1.0), line=(:black,1))
x₀ = 0.35
plot!(drawdiscF(x₀-Δ); fill=(:black,), line=(:black,1))
plot!(drawdiscF(x₀); fill=(:orange,), line=(:black,1))
plot!(drawdiscI(x₀); fill=(:white,1.0), line=(:black,1))
z0 = 3pi/2 - δ
plot!(pline(viewp, surf(t0, z0), surf(-t0, z0)); line=(:black, 1))
plot!(pline(viewp, surf(t0, z0+pi), surf(-t0, z0+pi)); line=(:black, 1))
# caps
curve = [psurf(t0, θ) for θ in range(0, 2pi, 100)]
plot!(curve, line=(:black, 2))
curve = [psurf(-t0, θ) for θ in range(0, 2pi, 100)]
plot!(curve, line=(:black, 2))
## folds
tθs = [(t, _fold(psurf, t, 0,pi)) for t in range(-t0, t0, 50)]
curve = [psurf(t, θ) for (t,θ) ∈ tθs]
plot!(curve, line=(:black, 3))
tθs = [(t, _fold(psurf, t, pi, 2pi)) for t in range(-t0, t0, 50)]
curve = [psurf(t, θ) for (t,θ) ∈ tθs]
plot!(curve, line=(:black, 3))
# Shade lines
δ = pi/6
Δₜ = (4pi/2 - (3pi/2 - δ))/(2*25)
for θ ∈ range(3pi/2-δ, 4pi/2, 25)
curve = [psurf(t, θ) for t in
range(-t0, max(-t0, -t0 + 1/2*sin(θ+δ+pi/2 + pi/2)), 20)]
plot!(curve, line=(:black, 1))
curve = [psurf(t, θ+Δₜ) for t in
range(-t0, max(-t0, -t0 + 1/3*sin(θ+δ+pi/2 + pi/2)), 20)]
plot!(curve, line=(:black, 1))
end
#=
f1 = [[t, _fold(psurf, t, 0, pi/2)] for t in range(-0.5, -0.1, 26)]
for f in f1
plot!([psurf( f[1], f[2]-k*0.01*(6-f[1]) )
for k in 1:21]; line=(:black, 1))
end
=#
current()
end
plt
```
Figure showing sphere with interior cylinder bored out.
:::
This cross-sectional figure is used to better understand the key dimensions.
::: {#fig-napkin-ring-2}
```{julia}
#| hold: true
#| echo: false
#The following illustrates $R=5$ and $h=8$.
#The following illustrates $R=1$ and $h=2\sqrt{3/4}$.
plt = let
gr()
empty_style = (xaxis=([], false),
yaxis=([], false),
framestyle=:origin,
legend=false)
R =5; h1 = 2*4
R = 1; h = 2*sqrt(3/4)
theta = asin(h1/2/R)
thetas = range(-theta, stop=theta, length=100)
ts = range(-pi, stop=pi, length=100)
y = h1/4
θ = theta = asin(h/2/R)
thetas = range(-theta, stop=theta, length=100)
ts = range(-pi, stop=pi, length=100)
y = h/4
p = plot(legend=false, aspect_ratio=:equal);
plot!(p, R*cos.(ts), R*sin.(ts));
plot!(p, R*cos.(thetas), R*sin.(thetas), color=:orange);
plot(; empty_style..., aspect_ratio=:equal)
plot!(R*cos.(ts), R*sin.(ts); line=(:black,));
plot!(R*cos.(thetas), R*sin.(thetas), line=(:orange,1));
plot!(p, [R*cos.(theta), R*cos.(theta)], [h1/2, -h1/2], color=:orange);
plot!(p, [R*cos.(theta), sqrt(R^2 - y^2)], [y, y], color=:orange)
plot!([R*cos.(theta), R*cos.(theta)], [h/2, -h/2]; color=:orange);
plot!([R*cos.(theta), sqrt(R^2 - y^2)], [y, y]; line=(:orange,3))
plot!(p, [0, R*cos.(theta)], [0,0], color=:red);
plot!(p,[ 0, R*cos.(theta)], [0,h1/2], color=:red);
plot!([0, R*cos.(theta)], [0,0], color=:red);
plot!([ 0, R*cos.(theta)], [0,h/2], color=:red);
annotate!(p, [(.5, -2/3, "sqrt(R²- (h/2)²)"),
(R*cos.(theta)-.6, h1/4, "h/2"),
(1.5, 1.75*tan.(theta), "R")])
p
x₀ = sqrt(R^2 - (h/2)^2)
annotate!( [
(x₀/2, 0, text(L"\sqrt{R^2- (\frac{h}{2})^2}",10, :top)),
(x₀, h/4, text(L"\frac{h}{2}",:right)),
(R/2*cos(θ),R/2*sin(θ), text(L"R", :bottom; rotation=rad2deg(θ)))
])
current()
end
plt
```
```{julia}
#| echo: false
plotly()
nothing
```
Side view illustrating key dimensions of napkin ring problem with $R$ being the radius of the sphere and $h$ being the height of the resulting interior cylinder.
:::
The small orange line is rotated, so using the washer method we get cross sections with area $\pi(r_o^2 - r_i^2)$, where $r_o$ and $r_i$ are the outer and inner radii, each a function of $y$.
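The surprising upshot, that the volume depends only on $h$ and not on $R$, can be checked numerically. Here is a minimal sketch in base `Julia`, using a Riemann sum in place of the integral (the helper name `napkin_ring_volume` is just for illustration):

```{julia}
# Napkin-ring check: integrate the washer areas A(y) = π(r_o² - r_i²)
# over -h/2 ≤ y ≤ h/2 with a Riemann sum.  For a sphere of radius R,
# r_o² = R² - y² and r_i² = R² - (h/2)² (a constant).
function napkin_ring_volume(R, h; n = 100_000)
    ri2 = R^2 - (h/2)^2                   # inner radius squared
    A(y) = pi * ((R^2 - y^2) - ri2)       # washer area at height y
    ys = range(-h/2, h/2, length = n + 1)
    sum(A(y) * step(ys) for y in ys[1:end-1])   # left Riemann sum
end

h = 2 * sqrt(3/4)
napkin_ring_volume(1, h) ≈ napkin_ring_volume(5, h) ≈ pi * h^3 / 6  # true
```

The last comparison reflects the classic result that the ring's volume is $\pi h^3/6$ for *any* sphere radius.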
View File
@@ -72,7 +72,7 @@ The definition says three things
* The value of the limit is the same as $f(c)$.
The defined speaks to continuity at a point, we can extend it to continuity over an interval $(a,b)$ by saying:
The definition speaks to continuity at a point; we can extend it to continuity over an interval $(a,b)$ by saying:
::: {.callout-note icon=false}
## Definition of continuity over an open interval
@@ -130,9 +130,9 @@ There are various reasons why a function may not be continuous.
$$
f(x) = \begin{cases}
-1 & x < 0 \\
0 & x = 0 \\
1 & x > 0
-1 &~ x < 0 \\
0 &~ x = 0 \\
1 &~ x > 0
\end{cases}
$$
@@ -148,25 +148,57 @@ is implemented by `Julia`'s `sign` function. It has a value at $0$, but no limit
plot([-1,-.01], [-1,-.01], legend=false, color=:black)
plot!([.01, 1], [.01, 1], color=:black)
scatter!([0], [1/2], markersize=5, markershape=:circle)
ts = range(0, 2pi, 100)
C = Shape(0.02 * sin.(ts), 0.03 * cos.(ts))
plot!(C, fill=(:white,1), line=(:black, 1))
```
is not continuous at $x=0$. It has a limit of $0$ at $0$, a function value $f(0) =1/2$, but the limit and the function value are not equal.
* The `floor` function, which rounds down to the nearest integer, is also not continuous at the integers, but is right continuous at the integers, as, for example, $\lim_{x \rightarrow 0+} f(x) = f(0)$. This graph emphasizes the right continuity by placing a point for the value of the function when there is a jump:
* The `floor` function, which rounds down to the nearest integer, is also not continuous at the integers, but is right continuous at the integers, as, for example, $\lim_{x \rightarrow 0+} f(x) = f(0)$. This graph emphasizes the right continuity by placing a filled marker for the value of the function when there is a jump and an open marker where the function is not that value.
```{julia}
#| hold: true
#| echo: false
x = [0,1]; y=[0,0]
plt = plot(x.-2, y.-2, color=:black, legend=false)
plot!(plt, x.-1, y.-1, color=:black)
plot!(plt, x.-0, y.-0, color=:black)
plot!(plt, x.+1, y.+1, color=:black)
plot!(plt, x.+2, y.+2, color=:black)
scatter!(plt, [-2,-1,0,1,2], [-2,-1,0,1,2], markersize=5, markershape=:circle)
plt = let
empty_style = (xticks=-4:4, yticks=-4:4,
framestyle=:origin,
legend=false)
axis_style = (arrow=true, side=:head, line=(:gray, 1))
text_style = (10,)
fn_style = (;line=(:black, 3))
fn2_style = (;line=(:red, 4))
mark_style = (;line=(:gray, 1, :dot))
domain_style = (;fill=(:orange, 0.35), line=nothing)
range_style = (; fill=(:blue, 0.35), line=nothing)
ts = range(0, 2pi, 100)
xys = sincos.(ts)
xys = [.1 .* xy for xy in xys]
plot(; empty_style..., aspect_ratio=:equal)
plot!([-4.25,4.25], [0,0]; axis_style...)
plot!([0,0], [-4.25, 4.25]; axis_style...)
for k in -4:4
P,Q = (k,k),(k+1,k)
plot!([P,Q], line=(:black,1))
S = Shape([k .+ xy for xy in xys])
plot!(S; fill=(:black,))
S = Shape([(k+1,k) .+ xy for xy in xys])
plot!(S; fill=(:white,), line=(:black,1))
end
current()
end
plt
```
```{julia}
#| echo: false
plotly()
nothing
```
* The function $f(x) = 1/x^2$ is not continuous at $x=0$: $f(x)$ is not defined at $x=0$ and $f(x)$ has no limit at $x=0$ (in the usual sense).
@@ -176,8 +208,8 @@ plt
$$
f(x) =
\begin{cases}
0 & \text{if } x \text{ is irrational,}\\
1 & \text{if } x \text{ is rational.}
0 &~ \text{if } x \text{ is irrational,}\\
1 &~ \text{if } x \text{ is rational.}
\end{cases}
$$
@@ -192,8 +224,8 @@ Let a function be defined by cases:
$$
f(x) = \begin{cases}
3x^2 + c & x \geq 0,\\
2x-3 & x < 0.
3x^2 + c &~ x \geq 0,\\
2x-3 &~ x < 0.
\end{cases}
$$
@@ -383,8 +415,8 @@ Let $f(x)$ be defined by
$$
f(x) = \begin{cases}
c + \sin(2x - \pi/2) & x > 0\\
3x - 4 & x \leq 0.
c + \sin(2x - \pi/2) &~ x > 0\\
3x - 4 &~ x \leq 0.
\end{cases}
$$
@@ -423,12 +455,22 @@ Consider the function $f(x)$ given by the following graph
```{julia}
#| hold: true
#| echo: false
xs = range(0, stop=2, length=50)
plot(xs, [sqrt(1 - (x-1)^2) for x in xs], legend=false, xlims=(0,4))
plot!([2,3], [1,0])
scatter!([3],[0], markersize=5)
plot!([3,4],[1,0])
scatter!([4],[0], markersize=5)
let
xs = range(0, stop=2, length=50)
plot(xs, [sqrt(1 - (x-1)^2) for x in xs];
line=(:black,1),
legend=false, xlims=(-0.1,4.1))
plot!([2,3], [1,0]; line=(:black,1))
plot!([3,4],[1,0]; line=(:black,1))
scatter!([(0,0)], markersize=5, markercolor=:black)
scatter!([(2,0)], markersize=5, markercolor=:white)
scatter!([(2, 1)], markersize=5; markercolor=:black)
scatter!([(3,0)], markersize=5; markercolor=:black)
scatter!([(3,1)], markersize=5; markercolor=:white)
scatter!([(4,0)], markersize=5; markercolor=:black)
end
```
Is the function $f(x)$ continuous at $x=1$?
@@ -513,3 +555,29 @@ choices = ["Can't tell",
answ = 1
radioq(choices, answ)
```
###### Question
A parametric equation is specified by a parameterization $(f(t), g(t)), a \leq t \leq b$. The parameterization will be continuous if and only if each function is continuous.
Suppose $k_x$ and $k_y$ are positive integers and $a, b$ are positive numbers; will the [Lissajous](https://en.wikipedia.org/wiki/Parametric_equation#Lissajous_Curve) curve given by $(a\cos(k_x t), b\sin(k_y t))$ be continuous?
```{julia}
#| hold: true
#| echo: false
yesnoq(true)
```
Here is a sample graph for $a=1, b=2, k_x=3, k_y=4$:
```{julia}
#| hold: true
a,b = 1, 2
k_x, k_y = 3, 4
plot(t -> a * cos(k_x *t), t-> b * sin(k_y * t), 0, 4pi)
```
View File
@@ -17,9 +17,9 @@ using SymPy
---
![Between points M and M lies an F](figures/ivt.jpg){width=40%}
![Between points M and M lies an F for a continuous curve. [L'Hospital's](https://ia801601.us.archive.org/26/items/infinimentpetits1716lhos00uoft/infinimentpetits1716lhos00uoft.pdf) figure 55.](figures/ivt.jpg){width=40%}
Continuity for functions is a valued property which carries implications. In this section we discuss two: the intermediate value theorem and the extreme value theorem. These two theorems speak to some fundamental applications of calculus: finding zeros of a function and finding extrema of a function. [L'Hospitals](https://ia801601.us.archive.org/26/items/infinimentpetits1716lhos00uoft/infinimentpetits1716lhos00uoft.pdf) figure 55, above, suggests why.
Continuity for functions is a valued property which carries implications. In this section we discuss two: the intermediate value theorem and the extreme value theorem. These two theorems speak to some fundamental applications of calculus: finding zeros of a function and finding extrema of a function.
## Intermediate Value Theorem
@@ -32,53 +32,79 @@ If $f$ is continuous on $[a,b]$ with, say, $f(a) < f(b)$, then for any $y$ with
:::
::: {#fig-IVT}
```{julia}
#| hold: true
#| echo: false
#| cache: true
### {{{IVT}}}
gr()
function IVT_graph(n)
f(x) = sin(pi*x) + 9x/10
a,b = [0,3]
let
gr()
# IVT
empty_style = (xaxis=([], false),
yaxis=([], false),
framestyle=:origin,
legend=false)
axis_style = (arrow=true, side=:head, line=(:gray, 1))
text_style = (10,)
fn_style = (;line=(:black, 3))
fn2_style = (;line=(:red, 4))
mark_style = (;line=(:gray, 1, :dot))
domain_style = (;fill=(:orange, 0.35), line=nothing)
range_style = (; fill=(:blue, 0.35), line=nothing)
xs = range(a,stop=b, length=50)
f(x) = x + sinpi(3x) + 5sin(2x) + 3cospi(2x)
a, b = -1, 5
xs = range(a, b, 251)
ys = f.(xs)
y0, y1 = extrema(ys)
plot(; empty_style...)
plot!(f, a, b; fn_style...)
## cheat -- pick an x, then find a y
Δ = .2
x = range(a + Δ, stop=b - Δ, length=6)[n]
y = f(x)
plot!([a-.2, b + .2],[0,0]; axis_style...)
plot!([a-.1, a-.1], [y0-2, y1+2]; axis_style...)
plt = plot(f, a, b, legend=false, size=fig_size)
plot!(plt, [0,x,x], [f(x),f(x),0], color=:orange, linewidth=3)
plot!([(a,0),(a,f(a))]; line=(:black, 1, :dash))
plot!([(b,0),(b,f(b))]; line=(:black, 1, :dash))
plt
m = f(a/2 + b/2) + 1.5
plot!([a, b], [m,m]; line=(:black, 1, :dashdot))
δx = 0.03
plot!(Shape([a,b,b,a], 4*δx*[-1,-1,1,1]);
domain_style...)
plot!(Shape((a-.1) .+ 2δx * [-1,1,1,-1], [f(a),f(a),f(b), f(b)]);
range_style...)
plot!(Shape((a-.1) .+ 2δx/3 * [-1,1,1,-1], [y0,y0,y1,y1]);
range_style...)
zs = find_zeros(x -> f(x) - m, (a,b))
c = zs[2]
plot!([(c,0), (c,f(c))]; line=(:black, 1, :dashdot))
annotate!([
(a, 0, text(L"a", 12, :bottom)),
(b, 0, text(L"b", 12, :top)),
(c, 0, text(L"c", 12, :top)),
(a-.1, f(a), text(L"f(a)", 12, :right)),
(a-.1, f(b), text(L"f(b)", 12, :right)),
(a-0.2, m, text(L"y", 12, :right)),
])
end
n = 6
anim = @animate for i=1:n
IVT_graph(i)
end
imgfile = tempname() * ".gif"
gif(anim, imgfile, fps = 1)
caption = L"""
Illustration of intermediate value theorem. The theorem implies that any randomly chosen $y$
value between $f(a)$ and $f(b)$ will have at least one $x$ in $[a,b]$
with $f(x)=y$.
"""
plotly()
ImageFile(imgfile, caption)
```
In the early years of calculus, the intermediate value theorem was intricately connected with the definition of continuity, now it is a consequence.
```{julia}
#| echo: false
plotly()
nothing
```
Illustration of the intermediate value theorem. The theorem implies that any randomly chosen $y$ value between $f(a)$ and $f(b)$ will have at least one $c$ in $[a,b]$ with $f(c)=y$. This graphic shows one of several possible values for the given choice of $y$.
:::
In the early years of calculus, the intermediate value theorem was intricately connected with the definition of continuity; now it is an important consequence.
The basic proof starts with a set of points in $[a,b]$: $C = \{x \text{ in } [a,b] \text{ with } f(x) \leq y\}$. The set is not empty (as $a$ is in $C$) so it *must* have a largest value, call it $c$ (this might seem obvious, but it requires the completeness property of the real numbers). By continuity of $f$, it can be shown that $\lim_{x \rightarrow c-} f(x) = f(c) \leq y$ and $\lim_{x \rightarrow c+} f(x) = f(c) \geq y$, which forces $f(c) = y$.
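The set $C$ in this argument can be illustrated with a discretized cartoon; the function and the value of $y$ below are chosen just for illustration:

```{julia}
# Discretized version of the proof's set C = {x in [a,b] : f(x) ≤ y}.
f(x) = x^3                       # continuous, with f(0) = 0 and f(2) = 8
a, b, y = 0, 2, 5                # so f(a) < y < f(b)
xs = range(a, b, length = 10_001)
c = maximum(x for x in xs if f(x) <= y)   # largest grid point in C
f(c)                             # close to y, as the theorem promises
```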
@@ -90,18 +116,6 @@ The basic proof starts with a set of points in $[a,b]$: $C = \{x \text{ in } [a,
Suppose we have a continuous function $f(x)$ on $[a,b]$ with $f(a) < 0$ and $f(b) > 0$. Then as $f(a) < 0 < f(b)$, the intermediate value theorem guarantees the existence of a $c$ in $[a,b]$ with $f(c) = 0$. This was a special case of the intermediate value theorem proved by Bolzano first. Such $c$ are called *zeros* of the function $f$.
We use this fact when building a "sign chart" of a polynomial function. Between any two consecutive real zeros the polynomial can not change sign. (Why?) So a "test point" can be used to determine the sign of the function over an entire interval.
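One test point per interval suffices. A small sketch in base `Julia` (the polynomial is chosen just for illustration):

```{julia}
# p has real zeros at 1, 2, and 3; between consecutive zeros the sign is
# constant, so a single test point per interval determines it.
p(x) = (x - 1) * (x - 2) * (x - 3)
sign(p(0.5)), sign(p(1.5)), sign(p(2.5)), sign(p(3.5))  # (-1.0, 1.0, -1.0, 1.0)
```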
The `sign_chart` function from `CalculusWithJulia` uses this to indicate where an *assumed* continuous function changes sign:
```{julia}
f(x) = sin(x + x^2) + x/2
sign_chart(f, -3, 3)
```
The intermediate value theorem can find the sign of the function *between* adjacent zeros, but how are the zeros identified?
Here, we use the Bolzano theorem to give an algorithm - the *bisection method* - to locate a value $c$ in $[a,b]$ with $f(c) = 0$ under the assumptions:
* $f$ is continuous on $[a,b]$
@@ -111,7 +125,7 @@ Here, we use the Bolzano theorem to give an algorithm - the *bisection method* -
::: {.callout-note}
#### Between
The bisection method is used to find a zero, $c$, of $f(x)$ *between* two values, $a$ and $b$. The method is guaranteed to work under assumptions, the most important being the continuous function having different signs at $a$ and $b$.
The bisection method is used to find a zero, $c$, of $f(x)$ *between* two values, $a$ and $b$. The method is guaranteed to work under the assumption of a continuous function having different signs at $a$ and $b$.
:::
@@ -238,7 +252,7 @@ sin(c)
(Even `1pi` itself is not a "zero" due to floating point issues.)
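This floating-point point can be checked directly; `Base`'s `sinpi` treats the multiple of $\pi$ exactly:

```{julia}
sin(1pi)          # tiny, but not zero: 1pi only approximates π
sinpi(1)          # exactly zero: the multiple of π is handled specially
iszero(sin(1pi)), iszero(sinpi(1))   # (false, true)
```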
### The `find_zero` function.
### The `find_zero` function to solve `f(x) = 0`
The `Roots` package has a function `find_zero` that implements the bisection method when called as `find_zero(f, (a,b))` where $[a,b]$ is a bracket. Its use is similar to `simple_bisection` above. This package is loaded when `CalculusWithJulia` is. We illustrate the usage of `find_zero` in the following:
@@ -248,8 +262,8 @@ The `Roots` package has a function `find_zero` that implements the bisection met
xstar = find_zero(sin, (3, 4))
```
:::{.callout-warning}
## Warning
:::{.callout-note}
## Action template
Notice, the call `find_zero(sin, (3, 4))` again fits the template `action(function, args...)` that we see repeatedly. The `find_zero` function can also be called through `fzero`. The use of `(3, 4)` to specify the interval is not necessary. For example `[3,4]` would work equally as well. (Anything where `extrema` is defined works.)
:::
@@ -301,7 +315,7 @@ find_zero(q, (5, 10))
::: {.callout-note}
### Between need not be near
Later, we will see more efficient algorithms to find a zero *near* a given guess. The bisection method finds a zero *between* two values of a bracketing interval. This interval need not be small. Indeed in many cases it can be infinite. For this particular problem, any interval like `(2,N)` will work as long as `N` is bigger than the zero and small enough that `q(N)` is finite *or* infinite *but* not `NaN`. (Basically, `q` must evaluate to a number with a sign. Here, the value of `q(Inf)` is `NaN` as it evaluates to the indeterminate `Inf - Inf`. But `q` is still not `NaN` for quite large numbers, such as `1e77`, as `x^4` can as big as `1e308` -- technically `floatmax(Float64)` -- and be finite.)
Later, we will see more efficient algorithms to find a zero *near* a given guess. The bisection method finds a zero *between* two values of a bracketing interval. This interval need not be small. Indeed in many cases it can be infinite. For this particular problem, any interval like `(2,N)` will work as long as `N` is bigger than the zero and small enough that `q(N)` is finite *or* infinite *but* not `NaN`. (Basically, `q` must evaluate to a number with a sign. Here, the value of `q(Inf)` is `NaN` as it evaluates to the indeterminate `Inf - Inf`. But `q` is still not `NaN` for quite large numbers, such as `1e77`, as `x^4` can be as big as `1e308`---technically `floatmax(Float64)`---and be finite.)
:::
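The floating-point claims in the note are easy to confirm:

```{julia}
Inf - Inf              # NaN, the indeterminate form, which has no sign
isnan(Inf - Inf)       # true
isfinite(1e77^4)       # true: 1e308 is still below floatmax(Float64)
```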
@@ -329,6 +343,10 @@ It appears (and a plot over $[0,1]$ verifies) that there is one zero between $-2
find_zero(x^3 - x + 1, (-2, -1))
```
#### The `find_zero` function to solve `f(x) = c`
Solving `f(x) = c` is related to solving `h(x) = 0`. The key is to make a new function using the difference of the two sides: `h(x) = f(x) - c`.
##### Example
Solve for a value of $x$ where `erfc(x)` is equal to `0.5`.
@@ -348,6 +366,40 @@ find_zero(h, (-Inf, Inf)) # as wide as possible in this case
```
##### Example: Inverse functions
If $f(x)$ is *monotonic* and *continuous* over an interval $[a,b]$ then it has an *inverse function*. That is, for any $y$ between $f(a)$ and $f(b)$ we can find an $x$ satisfying $y = f(x)$ with $a \leq x \leq b$. This is due, of course, to both the intermediate value theorem (which guarantees an $x$) and monotonicity (which guarantees just one $x$).
To see how we can *numerically* find an inverse function using `find_zero`, we have this function:
```{julia}
function inverse_function(f, a, b, args...; kwargs...)
fa, fb = f(a), f(b)
m, M = fa < fb ? (fa, fb) : (fb, fa)
y -> begin
@assert m ≤ y ≤ M
find_zero(x ->f(x) - y, (a,b), args...; kwargs...)
end
end
```
The check on `fa < fb` is due to the possibility that $f$ is increasing (in which case `fa < fb`) or decreasing (in which case `fa > fb`).
To see this used, we consider the monotonic function $f(x) = x - \sin(x)$ over $[0, 5\pi]$. To graph, we have:
```{julia}
f(x) = x - sin(x)
a, b = 0, 5pi
plot(inverse_function(f, a, b), f(a), f(b); aspect_ratio=:equal)
```
(We plot over the range $[f(a), f(b)]$ here, as we can guess $f(x)$ is *increasing*.)
#### The `find_zero` function to solve `f(x) = g(x)`
Solving `f(x) = g(x)` is related to solving `h(x) = 0`. The key is to make a new function using the difference of the two sides: `h(x) = f(x) - g(x)`.
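The same difference trick can be carried out without any packages; here is a self-contained sketch with a hand-rolled bisection (`bisect` is a hypothetical helper written for illustration, not part of `Roots`):

```{julia}
# Solve cos(x) = x on (0, 2) via the difference h(x) = cos(x) - x.
function bisect(h, a, b; n = 60)
    for _ in 1:n
        c = (a + b) / 2
        h(a) * h(c) <= 0 ? (b = c) : (a = c)   # keep the sign-change half
    end
    (a + b) / 2
end

bisect(x -> cos(x) - x, 0, 2)   # about 0.739085, the fixed point of cos
```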
##### Example
@@ -388,36 +440,6 @@ find_zero(cos(x) ~ x, (0, 2))
[![Intersection of two curves as illustrated by Canadian artist Kapwani Kiwanga.](figures/intersection-biennale.jpg)](https://www.gallery.ca/whats-on/touring-exhibitions-and-loans/around-the-world/canada-pavilion-at-the-venice-biennale/kapwani-kiwanga-trinket){width=40%}
##### Example: Inverse functions
If $f(x)$ is *monotonic* and *continuous* over an interval $[a,b]$ then it has an *inverse function*. That is, for any $y$ between $f(a)$ and $f(b)$ we can find an $x$ satisfying $y = f(x)$ with $a \leq x \leq b$. This is due, of course, to both the intermediate value theorem (which guarantees an $x$) and monotonicity (which guarantees just one $x$).
To see how we can *numerically* find an inverse function using `find_zero`, we have this function:
```{julia}
function inverse_function(f, a, b, args...; kwargs...)
fa, fb = f(a), f(b)
m, M = fa < fb ? (fa, fb) : (fb, fa)
y -> begin
@assert m ≤ y ≤ M
find_zero(x ->f(x) - y, (a,b), args...; kwargs...)
end
end
```
The check on `fa < fb` is due to the possibility that $f$ is increasing (in which case `fa < fb`) or decreasing (in which case `fa > fb`).
To see this used, we consider the monotonic function $f(x) = x - \sin(x)$ over $[0, 5\pi]$. To graph, we have:
```{julia}
f(x) = x - sin(x)
a, b = 0, 5pi
plot(inverse_function(f, a, b), f(a), f(b); aspect_ratio=:equal)
```
(We plot over the range $[f(a), f(b)]$ here, as we can guess $f(x)$ is *increasing*.)
##### Example
@@ -612,7 +634,7 @@ Geometry will tell us that $\cos(x) - x/p$ for *one* $x$ in $[0, \pi/2]$ wheneve
#| hold: true
f(x, p=1) = cos(x) - x/p
I = (0, pi/2)
find_zero(f, I), find_zero(f, I, p=2)
find_zero(f, I), find_zero(f, I; p=2)
```
The second number is the solution when `p=2`.
@@ -685,7 +707,7 @@ f.(zs)
The `find_zero` function in the `Roots` package is an interface to one of several methods. For now we focus on the *bracketing* methods, later we will see others. Bracketing methods, among others, include `Roots.Bisection()`, the basic bisection method though with a different sense of "middle" than $(a+b)/2$ and used by default above; `Roots.A42()`, which will typically converge much faster than simple bisection; `Roots.Brent()` for the classic method of Brent, and `FalsePosition()` for a family of *regula falsi* methods. These can all be used by specifying the method in a call to `find_zero`.
Alternatively, `Roots` implements the `CommonSolve` interface popularized by its use in the `DifferentialEquations.jl` ecosystem, a wildly successful area for `Julia`. The basic setup involves two steps: setup a "problem;" solve the problem.
Alternatively, `Roots` implements the `CommonSolve` interface popularized by its use in the `DifferentialEquations.jl` ecosystem, a wildly successful area for `Julia`. The basic setup involves two steps: setup a "problem"; solve the problem.
To set up a problem we call `ZeroProblem` with the function and an initial interval, as in:
@@ -755,6 +777,74 @@ nothing
[![Elevation profile of the Hardrock 100 ultramarathon. Treating the elevation profile as a function, the absolute maximum is just about 14,000 feet and the absolute minimum about 7600 feet. These are of interest to the runner for different reasons. Also of interest would be each local maxima and local minima - the peaks and valleys of the graph - and the total elevation climbed - the latter so important/unforgettable its value makes it into the chart's title.
](figures/hardrock-100.jpeg)](https://hardrock100.com){width=50%}
This figure shows the two concepts as well.
::: {#fig-absolute-relative}
```{julia}
#| echo: false
plt = let
gr()
empty_style = (xaxis=([], false),
yaxis=([], false),
framestyle=:origin,
legend=false)
axis_style = (arrow=true, side=:head, line=(:gray, 1))
p(x) = (x-1)*(x-2)*(x-3)*(x-4) + x/2 + 2
a, b = 0.25, 4.5
z₁, z₂, z₃ = zs = find_zeros(x -> ForwardDiff.derivative(p,x), (a, b))
a = -0.0
plot(; empty_style...)
plot!(p, a, b; line=(:black, 2))
plot!([a,b+0.25], [0,0]; axis_style...)
plot!([a,a] .+ .1, [-1, p(0)]; axis_style...)
δ = .5
ts = range(0, 2pi, 100)
for z in zs
plot!([z-δ,z+δ],[p(z),p(z)]; line=(:black, 1))
C = Shape(z .+ 0.03 * sin.(ts), p(z) .+ 0.3 * cos.(ts))
plot!(C; fill=(:periwinkle, 1), line=(:black, 1))
end
for z in (a,b)
C = Shape(z .+ 0.03 * sin.(ts), p(z) .+ 0.3 * cos.(ts))
plot!(C; fill=(:black, 1), line=(:black, 1))
end
κ = 0.33
annotate!([
(a, 0, text(L"a", :top)),
(b,0, text(L"b", :top)),
(a + κ/5, p(a), text(raw"absolute max", 10, :left)),
(z₁, p(z₁)-κ, text(raw"absolute min", 10, :top)),
(z₂, p(z₂) + κ, text(raw"relative max", 10, :bottom)),
(z₃, p(z₃) - κ, text(raw"relative min", 10, :top)),
(b, p(b) + κ, text(raw"endpoint", 10, :bottom))
])
current()
end
plt
```
```{julia}
#| echo: false
plotly()
nothing
```
Figure illustrating absolute and relative extrema for a function $f(x)$ over $I=[a,b]$. The leftmost point has a $y$ value, $f(a)$, which is an absolute maximum of $f(x)$ over $I$. The three points highlighted between $a$ and $b$ are all relative extrema. The first one is *also* the absolute minimum over $I$. The endpoint is not considered a relative maximum for technical reasons---there is no interval around $b$, it being on the boundary of $I$.
:::
The extreme value theorem discusses an assumption that ensures absolute maximum and absolute minimum values exist.
::: {.callout-note icon=false}
@@ -782,7 +872,7 @@ The function $f(x) = \sqrt{1-x^2}$ is continuous on the interval $[-1,1]$ (in th
##### Example
The function $f(x) = x \cdot e^{-x}$ on the closed interval $[0, 5]$ is continuous. Hence it has an absolute maximum, which a graph shows to be $0.4$. It has an absolute minimum, clearly the value $0$ occurring at the endpoint.
The function $f(x) = x \cdot e^{-x}$ on the closed interval $[0, 5]$ is continuous. Hence it has an absolute maximum, which a graph shows to be about $0.37$ (the value $1/e$, occurring at $x=1$). It has an absolute minimum, clearly the value $0$ occurring at the endpoint.
```{julia}
@@ -819,7 +909,7 @@ A New York Times [article](https://www.nytimes.com/2016/07/30/world/europe/norwa
## Continuity and closed and open sets
We comment on two implications of continuity that can be generalized to more general settings.
We comment on two implications of continuity that can be generalized.
The two intervals $(a,b)$ and $[a,b]$ differ as the latter includes the endpoints. The extreme value theorem shows this distinction can make a big difference in what can be said regarding *images* of such intervals.
@@ -1145,7 +1235,7 @@ radioq(choices, answ, keep_order=true)
###### Question
The extreme value theorem has two assumptions: a continuous function and a *closed* interval. Which of the following examples fails to satisfy the consequence of the extreme value theorem because the function is not continuous?
The extreme value theorem has two assumptions: a continuous function and a *closed* interval. Which of the following examples fails to satisfy the consequence of the extreme value theorem because the function is defined on $I$ but is not continuous on $I$?
```{julia}
@@ -1160,6 +1250,170 @@ answ = 4
radioq(choices, answ, keep_order=true)
```
###### Question
The extreme value theorem is true when $f$ is a continuous function on an interval $I$ *and* $I=[a,b]$ is a *closed* interval. Which of these illustrates why it doesn't apply as $f$ is not continuous on $I$ but is defined on $I$?
```{julia}
#| hold: true
#| echo: false
let
gr()
empty_style = (xaxis=([], false),
yaxis=([], false),
framestyle=:origin,
legend=false)
axis_style = (arrow=true, side=:head, line=(:gray, 1))
ts = range(0, 2pi, 100)
# defined on I; not continuous on I
p1 = plot(;empty_style..., aspect_ratio=:equal)
title!(p1, "(a)")
plot!(p1, x -> 1 - abs(2x), -1, 1, color=:black)
plot!(p1, zero; line=(:black, 1), arrow=true, side=:head)
C = Shape(0.03 .* sin.(ts), 1 .+ 0.03 .* cos.(ts))
plot!(p1, C, fill=(:white, 1), line=(:black,1))
C = Shape(0.03 .* sin.(ts), - 0.25 .+ 0.03 .* cos.(ts))
plot!(p1, C, fill=(:black,1))
annotate!(p1, [
(-1,0,text(L"a", :top)),
(1,0,text(L"b", :top))
])
# not defined on I
p2 = plot(;empty_style...)
title!(p2, "(b)")
plot!(p2, x -> 1/(1-x), 0, .95, color=:black)
plot!(p2, x-> -1/(1-x), 1.05, 2, color=:black)
plot!(p2, zero; axis_style...)
annotate!(p2,[
(0,0,text(L"a", :top)),
(2, 0, text(L"b", :top))
])
# not continuous on I
p3 = plot(;empty_style...)
title!(p3, "(c)")
plot!(p3, x -> 1/(1-x), 0, .95, color=:black)
ylims!((-0.25, 1/(1 - 0.96)))
plot!(p3, [0,1.05],[0,0]; axis_style...)
vline!(p3, [1]; line=(:black, 1, :dash))
annotate!(p3,[
(0,0,text(L"a", :top)),
(1, 0, text(L"b", :top))
])
# continuous
p4 = plot(;empty_style...)
title!(p4, "(d)")
f(x) = x^x
a, b = 0, 2
plot!(p4, f, a, b; line=(:black,1))
ylims!(p4, (-.25, f(b)))
plot!(p4, [a-.1, b+.1], [0,0]; axis_style...)
scatter!([0,2],[ f(0),f(2)]; marker=(:circle,:black))
annotate!([
(a, 0, text(L"a", :top)),
(b, 0, text(L"b", :top))
])
l = @layout[a b; c d]
p = plot(p1, p2, p3, p4, layout=l)
imgfile = tempname() * ".png"
savefig(p, imgfile)
hotspotq(imgfile, (0,1/2), (1/2,1))
end
```
The extreme value theorem is true when $f$ is a continuous function on an interval $I$ and $I=[a,b]$ is a *closed* interval. Which of these illustrates when the theorem's assumptions are true?
```{julia}
#| hold: true
#| echo: false
## come on; save this figure...
let
gr()
empty_style = (xaxis=([], false),
yaxis=([], false),
framestyle=:origin,
legend=false)
axis_style = (arrow=true, side=:head, line=(:gray, 1))
ts = range(0, 2pi, 100)
# defined on I; not continuous on I
p1 = plot(;empty_style..., aspect_ratio=:equal)
title!(p1, "(a)")
plot!(p1, x -> 1 - abs(2x), -1, 1, color=:black)
plot!(p1, zero; line=(:black, 1), arrow=true, side=:head)
C = Shape(0.03 .* sin.(ts), 1 .+ 0.03 .* cos.(ts))
plot!(p1, C, fill=(:white, 1), line=(:black,1))
C = Shape(0.03 .* sin.(ts), - 0.25 .+ 0.03 .* cos.(ts))
plot!(p1, C, fill=(:black,1))
annotate!(p1, [
(-1,0,text(L"a", :top)),
(1,0,text(L"b", :top))
])
# not defined on I
p2 = plot(;empty_style...)
title!(p2, "(b)")
plot!(p2, x -> 1/(1-x), 0, .95, color=:black)
plot!(p2, x-> -1/(1-x), 1.05, 2, color=:black)
plot!(p2, zero; axis_style...)
annotate!(p2,[
(0,0,text(L"a", :top)),
(2, 0, text(L"b", :top))
])
# not continuous on I
p3 = plot(;empty_style...)
title!(p3, "(c)")
plot!(p3, x -> 1/(1-x), 0, .95, color=:black)
ylims!((-0.1, 1/(1 - 0.96)))
plot!(p3, [0,1.05],[0,0]; axis_style...)
vline!(p3, [1]; line=(:black, 1, :dash))
annotate!(p3,[
(0,0,text(L"a", :top)),
(1, 0, text(L"b", :top))
])
# continuous
p4 = plot(;empty_style...)
title!(p4, "(d)")
f(x) = x^x
a, b = 0, 2
ylims!(p4, (-.25, f(b)))
plot!(p4, f, a, b; line=(:black,1))
plot!(p4, [a-.1, b+.1], [0,0]; axis_style...)
scatter!([0,2],[ f(0),f(2)]; marker=(:circle,:black))
annotate!([
(a, 0, text(L"a", :top)),
(b, 0, text(L"b", :top))
])
l = @layout[a b; c d]
p = plot(p1, p2, p3, p4, layout=l)
imgfile = tempname() * ".png"
savefig(p, imgfile)
hotspotq(imgfile, (1/2,1), (0,1/2))
end
```
```{julia}
#| echo: false
plotly();
```
###### Question
@@ -1251,28 +1505,3 @@ The zeros of the equation $\cos(x) \cdot \cosh(x) = 1$ are related to vibrations
val = maximum(find_zeros(x -> cos(x) * cosh(x) - 1, (0, 6pi)))
numericq(val)
```
###### Question
A parametric equation is specified by a parameterization $(f(t), g(t)), a \leq t \leq b$. The parameterization will be continuous if and only if each function is continuous.
Suppose $k_x$ and $k_y$ are positive integers and $a, b$ are positive numbers. Will the [Lissajous](https://en.wikipedia.org/wiki/Parametric_equation#Lissajous_Curve) curve given by $(a\cos(k_x t), b\sin(k_y t))$ be continuous?
```{julia}
#| hold: true
#| echo: false
yesnoq(true)
```
Here is a sample graph for $a=1, b=2, k_x=3, k_y=4$:
```{julia}
#| hold: true
a,b = 1, 2
k_x, k_y = 3, 4
plot(t -> a * cos(k_x *t), t-> b * sin(k_y * t), 0, 4pi)
```

View File

@@ -36,27 +36,38 @@ colors = [:black, :blue, :orange, :red, :green, :orange, :purple]
## Area of parabola
function make_triangle_graph(n)
title = "Area of parabolic cup ..."
n==1 && (title = "\${Area = }1/2\$")
n==2 && (title = "\${Area = previous }+ 1/8\$")
n==3 && (title = "\${Area = previous }+ 2\\cdot(1/8)^2\$")
n==4 && (title = "\${Area = previous }+ 4\\cdot(1/8)^3\$")
n==5 && (title = "\${Area = previous }+ 8\\cdot(1/8)^4\$")
n==6 && (title = "\${Area = previous }+ 16\\cdot(1/8)^5\$")
n==7 && (title = "\${Area = previous }+ 32\\cdot(1/8)^6\$")
n==1 && (title = L"Area $= 1/2$")
n==2 && (title = L"Area $=$ previous $+\; \frac{1}{8}$")
n==3 && (title = L"Area $=$ previous $+\; 2\cdot\frac{1}{8^2}$")
n==4 && (title = L"Area $=$ previous $+\; 4\cdot\frac{1}{8^3}$")
n==5 && (title = L"Area $=$ previous $+\; 8\cdot\frac{1}{8^4}$")
n==6 && (title = L"Area $=$ previous $+\; 16\cdot\frac{1}{8^5}$")
n==7 && (title = L"Area $=$ previous $+\; 32\cdot\frac{1}{8^6}$")
plt = plot(f, 0, 1, legend=false, size = fig_size, linewidth=2)
annotate!(plt, [(0.05, 0.9, text(title,:left))]) # if in title, it grows funny with gr
n >= 1 && plot!(plt, [1,0,0,1, 0], [1,1,0,1,1], color=colors[1], linetype=:polygon, fill=colors[1], alpha=.2)
n == 1 && plot!(plt, [1,0,0,1, 0], [1,1,0,1,1], color=colors[1], linewidth=2)
plt = plot(f, 0, 1;
legend=false,
size = fig_size,
linewidth=2)
annotate!(plt, [
(0.05, 0.9, text(title,:left))
]) # if in title, it grows funny with gr
n >= 1 && plot!(plt, [1,0,0,1, 0], [1,1,0,1,1];
color=colors[1], linetype=:polygon,
fill=colors[1], alpha=.2)
n == 1 && plot!(plt, [1,0,0,1, 0], [1,1,0,1,1];
color=colors[1], linewidth=2)
for k in 2:n
xs = range(0,stop=1, length=1+2^(k-1))
ys = map(f, xs)
k < n && plot!(plt, xs, ys, linetype=:polygon, fill=:black, alpha=.2)
xs = range(0, stop=1, length=1+2^(k-1))
ys = f.(xs)
k < n && plot!(plt, xs, ys;
linetype=:polygon, fill=:black, alpha=.2)
if k == n
plot!(plt, xs, ys, color=colors[k], linetype=:polygon, fill=:black, alpha=.2)
plot!(plt, xs, ys, color=:black, linewidth=2)
plot!(plt, xs, ys;
color=colors[k], linetype=:polygon, fill=:black, alpha=.2)
plot!(plt, xs, ys;
color=:black, linewidth=2)
end
end
plt
@@ -218,35 +229,64 @@ This bounds the expression $\sin(x)/x$ between $1$ and $\cos(x)$ and as $x$ gets
The above bound comes from this figure, for small $x > 0$:
::: {#fig-sin-cos-bound}
```{julia}
#| hold: true
#| echo: false
gr()
p = plot(x -> sqrt(1 - x^2), 0, 1, legend=false, aspect_ratio=:equal,
linewidth=3, color=:black)
θ = π/6
y,x = sincos(θ)
col=RGBA(0.0,0.0,1.0, 0.25)
plot!(range(0,x, length=2), zero, fillrange=u->y/x*u, color=col)
plot!(range(x, 1, length=50), zero, fillrange = u -> sqrt(1 - u^2), color=col)
plot!([x,x],[0,y], linestyle=:dash, linewidth=3, color=:black)
plot!([x,1],[y,0], linestyle=:dot, linewidth=3, color=:black)
plot!([1,1], [0,y/x], linewidth=3, color=:black)
plot!([0,1], [0,y/x], linewidth=3, color=:black)
plot!([0,1], [0,0], linewidth=3, color=:black)
Δ = 0.05
annotate!([(0,0+Δ,"A"), (x-Δ,y+Δ/4, "B"), (1+Δ/2,y/x, "C"),
(1+Δ/2,0+Δ/2,"D")])
annotate!([(.2*cos(θ/2), 0.2*sin(θ/2), "θ")])
imgfile = tempname() * ".png"
savefig(p, imgfile)
caption = "Triangle ``ABD`` has less area than the shaded wedge, which has less area than triangle ``ACD``. Their respective areas are ``(1/2)\\sin(\\theta)``, ``(1/2)\\theta``, and ``(1/2)\\tan(\\theta)``. The inequality used to show ``\\sin(x)/x`` is bounded below by ``\\cos(x)`` and above by ``1`` comes from a division by ``(1/2) \\sin(x)`` and taking reciprocals.
"
plotly()
ImageFile(imgfile, caption)
plt = let
gr()
empty_style = (xaxis=([], false),
yaxis=([], false),
framestyle=:origin,
legend=false)
axis_style = (arrow=true, side=:head, line=(:gray, 1))
text_style = (10,)
fn_style = (;line=(:black, 3))
fn2_style = (;line=(:red, 4))
mark_style = (;line=(:gray, 1, :dot))
domain_style = (;fill=(:orange, 0.35), line=nothing)
range_style = (; fill=(:blue, 0.35), line=nothing)
plot(; empty_style..., aspect_ratio=:equal)
plot!(x -> sqrt(1 - x^2), 0, 1; line=(:black, 2))
θ = π/6
y,x = sincos(θ)
col=RGBA(0.0,0.0,1.0, 0.25)
plot!(range(0,x, length=2), zero, fillrange=u->y/x*u, color=col)
plot!(range(x, 1, length=50), zero, fillrange = u -> sqrt(1 - u^2), color=col)
plot!([x,x],[0,y], line=(:dash, 2, :black))
plot!([x,1],[y,0], line=(:dot, 2, :black))
plot!([1,1], [0,y/x], line=(2, :black))
plot!([0,1], [0,y/x], line=(2, :black))
plot!([0,1], [0,0], line=(2, :black))
Δ = 0.05
annotate!([(0,0+Δ, text(L"A", 10)),
(x-Δ,y+Δ/4, text(L"B",10)),
(1+Δ/2,y/x, text(L"C", 10)),
(1+Δ/2,0+Δ/2, text(L"D", 10)),
(0.2*cos(θ/2), 0.2*sin(θ/2), text(L"\theta", 12))
])
current()
end
plt
```
```{julia}
#| echo: false
plotly()
nothing
```
Triangle $\triangle ABD$ has less area than the shaded wedge, which has less area than triangle $\triangle ACD$. Their respective areas are $(1/2)\sin(\theta)$, $(1/2)\theta$, and $(1/2)\tan(\theta)$. The inequality used to show $\sin(x)/x$ is bounded below by $\cos(x)$ and above by $1$ comes from a division by $(1/2) \sin(x)$ and taking reciprocals.
:::
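The chain of inequalities described in the caption can be spelled out: dividing the area comparison by $(1/2)\sin(\theta) > 0$ and taking reciprocals gives the bound used for the squeeze.

$$
\frac{1}{2}\sin(\theta) \leq \frac{1}{2}\theta \leq \frac{1}{2}\tan(\theta)
\quad\Longrightarrow\quad
1 \leq \frac{\theta}{\sin(\theta)} \leq \frac{1}{\cos(\theta)}
\quad\Longrightarrow\quad
\cos(\theta) \leq \frac{\sin(\theta)}{\theta} \leq 1.
$$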
To discuss the case of $(1+x)^{1/x}$ it proved convenient to assume $x = 1/m$ for integer values of $m$. At the time of Cauchy, log tables were available to identify the approximate value of the limit. Cauchy computed the following value from logarithm tables:
@@ -649,7 +689,7 @@ c = 15/11
lim(h, c; n = 16)
```
(Though the graph and table do hint at something a bit odd -- the graph shows a blip, the table doesn't show values in the second column going towards a specific value.)
(Though the graph and table do hint at something a bit odd---the graph shows a blip, the table doesn't show values in the second column going towards a specific value.)
However the limit in this case is $-\infty$ (or DNE), as there is an asymptote at $c=15/11$. The problem is that the asymptote due to the logarithm is extremely narrow and falls between the floating point values to the left and right of $15/11$.
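A quick sketch of why the asymptote is invisible numerically (the function `h` is defined above; here we only probe a term of the *assumed* form $\log(|x - c|)$ responsible for the asymptote): the nearest floating point neighbors of $c=15/11$ are a positive distance away, so the logarithm evaluates to a large negative, but finite, value.

```{julia}
#| hold: true
# The adjacent floats straddling c = 15/11 are a nonzero distance away,
# so a term like log(abs(x - c)) never evaluates to -Inf in floating point.
c = 15/11
log(abs(prevfloat(c) - c)), log(abs(nextfloat(c) - c))
```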
@@ -1012,18 +1052,34 @@ $$
Why? We can express the function $e^{\csc(x)}/e^{\cot(x)}$ as the above function plus the polynomial $1 + x/2 + x^2/8$. The above is then the sum of two functions whose limits exist and are finite, hence, we can conclude that $M = 0 + 1$.
### The [squeeze](http://en.wikipedia.org/wiki/Squeeze_theorem) theorem
### The squeeze theorem
Sometimes limits can be found by bounding more complicated functions by easier functions.
::: {.callout-note icon=false}
## The [squeeze theorem](http://en.wikipedia.org/wiki/Squeeze_theorem)
Fix $c$ in $I=(a,b)$. Suppose for all $x$ in $I$, except possibly $c$, there are two functions $l$ and $u$, satisfying:
We note one more limit law. Suppose we wish to compute $\lim_{x \rightarrow c}f(x)$ and we have two other functions, $l$ and $u$, satisfying:
* $l(x) \leq f(x) \leq u(x)$.
* These limits exist and are equal:
$$
L = \lim_{x \rightarrow c} l(x) = \lim_{x \rightarrow c} u(x).
$$
* for all $x$ near $c$ (possibly not including $c$) $l(x) \leq f(x) \leq u(x)$.
* These limits exist and are equal: $L = \lim_{x \rightarrow c} l(x) = \lim_{x \rightarrow c} u(x)$.
Then
$$
\lim_{x\rightarrow c} f(x) = L.
$$
Then the limit of $f$ must also be $L$.
:::
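A numeric illustration (not from the text) of the bound in the theorem, here with $l(x) = \cos(x)$, $f(x) = \sin(x)/x$, and $u(x) = 1$: as $x$ shrinks, all three columns approach $1$.

```{julia}
#| hold: true
# Each row is (l(x), f(x), u(x)) for a value of x approaching 0;
# the squeeze forces the middle column toward the common limit 1.
xs = [0.1, 0.01, 0.001]
[cos.(xs) sin.(xs) ./ xs ones(length(xs))]
```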
The figure shows a use of the squeeze theorem to show $\sin(x)/x \rightarrow 1$, as $\cos(x) \leq \sin(x)/x \leq 1$ for $x$ close to $0$.
```{julia}
#| hold: true
@@ -1059,8 +1115,71 @@ ImageFile(imgfile, caption)
The formal definition of a limit involves clarifying what it means for $f(x)$ to be "close to $L$" when $x$ is "close to $c$". These are quantified by the inequalities $0 < \lvert x-c\rvert < \delta$ and $\lvert f(x) - L\rvert < \epsilon$. The second does not carry the restriction that it be greater than $0$, as indeed $f(x)$ can equal $L$. The order is important: for any notion of closeness of $f(x)$ to $L$, a notion of closeness of $x$ to $c$ must be found.
The key is identifying a value for $\delta$ for a given value of $\epsilon$.
::: {#fig-limit-e-d}
```{julia}
#| echo: false
plt = let
gr()
f(x) = (x+1)^2 -1
a, b = -1/4, 2
c = 1
L = f(c)
δ = 0.2
ϵ = 3sqrt(δ)
plot(; empty_style...)#, aspect_ratio=:equal)
plot!(f, a, b; line=(:black, 2))
plot!([a,b],[0,0]; axis_style...)
plot!([0,0], [f(a), f(2)]; axis_style...)
plot!([c, c, 0], [0, f(c), f(c)]; line=(:black, 1, :dash))
plot!([c-δ, c-δ, 0], [0, f(c-δ), f(c-δ)]; line=(:black, 1, :dashdot))
plot!([c+δ, c+δ, 0], [0, f(c+δ), f(c+δ)]; line=(:black, 1, :dashdot))
S = Shape([0,b,b,0],[L-ϵ,L-ϵ,L+ϵ,L+ϵ])
plot!(S; fill=(:lightblue, 0.25), line=(nothing,))
domain_color=:red
range_color=:blue
S = Shape([c-δ, c+δ, c+δ, c-δ], 0.05*[-1,-1,1,1])
plot!(S, fill=(domain_color,0.4), line=nothing)
m,M = f(c-δ), f(c+δ)
T = Shape(0.015 * [-1,1,1,-1], [m,m,M,M])
plot!(T, fill=(range_color, 0.4), line=nothing)
C = Plots.scale(Shape(:circle), 0.02, 0.1)
plot!(Plots.translate(C, c, L), fill=(:white,1,0), line=(:black, 1))
plot!(Plots.translate(C, c, 0), fill=(:white,1,0), line=(domain_color, 1))
plot!(Plots.translate(C, 0, L), fill=(:white,1,0), line=(range_color, 1))
annotate!([
(c, 0, text(L"c", :top)),
(c-δ, 0, text(L"c - \delta", 10, :top)),
(c+δ, 0, text(L"c + \delta", 10, :top)),
(0, L, text(L"L", :right)),
(0, L+ϵ, text(L"L + \epsilon", 10, :right)),
(0, L-ϵ, text(L"L - \epsilon", 10, :right)),
])
end
plt
```
```{julia}
#| echo: false
plotly()
nothing
```
Figure illustrating the requirements of the $\epsilon$-$\delta$ definition of the limit. The image (shaded blue on the $y$ axis) of the $x$ values within $\delta$ of $c$ (shaded red on the $x$ axis, with $c$ itself excluded) must stay within the bounds $L-\epsilon$ and $L+\epsilon$. The value of $\delta$ may be chosen based on $\epsilon$, but a $\delta$ must be found for every positive $\epsilon$, not just the fixed one shown in this figure.
:::
A simple case is the linear case. Consider the function $f(x) = 3x + 2$. Verify that the limit at $c=1$ is $5$.
@@ -1088,64 +1207,78 @@ These lines produce a random $\epsilon$, the resulting $\delta$, and then verify
(The random numbers are technically in $[0,1)$, so in theory `epsilon` could be `0`. So the above approach would be more solid if some guard, such as `epsilon = max(eps(), rand())`, was used. As the formal definition is the domain of paper-and-pencil, we don't fuss.)
In this case, $\delta$ is easy to guess, as the function is linear and has slope $3$. This basically says the $y$ scale is 3 times the $x$ scale. For non-linear functions, finding $\delta$ for a given $\epsilon$ can be a challenge. For the function $f(x) = x^3$, illustrated below, a value of $\delta=\epsilon^{1/3}$ is used for $c=0$:
In this case, $\delta$ is easy to guess, as the function is linear and has slope $3$. This basically says the $y$ scale is 3 times the $x$ scale. For non-linear functions, finding $\delta$ for a given $\epsilon$ can be more of a challenge.
```{julia}
#| hold: true
#| echo: false
#| cache: true
## {{{ limit_e_d }}}
gr()
function make_limit_e_d(n)
f(x) = x^3
##### Example
xs = range(-.9, stop=.9, length=50)
ys = map(f, xs)
We show using the definition that for any fixed $a$ and $n$:
$$
\lim_{x \rightarrow a} x^n = a^n.
$$
This proof uses a bound based on properties of the absolute value.
plt = plot(f, -.9, .9, legend=false, size=fig_size)
if n == 0
nothing
else
k = div(n+1,2)
epsilon = 1/2^k
delta = cbrt(epsilon)
if isodd(n)
plot!(plt, xs, 0*xs .+ epsilon, color=:orange)
plot!(plt, xs, 0*xs .- epsilon, color=:orange)
else
plot!(delta * [-1, 1], epsilon * [ 1, 1], color=:orange)
plot!(delta * [ 1, -1], epsilon * [-1,-1], color=:orange)
plot!(delta * [-1, -1], epsilon * [-1, 1], color=:red)
plot!(delta * [ 1, 1], epsilon * [-1, 1], color=:red)
end
end
plt
end
We look at $f(x) - L = x^n - a^n = (x-a)(x^{n-1} + x^{n-2}a + \cdots + x a^{n-2} + a^{n-1})$.
Taking absolute values gives an inequality by the triangle inequality:
n = 11
anim = @animate for i=1:n
make_limit_e_d(i-1)
end
$$
\lvert x^n - a^n\rvert \leq \lvert x-a\rvert\cdot
\left(
\lvert x\rvert^{n-1} +
\lvert x\rvert^{n-2}\lvert a\rvert +
\cdots +
\lvert x\rvert\lvert a\rvert^{n-2} +
\lvert a\rvert^{n-1}
\right).
$$
imgfile = tempname() * ".gif"
gif(anim, imgfile, fps = 1)
Now, for a given $\epsilon>0$ we seek a $\delta>0$ satisfying the properties of the limit definition for $f(x) = x^n$ and $L=a^n$. For now, assume $\delta < 1$. Then $0 < \lvert x-a\rvert < \delta < 1$ gives
$$
\lvert x\rvert = \lvert x - a + a\rvert \leq \lvert x-a\rvert + \lvert a\rvert < 1 + \lvert a\rvert
$$
caption = L"""
This says then
Demonstration of $\epsilon$-$\delta$ proof of $\lim_{x \rightarrow 0}
x^3 = 0$. For any $\epsilon>0$ (the orange lines) there exists a
$\delta>0$ (the red lines of the box) for which the function $f(x)$
does not leave the top or bottom of the box (except possibly at the
edges). In this example $\delta^3=\epsilon$.
$$
\begin{align*}
\lvert x^n - a^n\rvert
&\leq
\lvert x-a\rvert\cdot \left(
\lvert x\rvert^{n-1} +
\lvert x\rvert^{n-2}\lvert a\rvert +
\cdots +
\lvert x\rvert\lvert a\rvert^{n-2} +
\lvert a\rvert^{n-1}
\right)\\
&\leq \lvert x - a\rvert
\cdot \left(
(\lvert a\rvert+1)^{n-1} +
(\lvert a\rvert+1)^{n-2}\lvert a\rvert
+ \cdots +
(\lvert a\rvert+1) \lvert a\rvert^{n-2} +
\lvert a\rvert^{n-1}
\right)\\
&\leq \lvert x-a\rvert \cdot C,
\end{align*}
$$
where $C$ is just some constant not depending on $x$, just $a$ and $n$.
Now if $\delta < 1$ and $\delta < \epsilon/C$ and if
$0 < \lvert x - a \rvert < \delta$ then
$$
\lvert f(x) - L \rvert =
\lvert x^n - a^n\rvert \leq \lvert x-a\rvert \cdot C < \delta\cdot C < \frac{\epsilon}{C} \cdot C = \epsilon.
$$
With this result and the rules of limits, it follows immediately that for any polynomial $p(x)$, $\lim_{x \rightarrow a} p(x) = p(a)$. (Because $c_n x^n \rightarrow c_n a^n$ and the sum of two functions with limits has as its limit the sum of the limits.) Based on this, we will say later that any polynomial is *continuous* for all $x$.
"""
plotly()
ImageFile(imgfile, caption)
```
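The bound in the proof can be checked numerically. This sketch (values chosen here, not in the text) uses the constant $C = n(\lvert a\rvert + 1)^{n-1}$, which bounds each of the $n$ terms in the sum above, and $\delta = \min(1, \epsilon/C)$:

```{julia}
#| hold: true
# With C = n⋅(|a|+1)^(n-1) and δ = min(1, ϵ/C), any x with 0 < |x - a| < δ
# satisfies |x^n - aⁿ| < ϵ, as the proof guarantees.
n, a = 5, 2
C = n * (abs(a) + 1)^(n-1)
ϵ = 0.1
δ = min(1, ϵ/C)
x = a + 0.9δ              # a point with 0 < |x - a| < δ
abs(x^n - a^n) < ϵ
```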
## Questions
@@ -1261,26 +1394,26 @@ let
title!(p1, "(a)")
plot!(p1, x -> x^2, 0, 2, color=:black)
plot!(p1, zero, linestyle=:dash)
annotate!(p1,[(1,0,"a")])
annotate!(p1,[(1,0,text(L"a",:top))])
p2 = plot(;axis=nothing, legend=false)
title!(p2, "(b)")
plot!(p2, x -> 1/(1-x), 0, .95, color=:black)
plot!(p2, x-> -1/(1-x), 1.05, 2, color=:black)
plot!(p2, zero, linestyle=:dash)
annotate!(p2,[(1,0,"a")])
annotate!(p2,[(1,0,text(L"a",:top))])
p3 = plot(;axis=nothing, legend=false)
title!(p3, "(c)")
plot!(p3, sinpi, 0, 2, color=:black)
plot!(p3, zero, linestyle=:dash)
annotate!(p3,[(1,0,"a")])
annotate!(p3,[(1,0,text("a",:top))])
p4 = plot(;axis=nothing, legend=false)
title!(p4, "(d)")
plot!(p4, x -> x^x, 0, 2, color=:black)
plot!(p4, zero, linestyle=:dash)
annotate!(p4,[(1,0,"a")])
annotate!(p4,[(1,0,text(L"a",:top))])
l = @layout[a b; c d]
p = plot(p1, p2, p3, p4, layout=l)
@@ -1491,8 +1624,8 @@ Take
$$
f(x) = \begin{cases}
0 & x \neq 0\\
1 & x = 0
0 &~ x \neq 0\\
1 &~ x = 0
\end{cases}
$$

View File

@@ -32,22 +32,24 @@ Let's begin with a function that is just problematic. Consider
$$
f(x) = \sin(1/x)
f(x) = \sin\left(\frac{1}{x}\right)
$$
As this is a composition of nice functions it will have a limit everywhere except possibly when $x=0$, as then $1/x$ may not have a limit. So rather than talk about where it is nice, let's consider the question of whether a limit exists at $c=0$.
@fig-sin-1-over-x shows the issue:
A graph shows the issue:
:::{#fig-sin-1-over-x}
```{julia}
#| hold: true
#| echo: false
f(x) = sin(1/x)
plot(f, range(-1, stop=1, length=1000))
```
Graph of the function $f(x) = \sin(1/x)$ near $0$. It oscillates infinitely many times around $0$.
:::
The graph oscillates between $-1$ and $1$ infinitely many times on this interval---so many times that no matter how close one zooms in, the graph on the screen will fail to capture them all. Graphically, there is no single value of $L$ that the function gets close to, as it varies over all the values in $[-1,1]$ as $x$ gets close to $0$. A simple proof that there is no limit is to take any $\epsilon$ less than $1$; then for any $\delta > 0$ there are infinitely many $x$ values where $f(x)=1$ and infinitely many where $f(x) = -1$. That is, when $\epsilon$ is less than $1$ there is no $L$ with $|f(x) - L| < \epsilon$ for all $x$ near $0$.
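This can be made concrete with two sequences tending to $0$ (an illustration, not from the text): along $x_n = 1/(2\pi n + \pi/2)$ the function is constantly $1$, while along $x_n = 1/(2\pi n - \pi/2)$ it is constantly $-1$, so no single limiting value can serve.

```{julia}
#| hold: true
# Evaluate sin(1/x) along two sequences approaching 0; one gives 1s, the
# other -1s, ruling out any single limit L.
f(x) = sin(1/x)
ns = 1:5
f.(1 ./ (2pi*ns .+ pi/2)), f.(1 ./ (2pi*ns .- pi/2))
```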
@@ -65,11 +67,10 @@ The following figure illustrates:
```{julia}
#| hold: true
f(x) = x * sin(1/x)
plot(f, -1, 1)
plot!(abs)
plot!(x -> -abs(x))
plot(f, -1, 1; label="f")
plot!(abs; label="|.|")
plot!(x -> -abs(x); label="-|.|")
```
The [squeeze](http://en.wikipedia.org/wiki/Squeeze_theorem) theorem of calculus is the formal reason $f$ has a limit at $0$, as both the upper function, $|x|$, and the lower function, $-|x|$, have a limit of $0$ at $0$.
@@ -181,21 +182,38 @@ Consider this funny graph:
```{julia}
#| hold: true
#| echo: false
xs = range(0,stop=1, length=50)
plot(x->x^2, -2, -1, legend=false)
let
xs = range(0,stop=1, length=50)
plot(; legend=false, aspect_ratio=true,
xticks = -4:4)
plot!([(-4, -1.5),(-2,4)]; line=(:black,1))
plot!(x->x^2, -2, -1; line=(:black,1))
plot!(exp, -1,0)
plot!(x -> 1-2x, 0, 1)
plot!(sqrt, 1, 2)
plot!(x -> 1-x, 2,3)
S = Plots.scale(Shape(:circle), 0.05)
plot!(Plots.translate(S, -4, -1.5); fill=(:black,))
plot!(Plots.translate(S, -1, (-1)^2); fill=(:white,))
plot!(Plots.translate(S, -1, exp(-1)); fill=(:black,))
plot!(Plots.translate(S, 1, 1 - 2(1)); fill=(:black,))
plot!(Plots.translate(S, 1, sqrt(1)); fill=(:white,))
plot!(Plots.translate(S, 2, sqrt(2)); fill=(:white,))
plot!(Plots.translate(S, 2, 1 - (2)); fill=(:black,))
plot!(Plots.translate(S, 3, 1 - (3)); fill=(:black,))
end
```
Describe the limits at $-1$, $0$, and $1$.
  * At $-1$ we see a jump: there is no limit, but instead a left limit of $1$ and a right limit appearing to be $1/2$.
  * At $0$ we see a limit of $1$.
  * Finally, at $1$ again there is a jump, so no limit. Instead the left limit is about $-1$ and the right limit is $1$.
## Limits at infinity

quarto/misc/Project.toml Normal file
View File

@@ -0,0 +1,12 @@
[deps]
CalculusWithJulia = "a2e0e22d-7d4c-5312-9169-8b992201a882"
ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
HCubature = "19dc6840-f33b-545b-b366-655c7e3ffd49"
LaTeXStrings = "b964fa9f-0449-5b57-a5c2-d3ea65f4040f"
Mustache = "ffc61752-8dc7-55ee-8c37-f3e9cdd09e70"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
QuadGK = "1fd47b50-473d-5c70-9696-f719f8f3bcdc"
QuizQuestions = "612c44de-1021-4a21-84fb-7261cf5eb2d4"
SymPy = "24249f21-da20-56a4-8eb1-6a02cf4ae2e6"
Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
TextWrap = "b718987f-49a8-5099-9789-dcd902bef87d"

View File

@@ -4,6 +4,7 @@ DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
IJulia = "7073ff75-c697-5162-941a-fcdaad2a7d2a"
IntervalArithmetic = "d1acc4aa-44c8-5952-acd4-ba5d80a2a253"
LaTeXStrings = "b964fa9f-0449-5b57-a5c2-d3ea65f4040f"
Latexify = "23fbe1c1-3f47-55db-b15f-69d7ec21a316"
Markdown = "d6f4376e-aef5-505a-96c1-9c027394607a"
Measures = "442fdcdd-2543-5da2-b0f3-8c86c306513e"
Mustache = "ffc61752-8dc7-55ee-8c37-f3e9cdd09e70"

View File

@@ -22,7 +22,7 @@ The family of exponential functions is used to model growth and decay. The famil
## Exponential functions
The family of exponential functions is defined by $f(x) = a^x, -\infty< x < \infty$ and $a > 0$. For $0 < a < 1$ these functions decay or decrease, for $a > 1$ the functions grow or increase, and if $a=1$ the function is constantly $1$.
The family of exponential functions is defined by $f(x) = a^x, -\infty< x < \infty$ and $a > 0$. For $0 < a < 1$ these functions decay or decrease, for $a > 1$ these functions grow or increase, and if $a=1$ the function is constantly $1$.
For a given $a$, defining $a^n$ for positive integers is straightforward, as it means multiplying $n$ copies of $a.$ From this, for *integer powers*, the key properties of exponents: $a^x \cdot a^y = a^{x+y}$, and $(a^x)^y = a^{x \cdot y}$ are immediate consequences. For example with $x=3$ and $y=2$:
@@ -114,7 +114,7 @@ t2, t8 = 72/2, 72/8
exp(r2*t2), exp(r8*t8)
```
So fairly close - after $72/r$ years the amount is $2.05...$ times more than the initial amount.
So fairly close---after $72/r$ years the amount is $2.05...$ times more than the initial amount.
##### Example
@@ -259,7 +259,7 @@ The inverse function will solve for $x$ in the equation $a^x = y$. The answer, f
That is $a^{\log_a(x)} = x$ for $x > 0$ and $\log_a(a^x) = x$ for all $x$.
To see how a logarithm is mathematically defined will have to wait, though the family of functions - one for each $a>0$ - are implemented in `Julia` through the function `log(a,x)`. There are special cases requiring just one argument: `log(x)` will compute the natural log, base $e$ - the inverse of $f(x) = e^x$; `log2(x)` will compute the log base $2$ - the inverse of $f(x) = 2^x$; and `log10(x)` will compute the log base $10$ - the inverse of $f(x)=10^x$. (Also `log1p` computes an accurate value of $\log(1 + p)$ when $p \approx 0$.)
How a logarithm is mathematically defined will have to wait, though the family of functions---one for each $a>0$---are implemented in `Julia` through the function `log(a,x)`. There are special cases requiring just one argument: `log(x)` will compute the natural log, base $e$---the inverse of $f(x) = e^x$; `log2(x)` will compute the log base $2$---the inverse of $f(x) = 2^x$; and `log10(x)` will compute the log base $10$---the inverse of $f(x)=10^x$. (Also `log1p` computes an accurate value of $\log(1 + p)$ when $p \approx 0$.)
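A few quick numeric checks (illustrative) of these inverse relationships:

```{julia}
#| hold: true
# log inverts exp, log2 inverts 2^x, log10 inverts 10^x
log(exp(2.5)), log2(2^10), log10(10.0^3)
```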
To see this in an example, we plot for base $2$ the exponential function $f(x)=2^x$, its inverse, and the logarithm function with base $2$:
@@ -398,7 +398,7 @@ $$
##### Example
Before the ubiquity of electronic calculating devices, the need to compute was still present. Ancient civilizations had abacuses to make addition easier. For multiplication and powers a [slide rule](https://en.wikipedia.org/wiki/Slide_rule) could be used. It is easy to represent addition physically with two straight pieces of wood - just represent a number with a distance and align the two pieces so that the distances are sequentially arranged. To multiply then was as easy: represent the logarithm of a number with a distance then add the logarithms. The sum of the logarithms is the logarithm of the *product* of the original two values. Converting back to a number answers the question. The conversion back and forth is done by simply labeling the wood using a logartithmic scale. The slide rule was [invented](http://tinyurl.com/qytxo3e) soon after Napier's initial publication on the logarithm in 1614.
Before the ubiquity of electronic calculating devices, the need to compute was still present. Ancient civilizations had abacuses to make addition easier. For multiplication and powers a [slide rule](https://en.wikipedia.org/wiki/Slide_rule) could be used. It is easy to represent addition physically with two straight pieces of wood---just represent a number with a distance and align the two pieces so that the distances are sequentially arranged. To multiply then was as easy: represent the logarithm of a number with a distance then add the logarithms. The sum of the logarithms is the logarithm of the *product* of the original two values. Converting back to a number answers the question. The conversion back and forth is done by simply labeling the wood using a logarithmic scale. The slide rule was [invented](http://tinyurl.com/qytxo3e) soon after Napier's initial publication on the logarithm in 1614.
##### Example

View File

@@ -42,6 +42,105 @@ For these examples, the domain of both $f(x)$ and $g(x)$ is all real values of $
In general the range is harder to identify than the domain, and this is the case for these functions too. For $f(x)$ we may know the $\cos$ function is trapped in $[-1,1]$ and it is intuitively clear that all values in that set are possible. The function $h(x)$ would have range $[0,\infty)$. The $s(x)$ function is either $-1$ or $1$, so only has two possible values in its range. What about $g(x)$? It is a parabola that opens upward, so any $y$ values below the $y$ value of its vertex will not appear in the range. In this case, the symmetry indicates that the vertex will be at $(1/2, -1/4)$, so the range is $[-1/4, \infty)$.
::: {#fig-domain-range-1}
```{julia}
#| echo: false
plt = let
gr()
# domain/range shade
λ = 1.2
a, b = .1, 3
f(x) = (x-1/2) + sin((x-1)^2)
empty_style = (xaxis=([], false),
yaxis=([], false),
framestyle=:origin,
legend=false)
axis_style = (arrow=true, side=:head, line=(:gray, 1))
text_style = (10,)
fn_style = (;line=(:black, 3))
fn2_style = (;line=(:red, 4))
mark_style = (;line=(:gray, 1, :dot))
domain_style = (;fill=(:orange, 0.35))
range_style = (; fill=(:blue, 0.35))
xs = range(a,b, 1000)
y0,y1 = extrema(f.(xs))
Δy = (y1-y0)/60
Δx = (b - a)/75
plot(; empty_style...) # aspect_ratio=:equal,
plot!([-.25,3.25],[0,0]; axis_style...)
plot!([0,0], [min(-2Δy, y0 - Δy), y1 + 4Δy]; axis_style... )
plot!(f, a, b; fn_style...)
plot!(Shape([a,b,b,a], Δy * [-1,-1,1,1] ); domain_style...)
plot!(Shape(Δx*[-1,1,1,-1], [y0,y0,y1,y1]); range_style...)
plot!([a,a], [0, f(a)]; mark_style...)
plot!([b,b], [0, f(b)]; mark_style...)
plot!([a, b], [f(a), f(a)]; mark_style...)
plot!([a, b], [f(b), f(b)]; mark_style...)
end
plt
```
Plot of a function over an interval $[a,b]$, highlighting the domain (shaded along the $x$ axis) and the range (shaded along the $y$ axis).
:::
::: {#fig-domain-range-2}
```{julia}
#| echo: false
plt = let
a, b = 0, 2pi
λ = 1.1
Δx, Δy = .033, .1
δx = 0.05
f(x) = sec(x)
g(x) = abs(f(x)) < 5 ? f(x) : NaN
empty_style = (xaxis=([], false),
yaxis=([], false),
framestyle=:origin,
legend=false)
axis_style = (arrow=true, side=:head, line=(:gray, 1))
text_style = (10,)
fn_style = (;line=(:black, 3))
fn2_style = (;line=(:red, 4))
mark_style = (;line=(:gray, 1, :dot))
domain_style = (;fill=(:orange, 0.35))
range_style = (; fill=(:blue, 0.35))
plot(; empty_style...)
plot!(g, a+0.1, b; fn_style...)
plot!(λ*[a,b],[0,0]; axis_style...)
plot!([0,0], λ*[-5,5+1/2]; axis_style...)
vline!([pi/2, 3pi/2]; line=(:gray, :dash))
plot!(Shape([a,a+pi/2-δx,a+pi/2-δx,a], Δy*[-1,-1,1,1]); domain_style...)
plot!(Shape([a+pi/2+δx, a+pi/2+pi-δx,a+pi/2+pi-δx,a+pi/2+δx],
Δy*[-1,-1,1,1]); domain_style...)
plot!(Shape([3pi/2 + δx, 2pi, 2pi, 3pi/2+δx],
Δy*[-1,-1,1,1]); domain_style...)
plot!(Shape(Δx*[-1,1,1,-1], [-5, -5,-1,-1]); range_style...)
plot!(Shape(Δx*[-1,1,1,-1], [ 5, 5,1,1]); range_style...)
end
plt
```
```{julia}
#| echo: false
plotly()
nothing
```
This figure shows that the domain of a function may be a collection of intervals (in this case the $\sec$ function is not defined at $\pi/2 + k \pi$ for integer $k$) and that the range may also be a collection of intervals (in this case, the $\sec$ function never takes a value in $(-1,1)$).
:::
:::{.callout-note}
## Note
@@ -81,7 +180,7 @@ For typical cases like the three above, there isn't really much new to learn.
:::{.callout-note}
## Note
## The equals sign is used differently between math and Julia
The equals sign in `Julia` always indicates either an assignment or a mutation of the object on the left side. The definition of a function above is an *assignment*, in that a function is added (or modified) in a table holding the methods associated with the function's name.
The equals sign restricts the expressions available on the *left*-hand side to a) a variable name, for assignment; b) mutating an object at an index, as in `xs[1]`; c) mutating a property of a struct; or d) a function assignment following this form `function_name(args...)`.
@@ -90,7 +189,7 @@ Whereas function definitions and usage in `Julia` mirrors standard math notation
:::
### The domain of a function
### The domain of a function in Julia
Functions in `Julia` have an implicit domain, just as they do mathematically. In the case of $f(x)$ and $g(x)$, the right-hand side is defined for all real values of $x$, so the domain is all $x$. For $h(x)$ this isn't the case, of course. Trying to call $h(x)$ when $x < 0$ will give an error:
@@ -121,7 +220,10 @@ $$
f(x) = 5/9 \cdot (x - 32)
$$
In fact, the graph of a function $f(x)$ is simply defined as the graph of the equation $y=f(x)$. There is a distinction in `Julia` as a command such as
In fact, the graph of a function $f(x)$ is simply defined as the graph of the equation $y=f(x)$.
There **is** a distinction in `Julia`. The last command here
```{julia}
@@ -138,16 +240,68 @@ f(x) = 5/9 * (x - 32)
f(72) ## room temperature
```
will create a function object with a value of `x` determined at a later time - the time the function is called. So the value of `x` defined when the function is created is not important here (as the value of `x` used by `f` is passed in as an argument).
will create a function object with a value of `x` determined at a later time---the time the function is called. So the value of `x` defined when the function is created is not important here (as the value of `x` used by `f` is passed in as an argument).
Within `Julia`, we make note of the distinction between a function object versus a function call. In the definition `f(x)=cos(x)`, the variable `f` refers to a function object, whereas the expression `f(pi)` is a function call. This mirrors the math notation where an $f$ is used when properties of a function are being emphasized (such as $f \circ g$ for composition) and $f(x)$ is used when the values related to the function are being emphasized (such as saying "the plot of the equation $y=f(x)$).
Within `Julia`, we make note of the distinction between a function object versus a function call. In the definition `f(x)=cos(x)`, the variable `f` refers to a function object, whereas the expression `f(pi)` is a function call, resulting in a value. This mirrors the math notation where an $f$ is used when properties of a function are being emphasized (such as $f \circ g$ for composition) and $f(x)$ is used when the values related to the function are being emphasized (such as saying "the plot of the equation $y=f(x)$").
Distinguishing these three related but different concepts -- equations and expressions, function objects, and function calls -- is important when modeling mathematics on the computer.
Distinguishing these related but different concepts---expressions, equations, values from function calls, and function objects---is important when modeling mathematics on the computer.
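A small sketch of object versus call (the names here are illustrative): `f` names a function object; `f(pi)` is a call producing a value. Binding the object to a second name gives the same mapping.

```{julia}
#| hold: true
# `f` and `g` refer to the same function object; calls on either produce
# the same values.
f(x) = cos(x)
g = f
g(pi) == f(pi)
```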
::: {#fig-kidney}
```{julia}
#| echo: false
plt = let
gr()
# two kidneys and an arrow
ts = range(0, 2pi, 200)
a = 1
x(t) = 6a*cos(t) + sin(t) - 4a*cos(t)^5
y(t) = 4a*sin(t)^3
S = Shape(x.(ts), y.(ts))
y1(t) = 6a*cos(t) -3sin(t) - 4a*cos(t)^5
x1(t) = 4a*sin(t)^3
T = Shape(x1.(ts), y1.(ts))
T = Plots.translate(T, 10, 0)
plot(; empty_style...)
plot!(S, fill=(:gray, 0.2), line=(:black, 1))
plot!(T, fill=(:gray, 0.2), line=(:black, 1))
P = (0,0)
Q = (10, 0)
scatter!([P,Q], marker=(:circle,))
ts = reverse(range(pi/4+.1, 3pi/4-.1, 100))
plot!(5 .+ 5*sqrt(2)*cos.(ts), -5 .+ 5*sqrt(2)*sin.(ts);
line=(:black, 1), arrow=true, side=:head)
annotate!([
(P..., text(L"x", :top)),
(Q..., text(L"f(x)", :top)),
(5, 3, text(L"f", :top)),
(0, -6, text("Domain")),
(10, -6, text("Range"))
])
end
plt
```
```{julia}
#| echo: false
plotly()
nothing
```
Common illustration of an abstract function mapping a value $x$ in the domain to a value $y=f(x)$ in the range. In `Julia`, the values are named, as with `x`, or computed, as with `f(x)`. The function *object* `f` is like the named arrow, which is the name assigned to a particular mapping.
:::
### Cases, the ternary operator
The definition of $s(x)$ above has two cases:
$$
s(x) =
\begin{cases}
-1 & x < 0\\
1 & x > 0
\end{cases}
$$
We learn to read this as: when $x$ is less than $0$, then the answer is $-1$. If $x$ is greater than $0$ the answer is $1.$ Often---but not in this example---there is an "otherwise" case to catch those values of $x$ that are not explicitly mentioned. As there is no such "otherwise" case here, we can see that this function has no definition when $x=0$. This function is often called the "sign" function and is also defined by $\lvert x\rvert/x$. (`Julia`'s `sign` function defines `sign(0)` to be `0`.)
How do we create conditional statements in `Julia`? Programming languages generally have "if-then-else" constructs to handle conditional evaluation. In `Julia`, the following code will handle the above condition:
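One possible sketch of an `if-elseif` statement for the sign function, assuming a value (here $-1$, for illustration) has been assigned to `x`:

```{julia}
x = -1
if x < 0
    -1
elseif x > 0
    1
end
```

An `if` block is itself an expression; here it evaluates to `-1`. (When no branch matches, as when `x` is `0`, the value is `nothing`.)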
For example, here is one way to define an absolute value function:

```{julia}
abs_val(x) = x >= 0 ? x : -x
```
The condition is `x >= 0`---or is `x` non-negative? If so, the value `x` is used, otherwise `-x` is used.
Here is a means to implement a function which takes the larger of `x` or `10`:
```{julia}
bigger_10(x) = x > 10 ? x : 10.0
```
(This could also utilize the `max` function: `f(x) = max(x, 10.0)`.)
Or similarly, a function to represent a cell phone plan where the first $5$ Gb of data are $11$ dollars and every additional GB is $3.50$:
```{julia}
cellplan(x) = x <= 5 ? 11.0 : 11.0 + 3.50 * (x-5)
```
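For instance, under and over the $5$ Gb threshold (the definition is repeated here so the snippet stands alone):

```{julia}
cellplan(x) = x <= 5 ? 11.0 : 11.0 + 3.50 * (x - 5)
cellplan(4), cellplan(8)   # (11.0, 21.5)
```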
:::{.callout-warning}
## Warning
Type stability. These last two definitions used `10.0` and `11.0` instead of the integers `10` and `11` for the answer. Why the extra typing? When `Julia` can predict the type of the output from the type of inputs, it can be more efficient. So when possible, we help out and ensure the output is always the same type.
:::
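As a quick check of this point, both branches of the `cellplan` definition above return a `Float64` (definition repeated for self-containedness):

```{julia}
cellplan(x) = x <= 5 ? 11.0 : 11.0 + 3.50 * (x - 5)
typeof(cellplan(2)) == typeof(cellplan(8)) == Float64   # true
```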
$$
f(x) = \left(\frac{g}{k v_0\cos(\theta)} + \tan(\theta) \right) x + \frac{g}{k^2}\ln\left(1 - \frac{k}{v_0\cos(\theta)} x \right)
$$
Here $g$ is the gravitational constant $9.8$ and $v_0$, $\theta$, and $k$ parameters, which we take to be $200$, $45$ degrees and $1/2$ respectively. With these values, the above function can be computed when $x=100$ with:
```{julia}
#| hold: true
g, v0, theta, k = 9.8, 200, 45 * pi/180, 1/2
f(x) = (g/(k*v0*cos(theta)) + tan(theta)) * x + (g/k^2) * log(1 - k/(v0*cos(theta)) * x)
f(100)
```

In our example, we see that in trying to find an answer to $f(x) = 0$ ($\sqrt{2}$ being an answer), the value found is only approximate:
```{julia}
#| echo: false
plt = let
gr()
plt = plot(q, a, b, linewidth=5, legend=false)
plot!(plt, zero, a, b)
plot!(plt, [a, b], q.([a, b]))
scatter!(plt, [c], [q(c)]; marker=(:circle,))
scatter!(plt, [a,b], [q(a), q(b)]; marker=(:square,))
annotate!(plt, [
(a, 0, text(L"a", 10,:top)),
(b, 0, text(L"b", 10, :top)),
(c, 0, text(L"c", 10, :bottom))
])
end
plt
```
```{julia}
#| echo: false
plotly()
nothing
```
Still, `q(c)` is not really close to $0$:
```{julia}
#| hold: true
c = secant_intersection(f, a, b)
p = plot(f, a, b, linewidth=5, legend=false)
plot!(p, zero, a, b)
scatter!(p, [a, b], [f(a), f(b)]; marker=(:square,))
plot!(p, [a, b], f.([a, b]))
scatter!(p, [c], [f(c)])
p
```
$$
f(x) = m \cdot x + b, \quad g(x) = y_0 + m \cdot (x - x_0).
$$
Both functions use the variable $x$, but there is no confusion, as we
learn that this is just a dummy variable to be substituted for and so
could have any name. Both also share a variable $m$ for a slope.
Where does the value for $m$ come from?
In practice, there is a context that gives an answer. Despite the same name, there is no expectation that the slope will be the same for each function if the context is different. So when parameters are involved, a function involves a rule and a context to give specific values to the parameters. Euler had said initially that functions are composed of "the variable quantity and numbers or constant quantities." We still use the term "variable," but instead of "constant quantities," we use the name "parameters." In computer languages, instead of context, we use the word *scope*.
Something similar is also true with `Julia`. Consider the example of writing a function to model a linear equation with slope $m=2$ and $y$-intercept $3$. A typical means to do this would be to define constants, and then use the familiar formula:
So the `b` is found from the currently stored value. This fact can be exploited.
How `Julia` resolves what a variable refers to is described in detail in the manual page [Scope of Variables](https://docs.julialang.org/en/v1/manual/variables-and-scoping/). In this case, the function definition finds variables in the context of where the function was defined, the main workspace, and not where it is called. As seen, this context can be modified after the function definition and prior to the function call. It is only when `b` is needed, that the context is consulted, so the most recent binding is retrieved. Contexts allow the user to repurpose variable names without there being name collision. For example, we typically use `x` as a function argument, and different contexts allow this `x` to refer to different values.
Mostly this works as expected, but at times it can be complicated to reason about. In our example, definitions of the parameters can be forgotten, or the same variable name may have been used for some other purpose. The potential issue is with the parameters, the value for `x` is straightforward, as it is passed into the function.
However, we can also pass the parameters, such as $m$ and $b$, as arguments. There are different styles employed.
For parameters, one suggestion is to use [keyword](https://docs.julialang.org/en/v1/manual/functions/#Keyword-Arguments) arguments. These allow the specification of parameters, but also give a default value. This can make usage explicit, yet still convenient. For example, here is an alternate way of defining a line with parameters `m` and `b`:
```{julia}
mxplusb(x; m=1, b=0) = m * x + b
```

During this call, values for `m` and `b` are found from how the function is called:

```{julia}
mxplusb(0; m=3, b=2)
```
Keywords are used to mark the parameters whose values are to be changed from the default. Though one can use *positional arguments* for parameters---and there are good reasons to do so---using keyword arguments is a good practice if performance isn't paramount, as their usage is more explicit yet the defaults mean that a minimum amount of typing needs to be done.
Keyword arguments are widely used with plotting commands, as there are numerous options to adjust, but typically only a handful are adjusted per call. The `Plots` package, whose commands we illustrate throughout these notes starting with the next section, follows two simple rules with data and attributes: positional arguments are input data, and keyword arguments are attributes.
For example, here we use a *named tuple* to pass parameters to `f`:
```{julia}
#| hold: true
function trajectory(x, p)
g, v0, theta, k = p.g, p.v0, p.theta, p.k # unpack parameters
a = v0 * cos(theta)
(g/(k*a) + tan(theta))* x + (g/k^2) * log(1 - k/a*x)
end
p = (g=9.8, v0=200, theta=45*pi/180, k=1/2)
trajectory(100, p)
```
The style isn't so different from using keyword arguments, save the extra step of unpacking the parameters:

```{julia}
#| hold: true
p = (g=9.8, v0=200, theta=45*pi/180, k=1/2)
v0, theta = p.v0, p.theta
v0, theta
```
The *big* advantage of bundling parameters into a container is consistency---the function is always called in an identical manner regardless of the number of parameters (or variables).
::: {.callout-note}
```{julia}
Volume(r, h) = pi * r^2 * h # of a cylinder
SurfaceArea(r, h) = pi * r * (r + sqrt(h^2 + r^2)) # of a right circular cone, including the base
```
The right-hand sides may or may not be familiar, but it should be reasonable to believe that if push came to shove, the formulas could be looked up. However, the left-hand sides are subtly different---they have two arguments, not one. In `Julia` it is trivial to define functions with multiple arguments---we just did.
Earlier we saw the `log` function can use a second argument to express the base. This function is basically defined by `log(b,x)=log(x)/log(b)`. The `log(x)` value is the natural log, and this definition just uses the change-of-base formula for logarithms.
But not so fast, on the left side is a function with two arguments and on the right side the functions have one argument---yet they share the same name. How does `Julia` know which to use? `Julia` uses the number, order, and *type* of the positional arguments passed to a function to determine which function definition to use. This is technically known as [multiple dispatch](http://en.wikipedia.org/wiki/Multiple_dispatch) or **polymorphism**. As a feature of the language, it can be used to greatly simplify the number of functions the user must learn. The basic idea is that many functions are "generic" in that they have methods which will work differently in different scenarios.
:::{.callout-warning}
## Warning
Multiple dispatch is very common in mathematics. For example, we learn different ways to add depending on the kind of number---whole numbers, fractions, decimals---yet the same `+` notation is used for each.
:::
`Julia` is similarly structured. `Julia` terminology would be to call the operation "`+`" a *generic function* and the different implementations *methods* of "`+`". This allows the user to just need to know a smaller collection of generic concepts yet still have the power of detail-specific implementations. To see how many different methods are defined in the base `Julia` language for the `+` operator, we can use the command `methods(+)`. As there are so many (well over $100$ when `Julia` is started), we illustrate how many different logarithm methods are implemented for "numbers:"
```{julia}
methods(log, (Number,))
```

```{julia}
twotox(x::Integer) = (2//1)^x
twotox(x::Real) = (2.0)^x
twotox(x::Complex) = (2.0 + 0.0im)^x
```
This is for illustration purposes---the latter two are actually already done through `Julia`'s *promotion* mechanism---but we see that `twotox` will return a rational number when `x` is an integer unlike `Julia` which, when `x` is non-negative will return an integer and will otherwise will error or return a float (when `x` is a numeric literal, like `2^(-3)`).
The key to reading the above is the type annotation acts like a gatekeeper allowing in only variables of that type or a subtype of that type.
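A hypothetical illustration of this gatekeeping, with methods selected by the argument's type:

```{julia}
describe(x::Integer) = "an integer"
describe(x::Real) = "a real number, but not an integer"
describe(2), describe(2.0)
```

`2` is an `Int`, a subtype of `Integer`, so the first method is used; `2.0` is a `Float64`, which is a `Real` but not an `Integer`, so the second is.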
Representing the area of a rectangle in terms of two variables is easy, as the formula is well known:

```{julia}
Area(w, h) = w * h
```
But the other fact about this problem---that the perimeter is $20$---means that height depends on width. For this question, we can see that $P=2w + 2h$ so that---as a function---`height` depends on `w` as follows:
```{julia}
height(w) = (20 - 2*w) / 2
```

$$
g(x) = f(x-c)
$$
has an interpretation---the graph of $g$ will be the same as the graph of $f$ shifted to the right by $c$ units. That is $g$ is a transformation of $f$. From one perspective, the act of replacing $x$ with $x-c$ transforms a function into a new function. Mathematically, when we focus on transforming functions, the word [operator](http://en.wikipedia.org/wiki/Operator_%28mathematics%29) is sometimes used. This concept of transforming a function can be viewed as a certain type of function, in an abstract enough way. The relation would be to just pair off the functions $(f,g)$ where $g(x) = f(x-c)$.
With `Julia` we can represent such operations. The simplest thing would be to do something like:
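For a fixed shift, say $c=3$, the most direct sketch hard-codes the shift (the choice of `f` here is illustrative):

```{julia}
f(x) = x^2         # an illustrative function to shift
g(x) = f(x - 3)    # g is f shifted right by 3 units
g(3), g(5)         # (0, 4), matching f(0) and f(2)
```

To handle an arbitrary shift $c$, a function can instead *return* a function, as in the next definition.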
```{julia}
function shift_right(f; c=0)
    x -> f(x - c)
end
```
That takes some parsing. In the body of `shift_right` is the definition of a function. But this function has no name---it is *anonymous*. What it does should be clear---it subtracts $c$ from $x$ and evaluates $f$ at this new value. Since the last expression creates a function, this function is returned by `shift_right`.
So we could have done something more complicated like:
The value of `c` used when `l` is called is the one passed to `shift_right`. Functions that retain values from the context in which they were created, as `l` does with `c`, are known as *closures*.
:::
Anonymous functions can be created with the `function` keyword, but we will use the "arrow" notation, `arg -> body`, to create them. The above could have been defined as:
```{julia}
shift_right(f; c=0) = x -> f(x - c)
```

When the `->` is seen, a function is being created.
:::{.callout-warning}
## Warning
Generic versus anonymous functions. Julia has two types of functions, generic ones, as defined by `f(x)=x^2` and anonymous ones, as defined by `x -> x^2`. One gotcha is that `Julia` does not like to use the same variable name for the two types. In general, Julia is a dynamic language, meaning variable names can be reused with different types of variables. But generic functions take more care, as when a new method is defined it gets added to a method table. So repurposing the name of a generic function for something else is not allowed. Similarly, repurposing an already defined variable name for a generic function is not allowed.
This comes up when we use functions that return functions as we have different styles that can be used: When we defined `l = shift_right(f, c=3)` the value of `l` is assigned to name an anonymous function for later use. This binding can be reused to define other variables.
However, we could have defined the function `l` through `l(x) = shift_right(f, c=3)(x)`, being explicit about what happens to the variable `x`. This would add a method to the generic function `l`. Meaning, we get an error if we tried to assign a variable to `l`, such as an expression like `l=3`. The latter style is inefficient, so is not preferred.
:::
which picks off the values of `0` and `1` in a somewhat obscure, but less verbose, way.
The `Fix2` function is also helpful when using the `f(x, p)` form for passing parameters to a function. The result of `Base.Fix2(f, p)` is a function with its parameters fixed that can be passed along for plotting or other uses.
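A small sketch of `Base.Fix2` with the `f(x, p)` style (the function and parameter value here are illustrative):

```{julia}
f(x, p) = x^2 - p         # parameter passed as the second argument
g = Base.Fix2(f, 2)       # fix p = 2, so g(x) is f(x, 2)
g(3)                      # 3^2 - 2 is 7
```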
:::{.callout-note}
## Fix
In Julia v1.12 the more general `Fix` constructor can fix an argument at an arbitrary position.
:::
### The `do` notation
###### Question
::: {#fig-floor-function}
```{julia}
#| echo: false
plt = let
empty_style = (xticks=-4:4, yticks=-4:4,
framestyle=:origin,
legend=false)
axis_style = (arrow=true, side=:head, line=(:gray, 1))
text_style = (10,)
fn_style = (;line=(:black, 3))
fn2_style = (;line=(:red, 4))
mark_style = (;line=(:gray, 1, :dot))
domain_style = (;fill=(:orange, 0.35), line=nothing)
range_style = (; fill=(:blue, 0.35), line=nothing)
ts = range(0, 2pi, 100)
xys = sincos.(ts)
xys = [.1 .* xy for xy in xys]
plot(; empty_style..., aspect_ratio=:equal)
plot!([-4.25,4.25], [0,0]; axis_style...)
plot!([0,0], [-4.25, 4.25]; axis_style...)
for k in -4:4
P,Q = (k,k),(k+1,k)
plot!([P,Q], line=(:black,1))
S = Shape([k .+ xy for xy in xys])
plot!(S; fill=(:black,))
S = Shape([(k+1,k) .+ xy for xy in xys])
plot!(S; fill=(:white,), line=(:black,1))
end
current()
end
plt
```
```{julia}
#| echo: false
plotly()
nothing
```
The `floor` function rounds down. For example, any value in $[k,k+1)$ rounds to $k$ for integer $k$.
:::
The figure shows the `floor` function, which is useful in programming: it rounds a real number *down* to the nearest integer at or below it.
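A few sample calls:

```{julia}
floor(3.7), floor(-0.5), floor(2.0)   # (3.0, -1.0, 2.0)
```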
From the graph, what is the domain of the function?
```{julia}
#| hold: true
#| echo: false
choices = [
"The entire real line",
"The entire real line except for integer values",
"The integers"
]
answ = 1
radioq(choices, answ)
```
From the graph, what is the range of the function?
```{julia}
#| hold: true
#| echo: false
choices = [
"The entire real line",
"The entire real line except for integer values",
"The integers"
]
answ = 3
radioq(choices, answ)
```
(This graphic uses the convention that a filled in point is present, but an open point is not, hence each bar represents some $[k, k+1)$.)
###### Question
Where $N(m)$ counts the number of stars brighter than magnitude $m$ *per* square degree of sky.
A [square degree](https://en.wikipedia.org/wiki/Square_degree) is a unit of a solid angle. An entire sphere has a solid angle of $4\pi$ and $4\pi \cdot (180/\pi)^2$ square degrees.
With this we can answer questions, such as:
> How many stars can we see in the sky?

# The inverse of a function
{{< include ../_common_code.qmd >}}
---
A (univariate) mathematical function relates or associates values of $x$ to values $y$ using the notation $y=f(x)$. A key point is a given $x$ is associated with just one $y$ value (graphically, the vertical line test), though a given $y$ value may be associated with several different $x$ values. (When, in addition, each $y$ value comes from just one $x$ value, the graph passes the horizontal line test, a condition central to this section.)
We may conceptualize such a relation in many ways:
* through an algebraic rule;
* through the graph of $f$;
* through a description of what $f$ does;
* or through a table of paired values, say.
For the moment, let's consider a function as a rule that takes in a value of $x$ and outputs a value $y$. If a rule is given defining the function, the computation of $y$ is straightforward. A different question is not so easy: for a given value $y$ what value---or *values*---of $x$ (if any) produce an output of $y$? That is, what $x$ value(s) satisfy $f(x)=y$?
*If* for each $y$ in some set of values there is just one $x$ value, then this operation associates to each value $y$ a single value $x$, so it too is a function. When that is the case we call this an *inverse* function.
Why is this useful? When available, it can help us solve equations: if we can write $y = f(x)$, then $x = f^{-1}(y)$ solves for $x$ in terms of $y$.
Let's explore when we can "solve" for an inverse function.
Consider this graph of the function $f(x) = 2^x$:
```{julia}
#| echo: false
p = let
gr()
empty_style = (xaxis=([], false),
yaxis=([], false),
framestyle=:origin,
legend=false)
axis_style = (arrow=true, side=:head, line=(:gray, 1))
text_style = (10,)
fn_style = (;line=(:black, 3))
fn2_style = (;line=(:red, 4))
mark_style = (;line=(:gray, 1, :dot))
domain_style = (;fill=(:orange, 0.35), line=nothing)
range_style = (; fill=(:blue, 0.35), line=nothing)
f(x) = 2^x
a, b = 0, 2
plot(; empty_style...)
xs = range(a, b, 200)
ys = f.(xs)
plot!( xs, ys; fn_style...)
plot!([a-1/4, b+.2], [0,0]; axis_style...)
plot!([0, 0], [-.1, f(2.1)]; axis_style...)
x = 1
y = (f(b)+f(a))/2
plot!([x,x,0],[0,f(x),f(x)]; line=(:black, 1, :dash), arrow=true, side=:head)
plot!([0,log2(y), log2(y)], [y,y,0]; line=(:black,1,:dash), arrow=true, side=:head)
annotate!([
(x, 0, text(L"c", 10, :top)),
(0,f(x), text(L"f(c)", 10, :right)),
(0, y, text(L"y=f(d)", 10, :right)),
(log2(y), 0, text(L"d", 10, :top))
])
end
plotly()
p
```
The graph of a function is a representation of points $(x,f(x))$, so to *find* $y = f(c)$ from the graph, we begin on the $x$ axis at $c$, move vertically to the graph (the point $(c, f(c))$), and then move horizontally to the $y$ axis, intersecting it at $y = f(c)$. The figure illustrates this path for a value $c$. This is how an $x$ is associated to a single $y$.
If we were to *reverse* the direction, starting at $y = f(d)$ on the $y$ axis and then moving horizontally to the graph, and then vertically to the $x$-axis we end up at a value $d$ with the correct value of $f(d)$. This allows solving for $x$ knowing $y$ in $y=f(x)$.
The operation described will form a function **if** the initial movement horizontally is guaranteed to find *no more than one* value on the graph. That is, to have an inverse function, there cannot be two $x$ values corresponding to a given $y$ value. This observation is often visualized through the "horizontal line test"---the graph of a function with an inverse function can intersect a horizontal line in at most one place.
## Functions which are not always invertible
Consider the function $f(x) = x^2$. The graph---a parabola---is clearly not *monotonic*. Hence no inverse function exists. Yet, we can solve equations $y=x^2$ quite easily: $y=\sqrt{x}$ *or* $y=-\sqrt{x}$. We know the square root undoes the squaring, but we need to be a little more careful to say the square root is the inverse of the squaring function.
The issue is there are generally *two* possible answers. To avoid this, we might choose to only take the *non-negative* answer. To make this all work as above, we restrict the domain of $f(x)$ and now consider the related function $f(x)=x^2, x \geq 0$. This is now a monotonic function, so will have an inverse function. This is clearly $f^{-1}(x) = \sqrt{x}$. (The $\sqrt{x}$ being defined as the principal square root or the unique *non-negative* answer to $u^2-x=0$.)
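With the restricted domain, the inverse relationship can be checked numerically (a sketch; the second check is exact only up to floating point):

```{julia}
f(x) = x^2          # restricted, by agreement, to x ≥ 0
finv(x) = sqrt(x)
finv(f(3))          # 3.0
f(finv(3)) ≈ 3      # true, up to floating point
```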
Consider again the graph of a monotonic function, in this case $f(x) = x^2 + 2, x \geq 0$:
```{julia}
#| hold: true
f(x) = x^2 + 2
plot(f, 0, 4; yticks=[2,4,8,16],
legend=false, framestyle=:origin)
plot!([(2,0), (2, f(2)), (0, f(2))])
```
The graph is shown over the interval $(0,4)$, but the *domain* of $f(x)$ is all $x \geq 0$. The *range* of $f(x)$ is clearly $2 \leq y \leq \infty$.
The lines layered on the plot show how to associate an $x$ value to a $y$ value or vice versa (as $f(x)$ is one-to-one). The domain then of the inverse function is all the $y$ values for which a corresponding $x$ value exists: this is clearly all values bigger or equal to $2$. The *range* of the inverse function can be seen to be all the images for the values of $y$, which would be all $x \geq 0$. This gives the relationship:
> * the *domain* of $f^{-1}(x)$ is the *range* of $f(x)$;
> * the *range* of $f^{-1}(x)$ is the *domain* of $f(x)$;
From this we can see if we start at $x$, apply $f$ we get $y$, if we then apply $f^{-1}$ we will get back to $x$ so we have:
> For all $x$ in the domain of $f$: $f^{-1}(f(x)) = x$.
Similarly, were we to start on the $y$ axis, we would see:
> For all $x$ in the domain of $f^{-1}$: $f(f^{-1}(x)) = x$.
In short $f^{-1} \circ f$ and $f \circ f^{-1}$ are both identity functions, though on possibly different domains.
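These identities are easy to spot-check numerically. A quick sketch, using the restricted squaring function from above (the name `finv` is ours, standing in for $f^{-1}$):

```{julia}
#| hold: true
f(x) = x^2          # considered on the restricted domain x ≥ 0
finv(x) = sqrt(x)
finv(f(3)), f(finv(3))    # both recover 3, the second up to floating point
```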
@@ -243,11 +282,12 @@ Let's see this in action. Take the function $2^x$. We can plot it by generating
f(x) = 2^x
xs = range(0, 2, length=50)
ys = f.(xs)
plot(xs, ys; color=:blue, label="f",
aspect_ratio=:equal, framestyle=:origin, xlims=(0,4))
plot!(ys, xs; color=:red, label="f⁻¹") # the inverse
```
By flipping around the $x$ and $y$ values in the `plot!` command, we produce the graph of the inverse function---when viewed as a function of $x$. We can see that the domain of the inverse function (in red) is clearly different from that of the function (in blue).
The inverse function graph can be viewed as a symmetry of the graph of the function. Flipping the graph for $f(x)$ around the line $y=x$ will produce the graph of the inverse function: Here we see for the graph of $f(x) = x^{1/3}$ and its inverse function:
@@ -258,15 +298,16 @@ The inverse function graph can be viewed as a symmetry of the graph of the funct
f(x) = cbrt(x)
xs = range(-2, 2, length=150)
ys = f.(xs)
plot(xs, ys; color=:blue,
aspect_ratio=:equal, legend=false)
plot!(ys, xs; line=(:red,))
plot!(identity; line=(:green, :dash))
x = 1/4
y = f(x)
plot!([(x,y), (y,x)]; line=(:green, :dot))
```
We drew a line connecting $(1/4, f(1/4))$ to $(f(1/4),1/4)$. We can see that it crosses the line $y=x$ perpendicularly, indicating that points are symmetric about this line. (The plotting argument `aspect_ratio=:equal` ensures that the $x$ and $y$ axes are on the same scale, so that this type of line will look perpendicular.)
One consequence of this symmetry is that if $f$ is strictly increasing, then so is its inverse.
@@ -309,8 +350,133 @@ What do we see? In blue, we can see the familiar square root graph along with a
This is reminiscent of the formula for the slope of a perpendicular line, $-1/m$, but quite different, as this formula implies the two lines have either both positive slopes or both negative slopes, unlike the relationship in slopes between a line and a perpendicular line.
::: {#fig-inverse-normal layout-ncol=1}
```{julia}
#| echo: false
# inverse function slope
gr()
p1 = let
f(x) = x^2
df(x) = 2x
empty_style = (xaxis=([], false),
yaxis=([], false),
framestyle=:origin,
legend=false)
axis_style = (arrow=true, side=:head, line=(:gray, 1))
text_style = (10,)
fn_style = (;line=(:black, 3))
fn2_style = (;line=(:red, 4))
mark_style = (;line=(:gray, 1, :dot))
plot(; aspect_ratio=:equal, empty_style...)
xs = range(0, 1.25, 100)
plot!(xs,f.(xs); fn_style...)
plot!(f.(xs), xs; fn2_style...)
plot!(identity, -1/4, 2; line=(:gray, 1, :dot))
#plot!([-.1, 1.35],[0,0]; axis_style...)
#plot!([0,0], [-0.1, f(1.3)]; axis_style...)
c = .4
m = df(c)
tl(x) = f(c) + df(c)*(x-c)
plot!(tl; line=(:black, 1, :dash))
d = c + .6
p1, p2, p3 = (c, tl(c)), (d, tl(c)), (d, tl(d))
q1, q2, q3 = (tl(c),c), (tl(c),d), (tl(d), d)
plot!([p1, p2, p3]; line=(:black, 1, :dot))
tl1(x) = c + (1/m)*(x - f(c))
plot!(tl1; line=(:red, 1, :dash))
plot!([q1, q2, q3]; line=(:red, 1, :dot))
annotate!([
((c+d)/2, f(c), text(L"\Delta x", 10, :top, :black)),
(d, (tl(c)+tl(d))/2, text(L"\Delta y", 10, :left, :black)),
(f(c), (c+d)/2, text(L"\Delta x", 10, :right, :red)),
((tl(c)+tl(d))/2, d, text(L"\Delta y", 10, :bottom, :red)),
(d, tl(d),
text(L"rise/run = $m = \Delta y / \Delta x$", 10, :top, :left,
rotation= rad2deg(atan(m)))),
(tl(d), d,
text(L"rise/run = $\Delta x / \Delta y = 1/m$", 10, :bottom, :left, rotation=rad2deg(atan(1/m)))),
(1.9, 1.9, text(L"y=x", 10, :top, rotation=45))
])
current()
end
# normal line
p2 = let
f(x) = 4 - (x-2)^2
df(x) = -2*(x-2)
empty_style = (xaxis=([], false),
yaxis=([], false),
framestyle=:origin,
legend=false)
axis_style = (arrow=true, side=:head, line=(:gray, 1))
text_style = (10,)
fn_style = (;line=(:black, 3))
fn2_style = (;line=(:red, 4))
mark_style = (;line=(:gray, 1, :dot))
plot(; aspect_ratio=:equal, empty_style...,
xlims=(1, 2.5), ylims=(3, 4.5))
xs = range(.99, 2.01, 100)
plot!(xs,f.(xs); fn_style...)
c = 1.5
tl(x) = f(c) + df(c)*(x-c)
nl(x) = f(c) - (x-c)/df(c)
xs = range(1, 2, 10)
plot!(xs, tl.(xs); fn2_style...)
xs = range(.9, 2, 10)
plot!(xs, nl.(xs); fn2_style...)
ylims!((f(.85), nl(.95)))
o = 1/3
plot!([c,c+o, c+o], [tl(c), tl(c), tl(c+o)]; mark_style...)
m = (tl(c+o) - tl(c))
plot!([c,c,c+m], [nl(c),nl(c + m),nl(c+m)]; mark_style...)
theta = rad2deg(atan(tl(c+o)-tl(c), o))
annotate!([
(c + o/2, f(c), text(L"1", :top, 10)),
(c + o, (f(c)+f(c+o))/2, text(L"m", :right, 10)),
(c, (nl(c) + nl(c+m))/2, text(L"-1", :right, 10)),
(c+m/2, nl(c+m), text(L"m", :top, 10)),
(c + o/2, tl(c+o), text(L"rise/run $=m/1$", 10, :top,
rotation=theta)),
(c + 1.1*o, nl(c+1.1*o), text(L"rise/run $=(-1)/m$", 10, :bottom,
rotation=theta-90))
])
current()
end
plot(p1, p2)
```
The inverse function has slope at a corresponding point that is the *reciprocal*; the "normal line" for a function at a point has slope that is the *negative reciprocal* of the "tangent line" at a point.
:::
```{julia}
#| echo: false
plotly()
nothing
```
The key here is that the shape of $f(x)$ near $x=c$ is directly related to the shape of $f^{-1}(x)$ at $f(c)$. In this case, if we use the tangent line as a fill in for how steep a function is, we see from the relationship that if $f(x)$ is "steep" at $x=c$, then $f^{-1}(x)$ will be "shallow" at $x=f(c)$.
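This reciprocal relationship can be spot-checked with forward differences. A small sketch, again using $f(x) = x^2$ restricted to $x \geq 0$ (so the inverse is `sqrt`) at $c = 0.4$, where the slope is $m = 2c$:

```{julia}
#| hold: true
f(x) = x^2                                   # on x ≥ 0, with inverse sqrt
c, h = 0.4, 1e-6
m = (f(c + h) - f(c)) / h                    # slope of f at c; about 2c = 0.8
minv = (sqrt(f(c) + h) - sqrt(f(c))) / h     # slope of the inverse at f(c)
m * minv                                     # about 1
```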
## Questions


@@ -11,6 +11,8 @@ using CalculusWithJulia
nothing
```
The [`Julia`](http://www.julialang.org) programming language is well suited as a computer accompaniment while learning the concepts of calculus. The following overview covers the language-specific aspects of the pre-calculus part of the [Calculus with Julia](https://calculuswithjulia.github.io) notes.
@@ -34,35 +36,29 @@ The [https://mybinder.org/](https://mybinder.org/) service in particular allows
[Google colab](https://colab.research.google.com/) offers a free service with more computing power than `binder`, though setup is a bit more fussy. To use `colab` along with these notes, you need to execute a command that downloads `Julia` and installs the `CalculusWithJulia` package and a plotting package. (Modify the `pkg"add ..."` command to add other desired packages; update the julia version as necessary):
> Go to google colab:
[https://colab.research.google.com/](https://colab.research.google.com/)
> Click on "Runtime" menu and then "Change Runtime Type"
> Select Julia as the "Runtime Type" then save
> Copy and paste then run this set of commands
```
using Pkg
Pkg.add("Plots")
Pkg.add("CalculusWithJulia")
using CalculusWithJulia
using Plots
```
This may take 2-3 minutes to load. The `plotly()` backend doesn't work out of the box. Use `gr()` to recover if that command is issued.
@@ -85,7 +81,7 @@ $ julia
(_) | (_) (_) |
_ _ _| |_ __ _ | Type "?" for help, "]?" for Pkg help.
| | | | | | |/ _` | |
| | |_| | | | (_| | | Version 1.11.6 (2025-07-09)
_/ |\__'_|_|_|\__'_| | Official https://julialang.org/ release
|__/ |
@@ -97,7 +93,7 @@ julia> 2 + 2
* An IDE. For programmers, an integrated development environment is often used to manage bigger projects. `Julia` has `Juno` and `VSCode`.
* A notebook. The [Project Jupyter](https://jupyter.org/) provides a notebook interface for interacting with `Julia` and a more `IDE` style `jupyterlab` interface. A jupyter notebook has cells where commands are typed and immediately following is the printed output returned by `Julia`. The output of a cell depends on the state of the kernel when the cell is computed, not the order of the cells in the notebook. Cells have a number attached, showing the execution order. The `Jupyter` notebook is used by `binder` and can be used locally through the `IJulia` package. This notebook has the ability to display many different types of outputs in addition to plain text, such as images, marked up math text, etc.
* The [Pluto](https://github.com/fonsp/Pluto.jl) package provides a *reactive* notebook interface. Reactive means when one "cell" is modified and executed, the new values cascade to all other dependent cells which in turn are updated. This is very useful for exploring a parameter space, say. Pluto notebooks can be exported as HTML files which make them easy to read online and by clever design embed the `.jl` file that can run through `Pluto` if it is downloaded.
@@ -280,7 +276,7 @@ Values will be promoted to a common type (or type `Any` if none exists). For exa
(Vectors are used as a return type from some functions, as such, some familiarity is needed.)
Other common container types are variants of vectors (higher-dimensional arrays, offset arrays, etc.); tuples (for heterogeneous, immutable, indexed values); named tuples (which add a name to each value in a tuple); and dictionaries (for associative relationships between a key and a value).
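For instance, a quick illustration of these containers (the particular values are just for show):

```{julia}
#| hold: true
t = (1, "two", 3.0)                # tuple: heterogeneous, immutable, indexed
nt = (a = 1, b = "two")            # named tuple: values accessed by name, as in nt.a
d = Dict("one" => 1, "two" => 2)   # dictionary: keys associated with values
t[2], nt.a, d["two"]
```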
Regular arithmetic sequences can be defined by either:
@@ -452,7 +448,7 @@ a = 4
f(3) # now 2 * 4 + 3
```
User-defined functions can have $0$, $1$, or more positional arguments:
```{julia}
@@ -573,7 +569,7 @@ With `Plots` loaded, we can plot a function by passing the function object by na
```{julia}
plot(sin, 0, 2pi)
```
::: {.callout-note}
@@ -629,7 +625,7 @@ ys = f.(xs)
plot(f, a, b) # recipe for a function
plot(xs, f) # alternate recipe
plot(xs, ys) # plot coordinates as two vectors
plot([(x,f(x)) for x in xs]) # plot a vector of points
```
The choice should depend on convenience.
@@ -637,7 +633,7 @@ The choice should depend on convenience.
## Equations
Notation for `Julia` and math is *similar* for functions---but not for equations. In math, an equation might look like:
$$
@@ -664,13 +660,15 @@ using SymPy
(A macro rewrites values into other commands before they are interpreted. Macros are prefixed with the `@` sign. In this use, the "macro" `@syms` translates `x a b c` into a command involving `SymPy`s `symbols` function.)
Symbolic expressions---unlike numeric expressions---are not immediately evaluated, though they may be simplified:
```{julia}
p = a*x^2 + b*x + c
```
The above command illustrates that the mathematical operations of `*`, `^`, and `+` work with symbolic objects. This is the case for most mathematical functions as well.
To substitute a value, we can use `Julia`'s `pair` notation (`variable=>value`):
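For example, a sketch repeating the symbolic setup so it stands alone; calling the expression with a pair substitutes the value:

```{julia}
#| hold: true
using SymPy
@syms x a b c
p = a*x^2 + b*x + c
p(x => 2)          # substitutes 2 for x, giving 4a + 2b + c
```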


@@ -2,13 +2,7 @@ module Make
# makefile for generating typst pdfs
# per directory usage
dir = "precalc"
files = ("functions",
"plotting",
"transformations",
"inversefunctions",


@@ -1,4 +1,4 @@
# The graph of a function
{{< include ../_common_code.qmd >}}
@@ -9,7 +9,7 @@ This section will use the following packages:
```{julia}
using CalculusWithJulia
using Plots
plotly();
```
```{julia}
@@ -18,6 +18,7 @@ plotly()
using Roots
using SymPy
using DataFrames
using Latexify
nothing
```
@@ -83,7 +84,7 @@ plotly()
(Certain graphics are produced with the `gr()` backend.)
With `Plots` loaded and a backend chosen, it is straightforward to graph a function.
For example, to graph $f(x) = 1 - x^2/2$ over the interval $[-3,3]$ we have:
@@ -97,8 +98,9 @@ plot(f, -3, 3)
The `plot` command does the hard work behind the scenes. It needs $2$ pieces of information declared:
* **What** to plot. With this invocation, this detail is expressed by passing a function object to `plot`
* **Where** to plot; the `xmin` and `xmax` values. As with a sketch, it is impossible in this case to render a graph with all possible $x$ values in the domain of $f$, so we need to pick some viewing window. In the example this has $x$ limits of $[-3,3]$; expressed by passing the two endpoints as the second and third arguments.
Plotting a function is then this simple: `plot(f, xmin, xmax)`.
@@ -108,11 +110,6 @@ Plotting a function is then this simple: `plot(f, xmin, xmax)`.
Let's see some other graphs.
@@ -193,10 +190,74 @@ Some types we will encounter, such as the one for symbolic values or the special
:::
:::{.callout-note}
## Viewing window
The default style for `Plots.jl` is to use a frame style where the viewing window is emphasized. This is a rectangular region, $[x_0, x_1] \times [y_0, y_1]$, which is seen through the tick labeling, the bounding scales on the left and bottom, and emphasized through the grid.
This choice does *not* show the $x$-$y$ axes. As such, we might layer on the axes when these are of interest.
To emphasize concepts, we may stylize a function graph rather than display the basic graphic. For example, consider this graphic highlighting the amount the function goes up as it moves from $1$ to $x$:
```{julia}
#| echo: false
plt = let
gr()
f(x) = x^2
empty_style = (xaxis=([], false),
yaxis=([], false),
framestyle=:origin,
legend=false)
axis_style = (arrow=true, side=:head, line=(:gray, 1))
text_style = (10,)
fn_style = (;line=(:black, 3))
fn2_style = (;line=(:red, 4))
mark_style = (;line=(:gray, 1, :dot))
plot(; empty_style..., aspect_ratio=:equal)
a, b = 0, 1.25
x = 1.15
plot!(f, a, b; fn_style...)
plot!([-.1, 1.5], [0,0]; axis_style...)
plot!([0,0], [-.1, f(1.35)]; axis_style...)
plot!([1,x,x], [f(1),f(1),f(x)]; line=(:black, 1))
plot!([1,1],[0,f(1)]; mark_style...)
plot!([x,x],[0,f(1)]; mark_style...)
plot!([0,1],[f(1),f(1)]; mark_style...)
plot!([0,x],[f(x),f(x)]; mark_style...)
annotate!([
(1, 0, text(L"1", 10, :top)),
(x, 0, text(L"x", 10, :top)),
(0, f(1), text(L"1", 10, :right)),
(0, f(x), text(L"x^2", 10, :right)),
(1, f(1), text(L"P", 10, :right, :bottom)),
(x, f(x), text(L"Q", 10, :right, :bottom)),
((1 + x)/2, f(1), text(L"\Delta x", 10, :top)),
(x, (f(1) + f(x))/2, text(L"\Delta y", 10, :left))
])
current()
end
plt
```
```{julia}
#| echo: false
plotly()
nothing
```
:::
---
Making a graph with `Plots` is easy, but producing a graph that is informative can be a challenge, as the choice of a viewing window can make a big difference in what is seen. For example, trying to make a graph of $f(x) = \tan(x)$, as below, will result in a bit of a mess---the chosen viewing window crosses several places where the function blows up:
```{julia}
@@ -257,6 +318,7 @@ plot(xs, ys)
This plots the points as pairs and then connects them in order using straight lines. Basically, it creates a dot-to-dot graph. The above graph looks primitive, as it doesn't utilize enough points.
##### Example: Reflections
@@ -338,17 +400,16 @@ The graph is that of the "inverse function" for $\sin(x), x \text{ in } [-\pi/2,
When plotting a univariate function there are three basic patterns that can be employed. We have examples above of:
* `plot(f, xmin, xmax)` uses a recipe implementing an adaptive algorithm to identify values for $x$ in the interval `[xmin, xmax]`,
* `plot(xs, f.(xs))` to manually choose the values of $x$ to plot points for, and
Finally there is a merging of these following either of these patterns:
* `plot(f, xs)` *or* `plot(xs, f)`
All require a manual choice of the $x$-values to plot, but the broadcasting is carried out in the `plot` command. This style is convenient, for example, to down sample the $x$ range to see the plotting mechanics, such as:
```{julia}
@@ -358,10 +419,10 @@ plot(0:pi/4:2pi, sin)
#### NaN values
At times it is not desirable to draw lines between each successive point. For example, if there is a discontinuity in the function or if there were a vertical asymptote.
For example, consider what happens at $0$ with $f(x) = 1/x$. The most straightforward plot is dominated by the vertical asymptote at $x=0$:
```{julia}
@@ -379,10 +440,10 @@ As we see, even with this adjustment, the spurious line connecting the points wi
plot(q, -1, 1, ylims=(-10,10))
```
The dot-to-dot algorithm, at some level, assumes the underlying function is *continuous*; here $q(x)=1/x$ is not.
There is a convention for most plotting programs that **if** the $y$ value for a point is `NaN` then no lines will connect to that point, `(x,NaN)`. `NaN` conveniently appears in many cases where a plot may have an issue, though not with $1/x$ as `1/0` is `Inf` and not `NaN`. (Unlike, say, `0/0` which is NaN.)
Here is one way to plot $q(x) = 1/x$ over $[-1,1]$ taking advantage of this convention:
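A sketch of this approach (the choice of $251$ points is arbitrary; any odd number will do):

```{julia}
#| hold: true
q(x) = 1/x
xs = range(-1, 1, length=251)    # an odd length, so 0.0 is among the xs
ys = q.(xs)
ys[isinf.(ys)] .= NaN            # replace the would-be infinite value with NaN
plot(xs, ys)
```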
@@ -399,7 +460,7 @@ plot(xs, ys)
By using an odd number of points, we should have that $0.0$ is amongst the `xs`. The next to last line replaces the $y$ value that would be infinite with `NaN`.
The above is fussy. As a recommended alternative, we might modify the function so that if it is too large, the values are replaced by `NaN`. Here is one such function consuming a function and returning a modified function put to use to make this graph:
```{julia}
@@ -413,7 +474,7 @@ plot(rangeclamp(x -> 1/x), -1, 1)
## Layers
Graphing more than one function over the same viewing window is often desirable. Though this is easily done all at once in `Plots` by specifying a vector of functions as the first argument to `plot` instead of a single function object, we instead focus on building the graph layer by layer.^[The style of `Plots` is to combine multiple *series* to plot into one object and let `Plots` sort out which is which (every column is treated as a separate series). This can be very efficient from a programming perspective, but we leave it for power users. The use of layers seems much easier to understand.]
For example, to see that a polynomial and the cosine function are "close" near $0$, we can plot *both* $\cos(x)$ and the function $f(x) = 1 - x^2/2$ over $[-\pi/2,\pi/2]$:
@@ -447,8 +508,8 @@ For another example, suppose we wish to plot the function $f(x)=x\cdot(x-1)$ ove
```{julia}
#| hold: true
f(x) = x * (x-1)
plot(f, -1, 2; legend=false) # turn off legend
plot!(zero)
scatter!([0,1], [0,0])
```
@@ -456,43 +517,53 @@ scatter!([0,1], [0,0])
The $3$ main functions used in these notes for adding layers are:
* `plot!(f, a, b)` to add the graph of the function `f`; also `plot!(xs, ys)`
* `scatter!(xs, ys)` to add points $(x_1, y_1), (x_2, y_2), \dots$.
* `annotate!((x,y, label))` to add a label at $(x,y)$
:::{.callout-warning}
## Trailing ! convention
Julia has a convention to use functions named with a `!` suffix to indicate that they mutate some object. In this case, the object is the current graph, though it is implicit. Both `plot!`, `scatter!`, and `annotate!` (others too) do this by adding a layer.
:::
## Additional arguments
The `Plots` package uses positional arguments for input data and keyword arguments for [attributes](https://docs.juliaplots.org/latest/attributes/).
The `Plots` package provides many such arguments for adjusting a graphic, here we mention just a few:
* `plot(...; title="main title", xlabel="x axis label", ylabel="y axis label")`: add title and label information to a graphic
* `plot(...; label="a label")`: the `label` attribute will show up when a legend is present. Using an empty string, `""`, will suppress adding the layer to the legend.
* `plot(...; legend=false)`: by default, different layers will be indicated with a legend, this will turn off this feature
* `plot(...; xlims=(a,b), ylims=(c,d))`: either or both `xlims` and `ylims` can be used to control the viewing window
* `plot(...; xticks=[xs...], yticks=[ys...])`: either or both `xticks` and `yticks` can be used to specify where the tick marks are to be drawn
* `plot(...; aspect_ratio=:equal)`: will keep $x$ and $y$ axis on same scale so that squares look square.
* `plot(...; framestyle=:origin)`: The default `framestyle` places $x$-$y$ guides on the edges; this specification places them on the $x-y$ plane.
* `plot(...; color="green")`: this argument can be used to adjust the color of the drawn figure (color can be a string,`"green"`, or a symbol, `:green`, among other specifications)
* `plot(...; linewidth=5)`: this argument can be used to adjust the width of drawn lines
* `plot(...; linestyle=:dash)`: will change the line style of the plotted lines to dashed lines. Also `:dot`, ...
For plotting points with `scatter`, or `scatter!` the markers can be adjusted via
* `scatter(...; markersize=5)`: increase marker size
* `scatter(...; marker=:square)`: change the marker (uses a symbol, not a string to specify)
Of course, zero, one, or more of these can be used on any given call to `plot`, `plot!`, `scatter`, or `scatter!`.
### Shorthands
There are also several *shorthands* in `Plots` that allows several related attributes to be specified to a single argument that is disambiguated using the type of the value. A few used herein are:
* `line`. For example, `line=(5, 0.25, "blue")` will specify `linewidth=5` (integer), `linecolor="blue"` (string or symbol), `linealpha=0.25` (floating point)
* `marker`. For example `marker=(:star, 5)` will specify `markershape=:star` (symbol) and `markersize=5` (integer).
* `fill`. For example `fill=(:blue, 0.25)` will specify `fillcolor=:blue` (string or symbol) and `fillalpha=0.25` (floating point).
#### Example: Bresenham's algorithm
@@ -512,7 +583,7 @@ With these assumptions, we have an initial decision to make:
We re-express our equation $y=f(x)= mx+b$ in general form $f(x,y) = 0 = Ax + By + C$. Using the other point on the line $A=-(y_1-y_0)$, $B=(x_1-x_0)$, and $C = -x_1y_0 + x_0 y_1$. In particular, by assumption both $A$ and $B$ are positive.
With this, we have $f(x_0,y_0) = 0$. But moreover, any point $(x_0,y)$ with $y>y_0$ will have $f(x_0,y)>0$ and if $y < y_0$ the opposite. That is, this equation divides the plane into two pieces depending on whether $f$ is positive---the line is the dividing boundary.
For the algorithm, we start at $(x_0, y_0)$ and ask if the pixel $(x_0 + 1, y_0)$ or $(x_0 + 1, y_0 - 1)$ will be lit, then we continue to the right.
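To make the decision step concrete, here is a sketch covering only the case assumed above ($x_0 < x_1$ and slope between $-1$ and $0$); the function name `bresenham` is our own:

```{julia}
# A sketch of the decision step, assuming x₀ < x₁ and a slope between -1 and 0
# (so A = -(y₁-y₀) and B = x₁-x₀ are both positive).
function bresenham(x₀, y₀, x₁, y₁)
    A, B, C = -(y₁ - y₀), x₁ - x₀, -x₁*y₀ + x₀*y₁
    f(x, y) = A*x + B*y + C            # f > 0 above the line, f < 0 below
    pts = [(x₀, y₀)]
    x, y = x₀, y₀
    while x < x₁
        x += 1
        f(x, y - 1/2) > 0 && (y -= 1)  # midpoint above the line ⇒ light the lower pixel
        push!(pts, (x, y))
    end
    pts
end
bresenham(0, 0, 5, -3)
```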
@@ -549,9 +620,9 @@ p = plot(f, x₀, x₁; legend=false, aspect_ratio=:equal,
col = RGBA(.64,.64,.64, 0.25)
for xy ∈ xs
x, y = xy
scatter!([x], [y]; marker=(5,))
scatter!([x+1], [y - 1/2]; marker=(5,:star))
plot!(Shape(x .+ [0, 1, 1, 0], y .+ [0, 0, -1, -1]); color=col)
end
p
```
@@ -560,14 +631,46 @@ We see a number of additional arguments used: different marker sizes and shapes
Of course, generalizations for positive slope and slope with magnitude greater than $1$ are needed. As well, this basic algorithm could be optimized, especially if it is part of a lower-level drawing primitive. But this illustrates the considerations involved.
## Points, lines, polygons
Two basic objects to graph are points and lines. Add to these polygons.
A point in two-dimensional space has two coordinates, often denoted by $(x,y)$. In `Julia`, the same notation produces a `tuple`. Using square brackets, as in `[x,y]`, produces a vector. Vectors are more commonly used in these notes, as we have seen there are algebraic operations defined for them. However, tuples have other advantages and are how `Plots` designates a point.
The plot command `plot(xs, ys)` plots the points $(x_1,y_1), \dots, (x_n, y_n)$ and then connects adjacent points with lines. The command `scatter(xs, ys)` just plots the points.
However, the points might be more naturally specified as coordinate pairs. If tuples are used to pair them off, then `Plots` will plot a vector of tuples as a sequence of points through `plot([(x1,y1), (x2, y2), ..., (xn, yn)])`:
```{julia}
pts = [( 1, 0), ( 1/4, 1/4), (0, 1), (-1/4, 1/4),
(-1, 0), (-1/4, -1/4), (0, -1), ( 1/4, -1/4)]
scatter(pts; legend=false)
```
A line segment simply connects two points. While these can be specified as vectors of $x$ and $y$ values, again it may be more convenient to use coordinate pairs to specify the points. Continuing the above, we can connect adjacent points with line segments:
```{julia}
plot!(pts; line=(:gray, 0.5, :dash))
```
This uses the shorthand notation of `Plots` to specify `linecolor=:gray, linealpha=0.5, linestyle=:dash`. To plot a single line segment, specifying two points suffices.
The four-pointed star is not closed off, as there isn't a value from the last point to the first point. A polygon closes itself off. The `Shape` function can take a vector of points or a pair of `xs` and `ys` to specify a polygon. When these are plotted, the arguments to `fill` describe the interior of the polygon, the arguments to `line` the boundary:
```{julia}
plot(Shape(pts); fill=(:gray, 0.25), line=(:black, 2), legend=false)
scatter!(pts)
```
## Graphs of parametric equations
If we have two functions $f(x)$ and $g(x)$ there are a few ways to investigate their joint behavior. As mentioned, we can graph both $f$ and $g$ over the same interval using layers. Such a graph allows an easy comparison of the shape of the two functions and can be useful in solving $f(x) = g(x)$, as the $x$ solutions are where the two curves intersect.
A different graph can be made to compare the two functions side-by-side. This is a parametric plot. Rather than plotting points $(x,f(x))$ and $(x,g(x))$ with two separate graphs, the graph consists of points $(f(x), g(x))$ over a range of $x$ values. We illustrate with some examples below:
##### Example
@@ -585,7 +688,7 @@ plot(f.(ts), g.(ts), aspect_ratio=:equal) # make equal axes
Any point $(a,b)$ on this graph is represented by $(\cos(t), \sin(t))$ for some value of $t$, and in fact multiple values of $t$, since $t + 2k\pi$ will produce the same $(a,b)$ value as $t$ will.
Making the parametric plot is similar to creating a plot using lower level commands. There a sequence of values is generated to approximate the $x$ values in the graph (`xs`), a set of commands to create the corresponding function values (e.g., `f.(xs)`), and some instruction on how to represent the values, in this case with lines connecting the points (the default for `plot` for two vectors of numbers).
In this next plot, the angle values are chosen to be the familiar ones, so the mechanics of the graph can be emphasized. Only the upper half is plotted:
@@ -595,7 +698,7 @@ In this next plot, the angle values are chosen to be the familiar ones, so the m
#| hold: true
#| echo: false
θs =[0, PI/6, PI/4, PI/3, PI/2, 2PI/3, 3PI/4,5PI/6, PI]
latexify(DataFrame(θ=θs, x=cos.(θs), y=sin.(θs)))
```
```{julia}
@@ -608,7 +711,7 @@ scatter!(f.(θs), g.(θs))
---
As with the plot of a univariate function, there is a convenience interface for these plots---just pass the two functions in:
```{julia}
@@ -660,7 +763,7 @@ This graph is *nearly* a straight line. At the point $(0,0)=(f(0), g(0))$, we se
##### Example: Etch A Sketch
[Etch A Sketch](http://en.wikipedia.org/wiki/Etch_A_Sketch) is a drawing toy where two knobs control the motion of a pointer, one knob controlling the $x$ motion, the other the $y$ motion. The trace of the movement of the pointer is recorded until the display is cleared by shaking. Shake to clear is now a motion incorporated by some smart-phone apps.
Playing with the toy makes a few things become clear:
@@ -674,8 +777,9 @@ Playing with the toy makes a few things become clear:
These all apply to parametric plots, as the Etch A Sketch trace is no more than a plot of $(f(t), g(t))$ over some range of values for $t$, where $f$ describes the movement in time of the left knob and $g$ the movement in time of the right.
---
Now, we revisit the last problem in the context of this. We saw in the last problem that the parametric graph was nearly a line---so close the eye can't really tell otherwise. That means that the growth in both $f(t) = t^3$ and $g(t)=t - \sin(t)$ for $t$ around $0$ are in a nearly fixed ratio, as otherwise the graph would have more curve in it.
##### Example: Spirograph
@@ -699,6 +803,8 @@ plot(f, g, 0, max((R-r)/r, r/(R-r))*2pi)
In the above, one can fix $R=1$. Then different values for `r` and `rho` will produce different graphs. These graphs will be periodic if $(R-r)/r$ is a rational. (Nothing about these equations requires $\rho < r$.)
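A sketch using the standard hypotrochoid equations (the text's exact `f` and `g` are elided here, so these serve as a stand-in) illustrates the periodicity claim:

```julia
# Spirograph-style curve: standard hypotrochoid equations (a stand-in for the
# text's f and g). With R = 1, r = 1/4 the ratio (R - r)/r = 3 is rational,
# so the curve is periodic; here the period is 2π.
R, r, ρ = 1.0, 1/4, 1/4
f(t) = (R - r) * cos(t) + ρ * cos((R - r)/r * t)
g(t) = (R - r) * sin(t) - ρ * sin((R - r)/r * t)

isapprox(f(0), f(2pi); atol=1e-12) && isapprox(g(0), g(2pi); atol=1e-12)  # → true
```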
## Questions
@@ -1031,14 +1137,3 @@ choices = [
answ = 5
radioq(choices, answ, keep_order=true)
```
---
## Technical note
The slow "time to first plot" in `Julia` is a well-known hiccup that is related to how `Julia` can be so fast. Loading `Plots` and making the first plot are both somewhat time consuming, though the second and subsequent plots are speedy. Why?
`Julia` is an interactive language that attains its speed by compiling functions on the fly using the [LLVM](https://llvm.org) compiler. When `Julia` encounters a new combination of a function method and argument types it will compile and cache a function for subsequent speedy execution. The first plot is slow, as there are many internal functions that get compiled. This has sped up of late, as excessive recompilations have been trimmed down, but still has a way to go. This is different from "precompilation" which also helps trim down time for initial executions. There are also some more technically challenging means to create `Julia` images for faster start up that can be pursued if needed.
@@ -51,7 +51,12 @@ $$
gr()
anim = @animate for m in 2:2:10
fn = x -> x^m
title = L"graph of $x^{%$m}$"
plot(fn, -1, 1;
size = fig_size,
legend=false,
title=title,
xlims=(-1,1), ylims=(-.1,1))
end
imgfile = tempname() * ".gif"
@@ -96,7 +101,12 @@ $$
gr()
anim = @animate for m in [-5, -2, -1, 1, 2, 5, 10, 20]
fn = x -> m * x
title = L"m = %$m"
plot(fn, -1, 1;
size = fig_size,
legend=false,
title=title,
xlims=(-1,1), ylims=(-20, 20))
end
imgfile = tempname() * ".gif"
@@ -301,7 +311,9 @@ float(y)
The use of the generic `float` method returns a floating point number. (The `.evalf()` method of `SymPy` objects uses `SymPy` to produce floating point versions of symbolic values.)
`SymPy` objects have their own internal types. To preserve these on conversion to a related `Julia` value, the `N` function from `SymPy` is useful:
```{julia}
@@ -310,14 +322,7 @@ p = -16x^2 + 100
N(p(2))
```
Where `convert(T, x)` requires a specification of the type to convert `x` to, `N` attempts to match the data type used by SymPy to store the number. As such, the output type of `N` may vary (rational, a BigFloat, a float, etc.) Conversion by `N` will fail if the value to be converted contains free symbols, as would be expected.
### Converting symbolic expressions into Julia functions
@@ -362,6 +367,8 @@ pp = lambdify(p, (x,a,b))
pp(1,2,3) # computes 2*1^2 + 3
```
(We suggest using the pair notation when there is more than one variable.)
## Graphical properties of polynomials
@@ -394,7 +401,12 @@ To investigate this last point, let's consider the case of the monomial $x^n$. W
gr()
anim = @animate for m in 0:2:12
fn = x -> x^m
title = L"$x^{%$m}$ over $[-1.2, 1.2]$"
plot(fn, -1.2, 1.2;
size = fig_size,
legend=false,
xlims=(-1.2, 1.2), ylims=(0, 1.2^12),
title=title)
end
imgfile = tempname() * ".gif"
@@ -433,8 +445,13 @@ anim = @animate for n in 1:6
m = [1, .5, -1, -5, -20, -25]
M = [2, 4, 5, 10, 25, 30]
fn = x -> (x-1)*(x-2)*(x-3)*(x-5)
title = L"Graph on $(%$(m[n]), %$(M[n]))$"
plt = plot(fn, m[n], M[n];
size=fig_size,
legend=false,
linewidth=2,
title=title)
if n > 1
plot!(plt, fn, m[n-1], M[n-1], color=:red, linewidth=4)
end
@@ -468,11 +485,20 @@ The following graphic illustrates the $4$ basic *overall* shapes that can result
```{julia}
#| echo: false
let
gr()
plot(; layout=4)
plot!(x -> x^4, -3,3, legend=false, xticks=false, yticks=false, subplot=1, title=L"$n = $ even, $a_n > 0$")
plot!(x -> x^5, -3,3, legend=false, xticks=false, yticks=false, subplot=2, title=L"$n = $ odd, $a_n > 0$")
plot!(x -> -x^4, -3,3, legend=false, xticks=false, yticks=false, subplot=3, title=L"$n = $ even, $a_n < 0$")
plot!(x -> -x^5, -3,3, legend=false, xticks=false, yticks=false, subplot=4, title=L"$n = $ odd, $a_n < 0$")
end
```
```{julia}
#| echo: false
plotly()
nothing
```
##### Example
@@ -513,14 +539,15 @@ This observation is the start of Descartes' rule of [signs](http://sepwww.stanfo
Among numerous others, there are two common ways of representing a non-zero polynomial:
* expanded form, as in $a_n x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0,\quad a_n \neq 0$; or
* factored form, as in $a\cdot(x-r_1)\cdot(x-r_2)\cdots(x-r_n), \quad a \neq 0$.
The former uses the *standard basis* to represent the polynomial $p$.
The latter writes $p$ as a product of linear factors, though this is only possible in general if we consider complex roots. With real roots only, then the factors are either linear or quadratic, as will be discussed later.
There are values to each representation. One value of the expanded form is that polynomial addition and scalar multiplication is much easier than in factored form. For example, adding polynomials just requires matching up the monomials of similar powers. (These can be realized easily as vector operations.) For the factored form, polynomial multiplication is much easier than expanded form. For the factored form it is easy to read off *roots* of the polynomial (values of $x$ where $p$ is $0$), as a product is $0$ only if a term is $0$, so any zero must be a zero of a factor. Factored form has other technical advantages. For example, the polynomial $(x-1)^{1000}$ can be compactly represented using the factored form, but would require $1001$ coefficients to store in expanded form. (As well, due to floating point differences, the two would evaluate quite differently as one would require over a $1000$ operations to compute, the other just two.)
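The floating point remark can be seen in a smaller instance of the $(x-1)^{1000}$ example, comparing evaluation of the factored form directly against Horner evaluation of the expanded coefficients:

```julia
# Evaluate p(x) = (x - 1)^7 near x = 1 two ways: the factored form directly,
# and the expanded-form coefficients via Horner's rule (`evalpoly`).
coeffs = [binomial(7, k) * (-1)^(7 - k) for k in 0:7]  # a_0, a_1, ..., a_7

x = 1.0001
factored = (x - 1)^7           # tiny: roughly 1e-28
expanded = evalpoly(x, coeffs) # swamped by cancellation among O(1) terms
factored, expanded
```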
Translating from factored form to expanded form can be done by carefully following the distributive law of multiplication. For example, with some care it can be shown that:
@@ -587,7 +614,7 @@ It is easy to create a symbolic expression from a function - just evaluate the f
f(x)
```
This is easy---but can also be confusing. The function object is `f`, the expression is `f(x)`---the function evaluated on a symbolic object. Moreover, as seen, the symbolic expression can be evaluated using the same syntax as a function call:
```{julia}
@@ -448,12 +448,12 @@ To get the numeric approximation, we can broadcast:
N.(solveset(p ~ 0, x))
```
(There is no need to call `collect`---though you can---as broadcasting over a set falls back to broadcasting over the iteration of the set and in this case returns a vector.)
## Do numeric methods matter when you can just graph?
It may seem that certain practices related to roots of polynomials are unnecessary as we could just graph the equation and look for the roots. This feeling is perhaps motivated by the examples given in textbooks to be worked by hand, which necessarily focus on smallish solutions. But, in general, without some sense of where the roots are, an informative graph itself can be hard to produce. That is, technology doesn't displace thinking---it only supplements it.
For another example, consider the polynomial $(x-20)^5 - (x-20) + 1$. In this form we might think the roots are near $20$. However, were we presented with this polynomial in expanded form: $x^5 - 100x^4 + 4000x^3 - 80000x^2 + 799999x - 3199979$, we might be tempted to just graph it to find roots. A naive graph might be to plot over $[-10, 10]$:
@@ -553,7 +553,7 @@ N.(solve(j ~ 0, x))
### Cauchy's bound on the magnitude of the real roots.
Descartes' rule gives a bound on how many real roots there may be. Cauchy provided a bound on how large they can be. Assume our polynomial is monic (if not, divide by $a_n$ to make it so, as this won't affect the roots). Then any real root is no larger in absolute value than $h = 1 + |a_0| + |a_1| + |a_2| + \cdots + |a_{n-1}|$ (this bound is expressed in different ways).
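The bound is immediate to compute. A sketch, using the expanded coefficients of the $(x-20)^5 - (x-20) + 1$ example from earlier in this section:

```julia
# Cauchy's bound for a monic polynomial x^n + a_{n-1}x^{n-1} + ... + a_0:
# every real root r satisfies |r| ≤ 1 + |a_0| + ... + |a_{n-1}|.
cauchy_bound(as) = 1 + sum(abs, as)   # `as` holds a_0, a_1, ..., a_{n-1}

# coefficients of (x - 20)^5 - (x - 20) + 1 in expanded form:
as = [-3199979, 799999, -80000, 4000, -100]
cauchy_bound(as)  # → 4084079; loose, as the roots are actually near 20
```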
To see precisely [why](https://captainblack.wordpress.com/2009/03/08/cauchys-upper-bound-for-the-roots-of-a-polynomial/) this bound works, suppose $x$ is a root with $|x| > 1$ and let $h$ be the bound. Then since $x$ is a root, we can solve $a_0 + a_1x + \cdots + 1 \cdot x^n = 0$ for $x^n$ as:
@@ -563,7 +563,7 @@ $$
x^n = -(a_0 + a_1 x + \cdots a_{n-1}x^{n-1})
$$
which, after taking absolute values of both sides and applying the triangle inequality, yields:
$$

View File

@@ -18,7 +18,7 @@ import SymPy # imported only: some functions, e.g. degree, need qualification
---
While `SymPy` can be used to represent polynomials, there are also native `Julia` packages available for this and related tasks. These packages include `Polynomials`, `MultivariatePolynomials`, and `AbstractAlgebra`, among many others. (A search on [juliahub.com](https://juliahub.com) found almost $100$ packages matching "polynomial".) We will look at the `Polynomials` package in the following, as it is straightforward to use and provides the features we are looking at for *univariate* polynomials.
## Construction
@@ -76,7 +76,7 @@ Polynomials may be evaluated using function notation, that is:
p(1)
```
This blurs the distinction between a polynomial expression---a formal object consisting of an indeterminate, coefficients, and the operations of addition, subtraction, multiplication, and non-negative integer powers---and a polynomial function.
The polynomial variable, in this case `1x`, can be returned by `variable`:
@@ -86,14 +86,15 @@ The polynomial variable, in this case `1x`, can be returned by `variable`:
x = variable(p)
```
This variable is a `Polynomial` object that prints as `x`.
These variables can be manipulated as any polynomial. We can then construct polynomials through expressions like:
```{julia}
r = (x-2)^2 * (x-1) * (x+1)
```
The product is expanded for storage by `Polynomials`, which may not be desirable for some uses. A new variable can also be produced by calling `variable()`; so we could have constructed `p` by:
```{julia}
@@ -125,7 +126,7 @@ The `Polynomials` package has different ways to represent polynomials, and a fac
fromroots(FactoredPolynomial, [2, 2, 1, -1])
```
This form is helpful for some operations, for example polynomial multiplication and positive integer exponentiation, but not others such as addition of polynomials, where such polynomials must first be converted to the standard basis to add and are then converted back into a factored form, a conversion that may suffer from floating point roundoff.
---
@@ -181,7 +182,7 @@ A consequence of the fundamental theorem of algebra and the factor theorem is th
:::{.callout-note}
## Note
`SymPy` also has a `roots` function. If both `Polynomials` and `SymPy` are used together, calling `roots` must be qualified, as with `Polynomials.roots(...)`. Similarly, `degree` is exported in both, so it too must be qualified.
:::
@@ -15,7 +15,7 @@ import Polynomials
using RealPolynomialRoots
```
The `Polynomials` package is "imported" to avoid naming collisions with `SymPy`; some names may need to be qualified.
@@ -410,7 +410,7 @@ The usual recipe for construction follows these steps:
* Identify "test points" within each implied interval (these are $(-\infty, -1)$, $(-1,0)$, $(0,1)$, and $(1, \infty)$ in the example) and check for the sign of $f(x)$ at these test points. Write in `-`, `+`, `0`, or `*`, as appropriate. The value comes from the fact that "continuous" functions may only change sign when they cross $0$ or are undefined.
With the computer, where it is convenient to draw a graph, it might be better to emphasize the sign on the graph of the function, but at times numeric values are preferred. The `sign_chart` function from `CalculusWithJulia` does this analysis by numerically identifying points where the function is $0$ or $\infty$ and indicating the sign as $x$ crosses over these points.
```{julia}
@@ -419,6 +419,8 @@ f(x) = x^3 - x
sign_chart(f, -3/2, 3/2)
```
This format is a bit different from above, but shows to the left of $-1$ a minus sign; between $-1$ and $0$ a plus sign; between $0$ and $1$ a minus sign; and between $1$ and $3/2$ a plus sign.
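A minimal version of this analysis can be sketched with a hypothetical helper that, given the zeros, tests one point per implied interval (the actual `sign_chart` also locates the zeros itself):

```julia
# Given known zeros of f, report the sign of f on each implied interval
# of (a, b), determined by evaluating f at a midpoint test point.
function sign_chart_sketch(f, zs, a, b)
    pts = sort(vcat(a, zs, b))
    mids = (pts[1:end-1] .+ pts[2:end]) ./ 2   # one test point per interval
    [(interval = (pts[i], pts[i+1]), sgn = sign(f(mids[i]))) for i in eachindex(mids)]
end

f(x) = x^3 - x
[u.sgn for u in sign_chart_sketch(f, [-1, 0, 1], -3/2, 3/2)]  # → [-1.0, 1.0, -1.0, 1.0]
```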
## Padé approximant
@@ -22,7 +22,7 @@ nothing
---
Thinking of functions as objects themselves that can be manipulated---rather than just blackboxes for evaluation---is a major abstraction of calculus. The main operations to come: the limit *of a function*, the derivative *of a function*, and the integral *of a function* all operate on functions. Hence the idea of an [operator](http://tinyurl.com/n5gp6mf). Here we discuss manipulations of functions from pre-calculus that have proven to be useful abstractions.
## The algebra of functions
@@ -141,7 +141,7 @@ The real value of composition is to break down more complicated things into a se
### Shifting and scaling graphs
It is very useful to mentally categorize functions within families. The difference between $f(x) = \cos(x)$ and $g(x) = 12\cos(2(x - \pi/4))$ is not that much---both are cosine functions, one is just a simple enough transformation of the other. As such, we expect bounded, oscillatory behaviour with the details of how large and how fast the oscillations are to depend on the specifics of the function. Similarly, both these functions $f(x) = 2^x$ and $g(x)=e^x$ behave like exponential growth, the difference being only in the rate of growth. There are families of functions that are qualitatively similar, but quantitatively different, linked together by a few basic transformations.
There is a set of operations of functions, which does not really change the type of function. Rather, it basically moves and stretches how the functions are graphed. We discuss these four main transformations of $f$:
@@ -220,7 +220,7 @@ Scaling by $2$ shrinks the non-zero domain, scaling by $1/2$ would stretch it. I
---
More exciting is what happens if we compose these operations.
A shift right by $2$ and up by $1$ is achieved through
@@ -267,7 +267,7 @@ We can view this as a composition of "scale" by $1/a$, then "over" by $b$, and
#| hold: true
a = 2; b = 5
h(x) = stretch(over(scale(f, 1/a), b), 1/a)(x)
plot(f, -1, 8, label="f"; xticks=-1:8)
plot!(h, label="h")
```
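The transformations used above can be realized as higher-order functions; plausible definitions consistent with that usage (the text's own definitions are elided here) are:

```julia
# Each transformation takes a function and returns a new function.
# These definitions are an assumption matching the usage above.
up(f, k)      = x -> f(x) + k    # shift up by k
over(f, k)    = x -> f(x - k)    # shift right by k
stretch(f, k) = x -> k * f(x)    # scale the output by k
scale(f, k)   = x -> f(k * x)    # scale the input by k

f(x) = max(0, 1 - abs(x))        # a stand-in "tent" function on [-1, 1]
a, b = 2, 5
h(x) = stretch(over(scale(f, 1/a), b), 1/a)(x)
h(5)  # → 0.5: the tent, widened by a, re-centered at b, compressed by 1/a
```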
@@ -322,7 +322,7 @@ datetime = 12 + 10/60 + 38/60/60
delta = (newyork(266) - datetime) * 60
```
This is off by a fair amount---almost $8$ minutes. Clearly a trigonometric model, based on the assumption of circular motion of the earth around the sun, is not accurate enough for precise work, but it does help one understand how summer days are longer than winter days and how the length of a day changes fastest at the spring and fall equinoxes.
##### Example: the pipeline operator
@@ -358,7 +358,7 @@ Suppose we have a data set like the following:^[Which comes from the "Palmer Pen
| 48.8 | 18.4 | 3733 | male | Chinstrap |
| 47.5 | 15.0 | 5076 | male | Gentoo |
We might want to plot on an $x$-$y$ axis flipper length versus bill length but also indicate body size with a large size marker for bigger sizes.
We could do so by transforming a marker: scaling by size, then shifting it to an `x-y` position; then plotting. Something like this:
@@ -473,7 +473,7 @@ S(D(f))(15), f(15) - f(0)
That is the accumulation of differences is just the difference of the end values.
These two operations are discrete versions of the two main operations of calculus---the derivative and the integral. This relationship will be known as the "fundamental theorem of calculus."
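A sketch of these two discrete operators (hypothetical `D` and `S`, chosen to match the relation quoted above):

```julia
# D takes differences; S accumulates. Applied to a function on the integers:
D(f) = k -> f(k) - f(k - 1)          # discrete analog of the derivative
S(f) = k -> sum(f(i) for i in 1:k)   # discrete analog of the integral

f(k) = k^2
S(D(f))(15) == f(15) - f(0)  # → true: the accumulated differences telescope
```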
## Questions
@@ -34,12 +34,37 @@ For a right triangle with angles $\theta$, $\pi/2 - \theta$, and $\pi/2$ ($0 < \
```{julia}
#| hide: true
#| echo: false
let
gr()
p = plot(;legend=false,
xticks=nothing, yticks=nothing,
border=:none,
xlim=(-1/4,5), ylim=(-1/2, 3))
plot!([0,4,4,0],[0,0,3,0], linewidth=3)
θ = atand(3,4)
del = .25
plot!([4-del, 4-del,4], [0, del, del], color=:black, linewidth=3)
theta = pi/20
r = sqrt((3/4)^2 + (1/4)^2)
ts = range(0, theta, 20)
plot!(r*cos.(ts), r*sin.(ts); line=(:gray, 1))
ts = range(atan(3/4) - theta, atan(3,4), 20)
plot!(r*cos.(ts), r*sin.(ts); line=(:gray, 1))
annotate!([
(.75, .25, L"\theta"),
(4.0, 1.5+.1, text("opposite", rotation=90,:top)),
(2, -.25, "adjacent"),
(2, 1.5+.1, text("hypotenuse", rotation=θ,:bottom))
])
end
```
```{julia}
#| echo: false
plotly()
nothing
```
With these, the basic definitions for the primary trigonometric functions are
@@ -77,9 +102,13 @@ gr()
function plot_angle(m)
r = m*pi
n,d = numerator(m), denominator(m)
tit = latexstring("\\frac{$n}{$d}") *
L"\cdot\pi\rightarrow (" *
latexstring("$(round(cos(r), digits=2)),$(round(sin(r), digits=2)))")
ts = range(0, 2pi, 151)
p = plot(cos.(ts), sin.(ts), legend=false, aspect_ratio=:equal,title=tit)
plot!(p, [-1,1], [0,0], color=:gray30)
plot!(p, [0,0], [-1,1], color=:gray30)
@@ -95,13 +124,11 @@ function plot_angle(m)
plot!(p, [0,l*cos(r)], [0,l*sin(r)], color=:green, linewidth=4)
scatter!(p, [cos(r)], [sin(r)], markersize=5)
annotate!(p, [(1/4+cos(r), sin(r), L"(x,y)")])
p
end
## different linear graphs
anim = @animate for m in -4//3:1//6:10//3
plot_angle(m)
@@ -242,7 +269,7 @@ plot(sin, 0, 4pi)
The graph shows two periods. The wavy aspect of the graph is why this function is used to model periodic motions, such as the amount of sunlight in a day, or the alternating current powering a computer.
From this graph---or considering when the $y$ coordinate is $0$---we see that the sine function has zeros at any integer multiple of $\pi$, or $k\pi$, $k$ in $\dots,-2,-1, 0, 1, 2, \dots$.
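These zeros can be checked numerically (a quick sketch, not part of the discussion); the computed values are at the level of floating-point roundoff:

```{julia}
# sin(kπ) for integer k should vanish; numerically the values are
# on the order of machine precision
maximum(abs.(sin.((-2:2) .* pi)))
```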
The cosine function is similar, in that it has the same domain and range, but is "out of phase" with the sine curve. A graph of both shows the two are related:
@@ -406,7 +433,7 @@ Suppose both $\alpha$ and $\beta$ are positive with $\alpha + \beta \leq \pi/2$.
```{julia}
#| echo: false
gr()
using Plots, LaTeXStrings
# two angles
@@ -425,7 +452,10 @@ color1 = :royalblue
color2 = :forestgreen
color3 = :brown3
color4 = :mediumorchid2
canvas() = plot(;
axis=([],false),
legend=false,
aspect_ratio=:equal)
p1 = canvas()
plot!(Shape([A,B,F]), fill=(color4, 0.15))
@@ -440,29 +470,29 @@ ddf = sqrt(sum((D.-F).^2))
Δ = 0.0
alphabeta = (r*cos(α/2 + β/2), r*sin(α/2 + β/2),
text(L"\alpha + \beta",:left, rotation=pi/2))
cosαβ = (B[1]/2, 0, text(L"\cos(\alpha + \beta)", :top))
sinαβ = (B[1], F[2]/2, text(L"\sin(\alpha + \beta)", rotation=90,:top))
txtpoints = (
one = (F[1]/2, F[2]/2, text(L"1", :bottom)),
beta=(r*cos(α + β/2), r*sin(α + β/2),
text(L"\beta", :hcenter)),
alpha = (r*cos(α/2), r*sin(α/2),
text(L"\alpha",:hcenter)),
alphaa = (F[1] + r*sin(α/2), F[2] - r*cos(α/2) ,
text(L"\alpha", :hcenter)),
cosβ = (dae/2*cos(α),dae/2*sin(α) + Δ,
text(L"\cos(\beta)",:bottom, rotation=rad2deg(α))),
sinβ = (B[1] + dbc/2 + Δ/2, D[2] + ddf/2 + Δ/2,
text(L"\sin(\beta)",:bottom, rotation=-(90-rad2deg(α)))),
cosαcosβ = (C[1]/2, 0 - Δ, text(L"\cos(\alpha)\cos(\beta)", :top)),
sinαcosβ = (cos(α)*cos(β), dce/2 ,
text(L"\sin(\alpha)\cos(\beta)", :top, rotation=90)),
cosαsinβ = (D[1] - Δ, D[2] + ddf/2 ,
text(L"\cos(\alpha)\sin(\beta)", :bottom, 10, rotation=90)),
sinαsinβ = (D[1] + dde/2, D[2] + Δ ,
text(L"\sin(\alpha)\sin(\beta)", 10, :top)),
)
# Plot 1
@@ -574,6 +604,12 @@ $$
$$
:::
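Assuming the identities above are the sum formulas for sine and cosine, a numeric spot check is easy to carry out:

```{julia}
# spot check of the sum formulas at sample angles (illustrative values)
α, β = 0.3, 0.4
(sin(α + β) ≈ sin(α)*cos(β) + cos(α)*sin(β),
 cos(α + β) ≈ cos(α)*cos(β) - sin(α)*sin(β))
```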
```{julia}
#| echo: false
plotly()
nothing
```
##### Example
@@ -657,14 +693,144 @@ atan(y, x)
##### Example
A (white) light shining through a [dispersive prism](https://en.wikipedia.org/wiki/Dispersive_prism) will be deflected depending on the material of the prism and the angles involved. The relationship can be analyzed by tracing a ray through the figure and applying Snell's law, which relates the angle of incidence to the angle of refraction for light passing between two media:
$$
n_0 \sin(\theta_0) = n_1 \sin(\theta_1)
$$
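Snell's law is easily solved for the refraction angle; a quick sketch in `Julia` (the air-to-glass values here are illustrative, not from the text):

```{julia}
# Snell's law solved for the angle of refraction;
# n₀ = 1 (air), n₁ = 1.5 (glass) are illustrative values
n₀, n₁ = 1, 1.5
θ₀ = pi/6                     # angle of incidence
θ₁ = asin(n₀/n₁ * sin(θ₀))   # angle of refraction, about 0.34 radians
```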
:::{#fig-snells-law-prism}
```{julia}
#| echo: false
p1 = let
gr()
plot(; empty_style..., aspect_ratio=:equal)
n₀,n₁,n₂ = 1,3, 1
θ₀ = pi/7
α = pi/8
θ₁ = asin(n₀/n₁ * sin(θ₀))
θ₁′ = α - θ₁
θ₂′ = asin(n₁/n₂ * sin(θ₁′))
θ₂ = θ₂′ - α
plot!([(-1,0), (1,0)]; line=(:black, 1))
plot!([(0,-1),(0,1)]; line=(:black, 1))
plot!([(0,1), (2tan(α),-1)]; line=(:black, 1))
S = Shape([(0,-1),(2tan(α),-1),(0,1)])
plot!(S, fill=(:gray80, 0.25), line=nothing)
xx = tan(α)/ (1 + tan(α)*tan(θ₁))
sl(x) = 1 - x/tan(α)
yy = sl(xx)
plot!(sl, xx-1/8, xx+1/9; line=(:red,3))
plot!(x -> yy + tan(α)*(x-xx), xx-1/4, xx+1/4; line=(:red, 3))
plot!([(-1,-sin(θ₀)), (0,0)]; line=(:black, 2))
plot!([(0,0), (xx, xx*tan(θ₁))]; line=(:black, 2))
plot!([(xx,yy), (xx + 5/8, yy - 5/8*tan(θ₂))]; line=(:black, 2))
annotate!([
(-1/2,1/2*sin(pi + θ₀/2), text(L"\theta_0")),
(1/5, 1/5*sin(θ₁/2), text(L"\theta_1")),
(2tan(α), -0.075, text(L"\theta_2")),
(-1/2, -3/4, text(L"n_0")),
(2tan(α/2), -3/4, text(L"n_1")),
(1 - (1-2tan(α))/2, -3/4, text(L"n_2"))
])
current()
end
p2 = let
plot(; empty_style..., aspect_ratio=:equal)
n₀,n₁,n₂ = 1,3, 1
θ₀ = pi/7
α = pi/8
θ₁ = asin(n₀/n₁ * sin(θ₀))
θ₁′ = α - θ₁
θ₂′ = asin(n₁/n₂ * sin(θ₁′))
θ₂ = θ₂′ - α
xx = tan(α)/ (1 + tan(α)*tan(θ₁))
sl(x) = 1 - x/tan(α)
yy = sl(xx)
plot!(sl, xx-1/8, xx+1/9; line=(:red,3))
plot!(x -> yy + tan(α)*(x-xx), xx-1/4, xx+1/4; line=(:red, 3))
S = Shape([(0, sl(xx-1/8)), (xx-1/8,sl(xx-1/8)),
(xx+1/9, sl(xx+1/9)), (0, sl(xx+1/9))])
plot!(S, fill=(:gray80, 0.25), line=nothing)
#plot!([(-1,-sin(θ₀)), (0,0)]; line=(:black, 2))
plot!([(0,0), (xx, xx*tan(θ₁))]; line=(:black, 2))
plot!([(xx,yy), (xx + 2/8, yy - 2/8*tan(θ₂))]; line=(:black, 2))
annotate!([
(1/5, .1*sin(θ₁/2), text(L"\theta_1'")),
(xx + .1, 0.06, text(L"\theta_2'")),
(2tan(α/2), -1/8, text(L"n_1")),
(17/32, -1/8, text(L"n_2"))
])
current()
end
plot(p1, p2; layout=(1,2))
```
Light bending through a prism. The right graphic shows the second bending.
:::
```{julia}
#| echo: false
plotly()
nothing
```
Following Wikipedia, we have
$$
\theta_1 = \sin^{-1}\left( \frac{n_0}{n_1} \sin(\theta_0) \right)
$$
Both $\theta_0$ and $\theta_1$ are measured with respect to the black coordinate system in the figure. The red coordinate system is used to identify the angle of incidence for the second bending. Some right-triangle geometry relates the new angle $\theta'_1$ to $\theta_1$ through $\theta'_1 = \alpha - \theta_1$. With this new angle of incidence, the angle of refraction, $\theta'_2$, satisfies:
$$
n_1 \sin(\theta'_1) = n_2 \sin(\theta'_2)
$$
Or
$$
\theta'_2 = \sin^{-1}\left(\frac{n_1}{n_2}\sin(\theta'_1) \right)
$$
Finally, using right-triangle geometry, the angle $\theta_2 = \theta'_2 - \alpha$ can be identified.
For a prism in air, we would have $n_0 = n_2 = 1$. Letting $n_1 = n$ and combining, we get
$$
\begin{align*}
\delta &= \theta_0 + \theta_2\\
&=\theta_0 + \sin^{-1}\left(\frac{n_1}{n_2}\sin(\theta'_1) \right)- \alpha\\
&= \theta_0 - \alpha + \sin^{-1}\left(\frac{n}{1}\sin(\alpha -\theta_1) \right)\\
&= \theta_0 - \alpha + \sin^{-1}\left(n\sin\left(\alpha - \sin^{-1}\left( \frac{n_0}{n_1} \sin(\theta_0) \right)\right)\right)\\
&= \theta_0 - \alpha + \sin^{-1}\left(n\sin\left(\alpha -\sin^{-1}\left( \frac{1}{n} \sin(\theta_0) \right)\right) \right)
\end{align*}
$$
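The chain of substitutions above can be traced numerically; a sketch with illustrative values for `n`, `α`, and `θ₀`:

```{julia}
# step-by-step evaluation of the deflection δ (illustrative values)
n, α, θ₀ = 3/2, pi/3, pi/6
θ₁ = asin(1/n * sin(θ₀))      # first refraction (Snell's law)
θ₂′ = asin(n * sin(α - θ₁))   # second refraction, in the red frame
δ = θ₀ - α + θ₂′              # total deflection, about 0.82 radians
```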
If the prism has index of refraction $n$, then the ray will deviate by this amount $\delta$, which depends on the initial angle of incidence $\theta_0$, the apex angle $\alpha$ of the prism, and $n$.
When $n=1.5$ (glass), $\alpha = \pi/3$ and $\theta_0=\pi/6$, find the deflection (in radians).
We have:
@@ -759,6 +925,8 @@ plot(abs ∘ T4, -1,1, label="|T₄|")
plot!(abs ∘ q, -1,1, label="|q|")
```
We will return to this family of polynomials in the section on Orthogonal Polynomials.
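Assuming `T4` above is the fourth Chebyshev polynomial, the family satisfies $T_n(x) = \cos(n\cos^{-1}(x))$ on $[-1,1]$, which explains the bound on $|T_4|$ seen in the plot. A sketch (the helper `Tₙ` is hypothetical, not from the text):

```{julia}
# Chebyshev polynomials via their trigonometric characterization;
# Tₙ is a hypothetical helper for illustration
Tₙ(n, x) = cos(n * acos(x))
xs = range(-1, 1, 101)
maximum(abs.(Tₙ.(4, xs)))   # no larger than 1
```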
## Hyperbolic trigonometric functions
@@ -1028,4 +1196,3 @@ Is this identical to the pattern for the regular sine function?
#| echo: false
yesnoq(false)
```
@@ -18,6 +18,15 @@
@Misc{Angenent,
key = {WisconsinCalculus},
author = {Sigurd Angenent},
title = {Wisconsin Calculus},
howpublished = {https://github.com/SigurdAngenent/WisconsinCalculus/tree/master},
year = 2012,
note = {GNU Free Documentation License, Version 1.2}
}
@Book{Schey,
author = {H.M. Schey},
title = {Div, Grad, Curl, and all that},
@@ -87,3 +96,25 @@ edition = {Julia adaptation},
URL = {https://tobydriscoll.net/fnc-julia/frontmatter.html},
eprint = {https://epubs.siam.org/doi/pdf/10.1137/1.9781611975086}
}
## matrix calculus
@misc{BrightEdelmanJohnson,
title={Matrix Calculus (for Machine Learning and Beyond)},
author={Paige Bright and Alan Edelman and Steven G. Johnson},
year={2025},
eprint={2501.14787},
archivePrefix={arXiv},
primaryClass={math.HO},
url={https://arxiv.org/abs/2501.14787},
}
@misc{CarlssonNikitinTroedssonWendt,
title={The bilinear Hessian for large scale optimization},
author={Marcus Carlsson and Viktor Nikitin and Erik Troedsson and Herwig Wendt},
year={2025},
eprint={2502.03070},
archivePrefix={arXiv},
primaryClass={math.OC},
url={https://arxiv.org/abs/2502.03070},
}