minor edits

This commit is contained in:
jverzani 2023-06-27 18:39:13 -04:00
parent f0f81e86d1
commit c8342b85a8
3 changed files with 31 additions and 23 deletions


@@ -688,12 +688,9 @@ $$
### Chain rule
Finally, the derivative of a composition of functions can be computed using pieces of each function. This gives a rule called the *chain rule*. Before deriving, let's give a slight motivation through an example.
Consider the output of a factory for some widget. It depends on two steps: an initial manufacturing step and a finishing step. The number of employees is important in how much is initially manufactured. Suppose $x$ is the number of employees and $g(x)$ is the amount initially manufactured. Adding more employees increases the amount made by the made-up rule $g(x) = \sqrt{x}$. The finishing step depends on how much is made by the employees. If $y$ is the amount made, then $f(y)$ is the number of widgets finished. Suppose for some reason that $f(y) = y^2.$
How many widgets are made as a function of employees? The composition $u(x) = f(g(x))$ would provide that. Changes in the initial manufacturing step lead to changes in how much is initially made; changes in the initial amount made lead to changes in the finished products. Each change contributes to the overall change.
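To make the widget example concrete, here is a quick numeric sketch using the made-up rules above. The chain rule gives $u'(x) = f'(g(x))\cdot g'(x) = 2\sqrt{x} \cdot 1/(2\sqrt{x}) = 1$, which a forward difference should reproduce:

```{julia}
g(x) = sqrt(x)          # initial manufacturing step: amount made by x employees
f(y) = y^2              # finishing step: widgets finished from amount y
u(x) = f(g(x))          # widgets as a function of employees

x, h = 100, 1e-8
(u(x + h) - u(x)) / h   # ≈ 1, matching f'(g(x)) * g'(x)
```

(Here the composition happens to simplify, as $(\sqrt{x})^2 = x$, so a rate of $1$ widget per employee is expected.)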
@@ -824,7 +821,11 @@ Find the derivative of $\sin(x)\cos(2x)$ at $x=\pi$.
##### Proof of the Chain Rule
A function is *differentiable* at $a$ if the following limit exists: $\lim_{h \rightarrow 0}(f(a+h)-f(a))/h$.
This is reexpressed as: $f(a+h) - f(a) - f'(a)h = \epsilon_f(h) h$ where as $h\rightarrow 0$, $\epsilon_f(h) \rightarrow 0$.
With that in mind, we have:
$$
@@ -843,11 +844,12 @@ f(g(a) + g'(a)h + \epsilon_g(h)h) - f(g(a)) \\
Rearranging:
\begin{align*}
f(g(a+h)) &- f(g(a)) - f'(g(a)) g'(a) h\\
&= f'(g(a))\epsilon_g(h)h + \epsilon_f(h')(h')\\
&=(f'(g(a)) \epsilon_g(h) + \epsilon_f(h') (g'(a) + \epsilon_g(h)))h \\
&=\epsilon(h)h,
\end{align*}
where $\epsilon(h)$ combines into one term the above quantities, each of which goes to zero as $h\rightarrow 0$. This is the alternative definition of the derivative, showing $(f\circ g)'(a) = f'(g(a)) g'(a)$ when $g$ is differentiable at $a$ and $f$ is differentiable at $g(a)$.
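As a numeric sanity check of this conclusion, take the made-up choices $f(x)=\sin(x)$ and $g(x)=x^2$ and compare a forward difference of the composition against $f'(g(a))\cdot g'(a)$:

```{julia}
f(x) = sin(x); fp(x) = cos(x)   # f and its derivative
g(x) = x^2;    gp(x) = 2x       # g and its derivative

a, h = 1/2, 1e-8
approx = (f(g(a + h)) - f(g(a))) / h   # forward difference of f ∘ g at a
exact  = fp(g(a)) * gp(a)              # chain rule: f'(g(a)) * g'(a)
approx - exact                         # should be near 0
```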


@@ -45,7 +45,7 @@ As such there is a balancing act:
* if $h$ is too small the round-off errors are problematic,
* if $h$ is too big the approximation to the limit is not good.
For the forward difference, $h$ values around $10^{-8}$ are typically good; for the central difference, values around $10^{-6}$ are typically good.
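These rules of thumb can be explored empirically. A sketch comparing the errors of the two approximations for $f(x) = \sin(x)$ at $c=1$, where the exact derivative is $\cos(1)$:

```{julia}
f(x) = sin(x)
c = 1.0
actual = cos(c)          # exact derivative of sin at c

for h in (1e-4, 1e-6, 1e-8, 1e-10)
    fwd = (f(c + h) - f(c)) / h           # forward difference
    cnt = (f(c + h) - f(c - h)) / (2h)    # central difference
    println((h=h, fwd_err=abs(fwd - actual), cnt_err=abs(cnt - actual)))
end
```

The forward difference is most accurate near $h=10^{-8}$ and the central difference near $h=10^{-6}$; for smaller $h$ round-off error dominates in both.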
@@ -70,7 +70,7 @@ We can compare to the actual with:
```{julia}
@syms x
df = diff(f(x), x)
factual = convert(Float64, df(c))
abs(factual - fapprox)
```
@@ -136,16 +136,16 @@ The forward derivative is found with:
```{julia}
f(x) = sqrt(1 + sin(cos(x)))
c, h = pi/4, 1e-8
fwd = (f(c+h) - f(c))/h
```
That given by `D` is:
```{julia}
ds_value = D(f)(c)
ds_value, fwd, ds_value - fwd
```
@@ -153,11 +153,11 @@ Finally, `SymPy` gives an exact value we use to compare:
```{julia}
fp = diff(f(x), x)
```
```{julia}
actual = convert(Float64, fp(PI/4))
actual - ds_value, actual - fwd
```


@@ -104,7 +104,7 @@ function D(::Val{:+}, ::Val{:nary}, args, var)
end
```
The `args` are always held in a container, so the unary method must pull out the first one. The binary case should read as: apply `D` to each of the two arguments, and then create a quoted expression containing the sum of the results. The dollar signs interpolate into the quoting. (The "primes" are unicode notation achieved through `\prime[tab]` and not operations.) The *nary* case does something similar, only using splatting to produce the sum.
Subtraction must also be implemented in a similar manner, but not for the *nary* case:
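As a hedged sketch of what the binary subtraction method might look like, mirroring the `D(::Val{:+}, ::Val{:binary}, args, var)` pattern above (the dispatcher methods here are minimal, hypothetical stand-ins so the fragment runs on its own, not the text's actual definitions):

```{julia}
# hypothetical minimal dispatch, standing in for the text's entry points:
D(x::Number, var) = 0                    # constants differentiate to 0
D(s::Symbol, var) = s == var ? 1 : 0     # d(var)/d(var) = 1, other symbols to 0
function D(ex::Expr, var)
    op, args = ex.args[1], ex.args[2:end]
    D(Val(op), Val(:binary), args, var)
end

# the subtraction rule, following the addition pattern:
function D(::Val{:-}, ::Val{:binary}, args, var)
    a′, b′ = D(args[1], var), D(args[2], var)
    :($a′ - $b′)                         # quoted difference of the derivatives
end

D(:(x - 5), :x)                          # returns :(1 - 0)
```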
@@ -195,7 +195,15 @@ function D(::Val{:cos}, ::Val{:unary}, args, var)
end
```
The pattern is similar for each. The `$a` factor is needed due to the *chain rule*. The above illustrates the simple pattern necessary to add a derivative rule for a function.
:::{.callout-note}
Several automatic differentiation packages use a set of rules defined following an interface spelled out in the package `ChainRules.jl`. Leveraging multi-dimensional derivatives, the chain rule is the only one of the sum, product, quotient, and chain rules that is needed.
:::
More functions could be included, but for this example the above will suffice, as now the system is ready to be put to work.
```{julia}
@@ -223,5 +231,3 @@ D(D(ex₃, :x), :x)
```
The length of the expression should lead to further appreciation for the simplification steps taken when doing such a computation by hand.