Compare commits

...

19 Commits

Author SHA1 Message Date
Steven G. Johnson
e4ff01607e rm Adam for now 2024-10-31 22:08:16 -04:00
Steven G. Johnson
9ed7e009b2 notes on gradient descent for quadratic functions 2024-10-31 22:06:46 -04:00
Steven G. Johnson
92d1615dfc tweak 2024-09-30 10:44:53 -04:00
Steven G. Johnson
1898b763a6 link 2024-09-30 09:42:46 -04:00
Steven G. Johnson
0a0e166616 link 2024-09-29 21:26:42 -04:00
Steven G. Johnson
d94c8aeb09 typo 2024-09-29 21:24:12 -04:00
Steven G. Johnson
547b7c5739 some LS fit notes 2024-09-29 21:19:47 -04:00
Steven G. Johnson
7800272033 add eigenproblem examples 2024-02-15 14:42:47 -05:00
Steven G. Johnson
21d4885e64 Update pset4sol.ipynb: fix code comment 2023-09-01 13:49:02 -04:00
Steven G. Johnson
245bc6befd link 0.30000000000000004 example 2023-05-15 14:10:06 -04:00
Steven G. Johnson
8d14eb5120 lecture 38 notes 2023-05-15 09:44:14 -04:00
Steven G. Johnson
5d63fb8438 lecture summary 2023-05-10 16:23:51 -04:00
Steven G. Johnson
f5b44d43e7 lecture 36 notes 2023-05-08 12:49:12 -04:00
Steven G. Johnson
e1c3af4f37 pset 6 solutions 2023-05-08 11:45:16 -04:00
Steven G. Johnson
76957f16dc lecture 35 notes 2023-05-05 14:43:28 -04:00
Steven G. Johnson
adc374586b lecture 34 notes 2023-05-03 14:36:01 -04:00
Steven G. Johnson
33481fffd0 lecture 33 notes 2023-05-01 17:40:09 -04:00
Steven G. Johnson
58931ea10a missing scale factor 2023-05-01 08:48:02 -04:00
Steven G. Johnson
b4dcc299cd pset 6 2023-04-28 20:13:13 -04:00
8 changed files with 6803 additions and 3 deletions

View File

@@ -327,13 +327,12 @@ http://dx.doi.org/10.1137/S1052623499362822) — I used the "linear and separabl
* Non-negative matrix factorization — guest lecture by [Prof. Ankur Moitra](https://people.csail.mit.edu/moitra/).
**Further reading:** Coming soon.
**Further reading:** See section 2.1 of Moitra's [*Algorithmic Aspects of Machine Learning*](http://people.csail.mit.edu/moitra/docs/bookexv2.pdf). An interesting application to image analysis can be found in [this paper](http://www.cs.columbia.edu/~blei/fogm/2020F/readings/LeeSeung1999.pdf).
## Lecture 29 (Apr 21)
* [(Discrete) convolutions](https://en.wikipedia.org/wiki/Convolution#Discrete_convolution), translation-invariance, [circulant matrices](https://en.wikipedia.org/wiki/Circulant_matrix), and convolutional neural networks (CNNs)
* [pset 5 solutions](psets/pset5sol.ipynb)
* pset 6: coming soon, due Friday May 5.
**Further reading:** Strang textbook sections IV.2 (circulant matrices) and VII.2 (CNNs), and [OCW lecture 32](https://ocw.mit.edu/courses/18-065-matrix-methods-in-data-analysis-signal-processing-and-machine-learning-spring-2018/resources/lecture-32-imagenet-is-a-cnn-the-convolution-rule/). See also these [Stanford lecture slides](http://cs231n.stanford.edu/slides/2016/winter1516_lecture7.pdf) and [MIT lecture slides](https://mit6874.github.io/assets/sp2020/slides/L03_CNNs_MK2.pdf).
@@ -352,6 +351,57 @@ http://dx.doi.org/10.1137/S1052623499362822) — I used the "linear and separabl
## Lecture 32 (Apr 25)
* [pset 6](psets/pset6.ipynb): due Friday May 5.
Fourier series vs. DFT: If we view the DFT as a [Riemann sum](https://en.wikipedia.org/wiki/Riemann_sum) approximation for a [Fourier series](https://en.wikipedia.org/wiki/Fourier_series) coefficient (which turns out to be *exponentially* accurate for smooth periodic f(t)!), then the errors are an instance of [aliasing](https://en.wikipedia.org/wiki/Aliasing) (see e.g. the ["wagon-wheel" effect](https://en.wikipedia.org/wiki/Wagon-wheel_effect)). For band-limited signals where we sample at a rate > twice the bandwidth, there is no aliasing and no information loss, a result known as the [Nyquist—Shannon sampling theorem](https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem); in the common case where the bandwidth is centered at ω=0, this corresponds to sampling at more than *twice* the highest frequency.
**Further reading**: For a periodic function, a Riemann sum is equivalent to a trapezoidal rule (since the 0th and Nth samples are identical), and the exponential convergence to the integral is reviewed by [Trefethen and Weideman (2014)](https://epubs.siam.org/doi/pdf/10.1137/130932132); SGJ gave a [simplified review for IAP (2011)](https://math.mit.edu/~stevenj/trap-iap-2011.pdf). The subject of aliasing, sampling, and signal processing leads to the field of [digital signal processing (DSP)](https://en.wikipedia.org/wiki/Digital_signal_processing), on which there are many books and courses. A classic textbook is [*Discrete-Time Signal Processing*](https://research.iaun.ac.ir/pd/naghsh/pdfs/UploadFile_2230.pdf) by Oppenheim and Schafer, and there are whole courses at MIT (like 6.3000 and 6.7000) on these topics.
## Lecture 33 (May 1)
* The [DTFT](https://en.wikipedia.org/wiki/Discrete-time_Fourier_transform), [windowing](https://en.wikipedia.org/wiki/Window_function) (e.g. relating DTFT to DFT) and spectral leakage.
* [Filter design](https://en.wikipedia.org/wiki/Filter_design) and [FIR filters](https://en.wikipedia.org/wiki/Finite_impulse_response).
**Further reading**: See the DSP links from lecture 32.
## Lecture 34 (May 3)
* FIR filter design = polynomial fitting. Windowing & least-squares, or minimax design by [Parks-McClellan](https://en.wikipedia.org/wiki/Parks%E2%80%93McClellan_filter_design_algorithm) and similar.
* Changing variables H(z) = ĥ(ω) for z=exp(iω): the [Z transform](https://en.wikipedia.org/wiki/Z-transform)
* [IIR filters](https://en.wikipedia.org/wiki/Infinite_impulse_response): definition, relation to [rational functions](https://en.wikipedia.org/wiki/Rational_function), first-order IIR filter example, stability.
**Further reading**: See the DSP links from lecture 32.
## Lecture 35 (May 5)
* IIR filter stability: poles zₚ must lie inside the unit circle |zₚ|
* IIR filter design: non-convex and complicated, lots of algorithms and approximations.
* [Kronecker products A⊗B](https://en.wikipedia.org/wiki/Kronecker_product) and ["vectorization" vec(A)](https://en.wikipedia.org/wiki/Vectorization_(mathematics)): expressing "multidimensional linear operations" ([multilinear algebra](https://en.wikipedia.org/wiki/Multilinear_algebra)) as ordinary matrixvector products.
* [pset 6 solutions](psets/pset6sol.ipynb)
## Lecture 36 (May 6)
* Kronecker products, [Hadamard matrices](https://en.wikipedia.org/wiki/Hadamard_matrix), and the mixed-product property (A⊗B)(C⊗D)=AC⊗BD: Kronecker products of unitary matrices are unitary.
* Kronecker products and the [fast WalshHadamard transform (FWHT)](https://en.wikipedia.org/wiki/Fast_Walsh%E2%80%93Hadamard_transform): fast algorithms (FWHT, FFT, …) as *sparse factorizations* of dense matrices via Kronecker products, and equivalently as mono-to-multi-dimensional mappings.
* [Sylvester equations](https://en.wikipedia.org/wiki/Sylvester_equation) and [Lyapunov equations](https://en.wikipedia.org/wiki/Lyapunov_equation): can be solved as ordinary matrixvector equations via Kronecker products, but this naively requires Θ(n⁶) operations, whereas exploiting the structure gives Θ(n³). Kronecker products are often a nice way to *think* about things but *not* to explicitly compute with.
## Lecture 37 (May 10)
* [Graphs](https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)) and their application to representing how entities are connected — lots of applications, from data mining of social networks or the web, to bioinformatics, to analyzing sparse-matrix algorithms
* Representing graphs via matrices: [incidence matrices E](https://en.wikipedia.org/wiki/Incidence_matrix), [adjacency matrices A](https://en.wikipedia.org/wiki/Adjacency_matrix), and [graph Laplacians L=EᵀE=D-A](https://en.wikipedia.org/wiki/Laplacian_matrix)
* [Graph partitioning](https://en.wikipedia.org/wiki/Graph_partition): spectral partitioning via the [Fiedler vector](https://en.wikipedia.org/wiki/Algebraic_connectivity), corresponding to the **second-smallest** eigenvalue of L
**Further reading:** Textbook sections VI.6VI.7 and [OCW lecture 35](https://ocw.mit.edu/courses/18-065-matrix-methods-in-data-analysis-signal-processing-and-machine-learning-spring-2018/resources/lecture-35-finding-clusters-in-graphs-second-project-handwriting/). Using incidence matrices to identify cycles and spanning trees in graphs is also covered in 18.06 and in Strang's *Introduction to Linear Algebra* book (5th ed. section 10.1 or 6th ed. section 3.5), as well as in [this interactive Julia notebook](https://github.com/mitmath/1806/blob/1a9ff5c359b79f28c534bf1e1daeadfdc7aee054/notes/Graphs-Networks.ipynb). A popular software package for graph and mesh partitioning is [METIS](https://github.com/KarypisLab/METIS). The Google [PageRank algorithm](https://en.wikipedia.org/wiki/PageRank) is another nice application of linear algebra to graphs, in this case to rank web pages by "importance". Direct methods (e.g. Gaussian elimination) for sparse-matrix problems turn out to be all about analyzing sparsity patterns via graph theory; see e.g. the [book by Timothy Davis](https://epubs.siam.org/doi/book/10.1137/1.9780898718881).
## Lecture 38 (May 15)
* [Floating-point introduction](notes/Floating-Point-Intro.ipynb)
If you want to continue with "numerical computing" in any form — whether it be for data analysis, machine learning, scientific/engineering modeling, or other areas — at some point you will need to learn more about **computational errors**. The mathematical field that studies **algorithms + approximations** is called [numerical analysis](https://en.wikipedia.org/wiki/Numerical_analysis), covered by courses such as [18.335](https://github.com/mitmath/18335) at MIT.
One of the most basic sources of computational error is that **computer arithmetic is generally inexact**, leading to [roundoff errors](https://en.wikipedia.org/wiki/Round-off_error). The reason for this is simple: computers can only work with numbers having a **finite number of digits**, so they **cannot even store** arbitrary real numbers. Only a _finite subset_ of the real numbers can be represented using a particular number of "bits", and the question becomes _which subset_ to store, how arithmetic on this subset is defined, and how to analyze the errors compared to theoretical exact arithmetic on real numbers.
In **floating-point** arithmetic, we store both an integer coefficient and an exponent in some base: essentially, scientific notation. This allows large dynamic range and fixed _relative_ accuracy: if fl(x) is the closest floating-point number to any real x, then |fl(x)-x| < ε|x| where ε is the _machine precision_. This makes error analysis much easier and makes algorithms mostly insensitive to overall scaling or units, but has the disadvantage that it requires specialized floating-point hardware to be fast. Nowadays, all general-purpose computers, and even many little computers like your cell phones, have floating-point units.
Went through some simple definitions and examples in Julia (see notebook above), illustrating the basic ideas and a few interesting tidbits. In particular, we looked at **error accumulation** during long calculations (e.g. summation), as well as examples of [catastrophic cancellation](https://en.wikipedia.org/wiki/Loss_of_significance) and how it can sometimes be avoided by rearranging a calculation.
**Further reading:** [Trefethen & Bau's *Numerical Linear Algebra*](https://people.maths.ox.ac.uk/trefethen/text.html), lecture 13. [What Every Computer Scientist Should Know About Floating Point Arithmetic](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.22.6768) (David Goldberg, ACM 1991). William Kahan, [How Java's floating-point hurts everyone everywhere](http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf) (2004): contains a nice discussion of floating-point myths and misconceptions. A brief but useful summary can be found in [this Julia-focused floating-point overview](https://discourse.julialang.org/t/psa-floating-point-arithmetic/8678) by Prof. John Gibson. Because many programmers never learn how floating-point arithmetic actually works, there are [many common myths](https://github.com/mitmath/18335/blob/spring21/notes/fp-myths.pdf) about its behavior. (An infamous example is `0.1 + 0.2` giving `0.30000000000000004`, which people are puzzled by so frequently it has led to a web site [https://0.30000000000000004.com/](https://0.30000000000000004.com/)!)

1299
notes/Eigenproblems.ipynb Normal file

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@@ -586,7 +586,7 @@
" # approximate gradient from this mini-batch\n",
" ∇f̃ = (2/M)*(Ã'*(Ã*x - b̃))\n",
"\n",
" # Adam update\n",
" # Yogi update\n",
" m = β₁*m + (1-β₁)*∇f̃\n",
" v = @. β₂*v - (1-β₂) * sign(v - ∇f̃^2) * (∇f̃^2)\n",
" m̂ = m / (1 - β₁^t) # de-bias: normalize by total weights\n",

131
psets/pset6.ipynb Normal file
View File

@@ -0,0 +1,131 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "a65bff2e",
"metadata": {},
"source": [
"# 18.065 Problem Set 6\n",
"\n",
"Due Friday May 5."
]
},
{
"cell_type": "markdown",
"id": "46c6dd3c",
"metadata": {},
"source": [
"## Problem 1 (10+10 points)\n",
"\n",
"**(a)** In class, we showed that if $\\tilde{y} = \\bar{F} y, \\tilde{a} = \\bar{F} a, \\tilde{x} = \\bar{F} x$ where $F$ is the DFT matrix, then $y = a \\circledast x$ (circular convolution) implies $\\tilde{y} = \\tilde{a} \\, \\mbox{.*} \\, \\tilde{x}$ (element-wise product). Show that the reverse is also true: show that if $\\tilde{y} = \\tilde{a} \\circledast \\tilde{x}$, then $y = \\alpha (a \\, \\mbox{.*} \\, x) $ for some scale factor $\\alpha = ???$. \n",
"\n",
"**(b)** Differentiating through convolutions, in class, involved the \"reversal\" permutation $R$:\n",
"$$\n",
"R\\begin{pmatrix} x_0 \\\\ x_1 \\\\ x_2 \\\\ \\vdots \\\\ x_{N-1} \\end{pmatrix}\n",
"=\\begin{pmatrix} x_0 \\\\ x_{N-1} \\\\ x_{N-2} \\\\ \\vdots \\\\ x_1 \\end{pmatrix}\n",
"$$\n",
"What happens when we combine this with a DFT, e.g. to apply the convolution theorem? Show that $FRx = \\bar{F}x$.\n",
"\n",
"(This is illustrated by the code below: `fft(x)` computes $Fx$ and `bfft(x)` computes $\\bar{F}x$ using the [FFTW.jl package](https://github.com/JuliaMath/FFTW.jl).)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e1ce6605",
"metadata": {},
"outputs": [],
"source": [
"using FFTW\n",
"R(x) = [x[begin]; x[end:-1:begin+1]] # compute R*x\n",
"x = rand(20)\n",
"fft(R(x)) ≈ bfft(x)"
]
},
{
"cell_type": "markdown",
"id": "fc26407d",
"metadata": {},
"source": [
"## Problem 2 (10+10+10 points)\n",
"\n",
"Think way back to grade school, when you learned to multiply two numbers $a_{N_a-1} \\cdots a_2 a_1 a_0 \\times x_{N_x-1} \\cdots x_2 x_1 x_0$, where $a_k$ and $x_k$ are the **decimal digits** of two numbers with $N_a$ and $N_x$ digits, respectively. (That is, $x_{N_x-1} \\cdots x_2 x_1 x_0 = \\sum_k {10}^k x_k$.)\n",
"\n",
"**(a)** Show that, if you ignore carries (i.e. you allow \"digits\" $> 9$) then the *result* $ y_{N_y-1} \\cdots y_2 y_1 y_0$ is in fact a **convolution** of the digits $y = a \\ast x$, but *not* a cyclic convolution — there is no \"wraparound\". (What is $N_y$?)\n",
"\n",
"**(b)** Show that you can write your previous answer as a *cyclic* convolution simply by \"padding\" $a,x,y$ with zeros. That is, if you add some zeros to each of them (in the right place) to extend them to a common length $N$ (what must $N$ be?), then $\\mbox{pad}(y,N) = \\mbox{pad}(a,N) \\circledast \\mbox{pad}(x,N)$.\n",
"\n",
"**(c)** Write Julia code to compute\n",
"$$\n",
"\\underbrace{94403147635990179114}_a \\times \\underbrace{33855185913241354461}_x \\\\\n",
"= \\underbrace{3196036114011618584561027454963352927554}_y\n",
"$$\n",
"by:\n",
"* Implement the function `pad(x,N)` to zero-pad as in part (b)\n",
"* Implement a function `conv(a,x)` to perform a circular convolution using $Fx =$ `fft(x)` and $\\bar{F}x =$ `bfft(x)`. (You may want to call `round.(Int,real(y))` on the result to discard tiny roundoff errors, since the final values should be integers).\n",
"* Implement a function `carry(y)` to perform the carries: any \"digits\" larger than $10$ should have the tens place \"carried\" to the next digit etc. (You may find the [`divrem` function](https://docs.julialang.org/en/v1/base/math/#Base.divrem) useful to divide by 10 and find a remainder.)\n",
"Check that you get the right answer: `bignum(carry(y)) == 3196036114011618584561027454963352927554`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9ac00a3f",
"metadata": {},
"outputs": [],
"source": [
"using FFTW\n",
"\n",
"a = [4, 1, 1, 9, 7, 1, 0, 9, 9, 5, 3, 6, 7, 4, 1, 3, 0, 4, 4, 9]\n",
"x = [1, 6, 4, 4, 5, 3, 1, 4, 2, 3, 1, 9, 5, 8, 1, 5, 5, 8, 3, 3]\n",
"\n",
"bignum(x) = evalpoly(BigInt(10), x)\n",
"@show bignum(a)\n",
"@show bignum(x)\n",
"@show bignum(a) * bignum(x)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bdcd7ae6",
"metadata": {},
"outputs": [],
"source": [
"pad(x, N) = ...\n",
"conv(x,y) = ..."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "53f03b79",
"metadata": {},
"outputs": [],
"source": [
"function carry(y)\n",
" ...\n",
"end"
]
}
],
"metadata": {
"@webio": {
"lastCommId": null,
"lastKernelId": null
},
"kernelspec": {
"display_name": "Julia 1.8.3",
"language": "julia",
"name": "julia-1.8"
},
"language_info": {
"file_extension": ".jl",
"mimetype": "application/julia",
"name": "julia",
"version": "1.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

439
psets/pset6sol.ipynb Normal file
View File

@@ -0,0 +1,439 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "a65bff2e",
"metadata": {},
"source": [
"# 18.065 Problem Set 6 Solutions"
]
},
{
"cell_type": "markdown",
"id": "46c6dd3c",
"metadata": {},
"source": [
"## Problem 1 (10+10 points)\n",
"\n",
"**(a)** In class, we showed that if $\\tilde{y} = \\bar{F} y, \\tilde{a} = \\bar{F} a, \\tilde{x} = \\bar{F} x$ where $F$ is the DFT matrix, then $y = a \\circledast x$ (circular convolution) implies $\\tilde{y} = \\tilde{a} \\, \\mbox{.*} \\, \\tilde{x}$ (element-wise product). Show that the reverse is also true: show that if $\\tilde{y} = \\tilde{a} \\circledast \\tilde{x}$, then $y = \\alpha (a \\, \\mbox{.*} \\, x) $ for some scale factor $\\alpha = ???$. \n",
"\n",
"**(b)** Differentiating through convolutions, in class, involved the \"reversal\" permutation $R$:\n",
"$$\n",
"R\\begin{pmatrix} x_0 \\\\ x_1 \\\\ x_2 \\\\ \\vdots \\\\ x_{N-1} \\end{pmatrix}\n",
"=\\begin{pmatrix} x_0 \\\\ x_{N-1} \\\\ x_{N-2} \\\\ \\vdots \\\\ x_1 \\end{pmatrix}\n",
"$$\n",
"What happens when we combine this with a DFT, e.g. to apply the convolution theorem? Show that $FRx = \\bar{F}x$.\n",
"\n",
"(This is illustrated by the code below: `fft(x)` computes $Fx$ and `bfft(x)` computes $\\bar{F}x$ using the [FFTW.jl package](https://github.com/JuliaMath/FFTW.jl).)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "e1ce6605",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"true"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"using FFTW\n",
"R(x) = [x[begin]; x[end:-1:begin+1]] # compute R*x\n",
"x = rand(20)\n",
"fft(R(x)) ≈ bfft(x)"
]
},
{
"cell_type": "markdown",
"id": "be151905",
"metadata": {},
"source": [
"## Solutions:\n",
"\n",
"**(a)** There are many ways to approach this problem, one approach is simply to re-trace our steps from class.\n",
"\n",
"In class, we showed that the columns of $F$ are eigenvectors of cyclic convolutions, with the corresponding eigenvalues given by $\\bar{F} a$. Hence we obtained $a \\circledast x = F (\\bar{F}a \\,\\mbox{.*}) F^{-1} x = \\frac{1}{N} F (\\bar{F}a \\, \\mbox{.*} \\, \\bar{F} x) = \\bar{F}^{-1} (\\tilde{a} \\,\\mbox{.*}\\, \\tilde{x})$.\n",
"\n",
"Now, we want to show almost the same thing but we want to transform in the opposite direction. That is, we want to replace $F$ with $F^{-1} = \\bar{F}/N$ and vice versa. If we proceed as in class, we can start by showing that the columns of $\\bar{F}$ are eigenvectors of convolutions. But our proof from class for $F$ only relied on the cyclic property $\\omega_N^N = 1$! The proof is completely unchanged if we replace $\\omega_N$ with $\\overline{\\omega_N}$ (corresponding to replacing $F$ with $\\bar{F}$)!\n",
"\n",
"Hence we can simply quote the results above and replace $F$ with $\\bar{F}$:\n",
"$$\n",
"\\tilde{y} = \\tilde{a} \\circledast \\tilde{x} = \\bar{F} (F\\tilde{a}\\, \\mbox{.*}) \\bar{F}^{-1} \\tilde{x} \n",
"= F^{-1} (F\\tilde{a} \\,\\mbox{.*}\\, F\\tilde{x})\n",
"$$\n",
"Since $\\tilde{x} = \\bar{F} x$, that means $x = \\bar{F}^{-1} \\tilde{x} = F \\tilde{x} / N$, i.e. $F\\tilde{x} = Nx$. Similarly for $a$. Substituting that in, we obtain:\n",
"$$\n",
"\\tilde{y} = N^2 F^{-1} (a \\,\\mbox{.*}\\, x) \\Longleftrightarrow \\boxed{y = N (a \\,\\mbox{.*}\\, x)}.\n",
"$$\n",
"which is the desired result with $\\boxed{\\alpha = N}$.\n",
"\n",
"*Optional*: Since it's easy to get this algebra wrong, let's do a quick check of our result in Julia:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "d365ac92",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"true"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# brute-force circular convolution\n",
"function conv(a,x)\n",
" N = length(a); N == length(x) || throw(DimensionMismatch())\n",
" return [sum(i -> a[begin+i] * x[begin+mod(j-i, N)], 0:N-1) for j in 0:N-1]\n",
"end\n",
"\n",
"N = 10\n",
"ã = randn(ComplexF64, N)\n",
"x̃ = randn(ComplexF64, N)\n",
"ỹ = conv(ã, x̃)\n",
"a = fft(ã)/N # Fã/N\n",
"x = fft(x̃)/N # Fx̃/N\n",
"y = fft(ỹ)/N # Fỹ/N\n",
"y ≈ N * (a .* x)"
]
},
{
"cell_type": "markdown",
"id": "b61d45ed",
"metadata": {},
"source": [
"Hooray! Math works!\n",
"\n",
"**(b)**\n",
"\n",
"If $y = Rx$, then $y_n = x_{-n}$ (with indices interpreted $\\mod N$ as usual). If $z = FRx = Fy$, then we have:\n",
"$$\n",
"z_m = \\sum_{n=0}^{N-1} \\omega_N^{mn} y_n = \\sum_{n=0}^{N-1} \\omega_N^{mn} x_{-n} = \\sum_{k=0}^{N-1} \\omega_N^{-mk} x_{k} = (\\bar{F} x)_m\n",
"$$\n",
"where we simply performed a change of variables $k = -n$ in the sum. Hence $z = FRx = \\bar{F}x$ as desired."
]
},
{
"cell_type": "markdown",
"id": "fc26407d",
"metadata": {},
"source": [
"## Problem 2 (10+10+10 points)\n",
"\n",
"Think way back to grade school, when you learned to multiply two numbers $a_{N_a-1} \\cdots a_2 a_1 a_0 \\times x_{N_x-1} \\cdots x_2 x_1 x_0$, where $a_k$ and $x_k$ are the **decimal digits** of two numbers with $N_a$ and $N_x$ digits, respectively. (That is, $x_{N_x-1} \\cdots x_2 x_1 x_0 = \\sum_k {10}^k x_k$.)\n",
"\n",
"**(a)** Show that, if you ignore carries (i.e. you allow \"digits\" $> 9$) then the *result* $ y_{N_y-1} \\cdots y_2 y_1 y_0$ is in fact a **convolution** of the digits $y = a \\ast x$, but *not* a cyclic convolution — there is no \"wraparound\". (What is $N_y$?)\n",
"\n",
"**(b)** Show that you can write your previous answer as a *cyclic* convolution simply by \"padding\" $a,x,y$ with zeros. That is, if you add some zeros to each of them (in the right place) to extend them to a common length $N$ (what must $N$ be?), then $\\mbox{pad}(y,N) = \\mbox{pad}(a,N) \\circledast \\mbox{pad}(x,N)$.\n",
"\n",
"**(c)** Write Julia code to compute\n",
"$$\n",
"\\underbrace{94403147635990179114}_a \\times \\underbrace{33855185913241354461}_x \\\\\n",
"= \\underbrace{3196036114011618584561027454963352927554}_y\n",
"$$\n",
"by:\n",
"* Implement the function `pad(x,N)` to zero-pad as in part (b)\n",
"* Implement a function `conv(a,x)` to perform a circular convolution using $Fx =$ `fft(x)` and $\\bar{F}x =$ `bfft(x)`. (You may want to call `round.(Int,real(y))` on the result to discard tiny roundoff errors, since the final values should be integers).\n",
"* Implement a function `carry(y)` to perform the carries: any \"digits\" larger than $10$ should have the tens place \"carried\" to the next digit etc. (You may find the [`divrem` function](https://docs.julialang.org/en/v1/base/math/#Base.divrem) useful to divide by 10 and find a remainder.)\n",
"Check that you get the right answer: `bignum(carry(y)) == 3196036114011618584561027454963352927554`."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "b098b074",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"bignum(a) = 94403147635990179114\n",
"bignum(x) = 33855185913241354461\n",
"bignum(a) * bignum(x) = 3196036114011618584561027454963352927554\n"
]
},
{
"data": {
"text/plain": [
"3196036114011618584561027454963352927554"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"using FFTW\n",
"\n",
"a = [4, 1, 1, 9, 7, 1, 0, 9, 9, 5, 3, 6, 7, 4, 1, 3, 0, 4, 4, 9]\n",
"x = [1, 6, 4, 4, 5, 3, 1, 4, 2, 3, 1, 9, 5, 8, 1, 5, 5, 8, 3, 3]\n",
"\n",
"bignum(x) = evalpoly(BigInt(10), x)\n",
"@show bignum(a)\n",
"@show bignum(x)\n",
"@show bignum(a) * bignum(x)"
]
},
{
"cell_type": "markdown",
"id": "bb0e34b1",
"metadata": {},
"source": [
"## Solutions:\n",
"\n",
"**(a)** Think about the usual \"long multiplication algorithm, omitting carries. To get the 1's place of the output, i.e. the \"digit\" $y_0$, we simply multiply the 1's -places of the inputs: $y_0 = a_0 x_0$. To get the 10's place of the ouptut, i.e. the \"digit\" $y_1$, we multiply the 1's place of $a$ with the 10's place of $x$ and vice versa, and add them, i.e. $y_1 = a_0 x_1 + a_1 x_0$. And so forth, giving a pattern like:\n",
"$$\n",
"y_0 = a_0 x_0 \\\\\n",
"y_1 = a_0 x_1 + a_1 x_0 \\\\\n",
"y_2 = a_0 x_2 + a_1 x_1 + a_2 x_0 \\\\\n",
"\\vdots \\\\\n",
"y_m = \\sum_{n=0}^m a_n x_{m-n}\n",
"$$\n",
"which is exactly in the form of a linear convolution! To make the sums look more uniform, we can equivalently think of each number as having an *infinite* number of digits, but most of the digits are zero (e.g. $x_k = 0$ for $k < 0$ or $k \\ge N_x$, where the former correspond to \"zeros after the decimal point\" and the latter correspond to zeros before the most significant digit). With this convention, we can simply write:\n",
"$$\n",
"y_m = \\sum_{n=-\\infty}^\\infty a_n x_{m-n}\n",
"$$\n",
"\n",
"**(b)** We must pad with zeros to a length $N$ that is big enough that the circular \"wrap around\" of a cyclic convolution does not distort the result — the \"wrapped around\" digits must all be zeros. Equivalently, since a convolution is a sequence of dot product where we \"slide\" $x$ past $a$ (or vice versa), the overlap should be zero before $x$ slides past the boundary. The last possible overlap, corresponds to the most significant digit in the output $y$, which is $y_m$ for $m = N_a + N_x - 2$: the output (not including carries) has $N_a + N_x - 1$ digits.\n",
"In short, we simply need to pad to a length long enough to hold all the digits of the output, i.e. to any $\\boxed{N \\ge N_a + N_x - 1}$.\n",
"\n",
"Be careful in part (c), however! Once we include carries, this can introduce *additional* digits, so the carried result may eventually be a longer length.\n",
"\n",
"**(c)** Our implementation and tests are as follows"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "bdcd7ae6",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"conv (generic function with 2 methods)"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# pad x with N-length(x) zeros of the same type\n",
"pad(x, N) = [x; zeros(eltype(x), N - length(x))]\n",
" \n",
"# there are many ways to write this,\n",
"# since from problem 1 the convolution theorem takes many forms\n",
"conv(x,y) = ifft(fft(x) .* fft(y))\n",
"# or: conv(x,y) = bfft(fft(x) .* fft(y)) / length(x)\n",
"# or: conv(x,y) = fft(bfft(x) .* bfft(y)) / length(x)\n",
"\n",
"# for integer vectors, we expect an integer result, but\n",
"# FFTs will give us slightly complex/noninteger results due\n",
"# to roundoff errors. Let's correct this as suggested:\n",
"conv(x::AbstractVector{<:Integer}, y::AbstractVector{<:Integer}) =\n",
" round.(promote_type(eltype(x),eltype(y)), real.(conv(float.(x), float.(y))))"
]
},
{
"cell_type": "markdown",
"id": "792e6fbb",
"metadata": {},
"source": [
"Before we do any carries, let's check that our convolution and zero-padding are correct — evaluating the resulting \"digits\" as an integer using the `bignum` function from above should still work even if we have digits $> 10$."
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "8f411509",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"bignum(y) == bignum(a) * bignum(x) = true\n",
"bignum(y) == bignum(a) * bignum(x) = true\n",
"bignum(y) == bignum(a) * bignum(x) = true\n",
"bignum(y) == bignum(a) * bignum(x) = true\n",
"bignum(y) == bignum(a) * bignum(x) = true\n",
"bignum(y) == bignum(a) * bignum(x) = true\n"
]
}
],
"source": [
"Na = length(a)\n",
"Nx = length(x)\n",
"\n",
"# any length ≥ Na + Nx - 1 should work. let's try a few:\n",
"for N in (Na + Nx - 1) .+ (0:5)\n",
" y = conv(pad(a, N), pad(x, N))\n",
" @show bignum(y) == bignum(a) * bignum(x)\n",
"end"
]
},
{
"cell_type": "markdown",
"id": "07051556",
"metadata": {},
"source": [
"Now, let's implement the carry function. The key step, for a digit $y_k = {}$ `y[k]`, is that we want to find the quotient $q$ and the remainder $r$ after dividing by 10, which can be accomplished in Julia by:\n",
"```jl\n",
"q, r = divrem(y[k], 10)\n",
"```\n",
"Then, the remainder is the *new k-th digit* ($y_k \\to r$) and the quotient *gets carried* (added to $y_{k+1}$). Note, however, that `y[k]` in this expression *includes any previous carries*."
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "53f03b79",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"carry (generic function with 1 method)"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"function carry(y)\n",
" c = similar(y, 0) # result array of the same type as y, of zero length\n",
" push!(c, 0) # let's start off with a result of zero.\n",
" for k = 1:length(y)\n",
" q, r = divrem(y[k]+c[k], 10)\n",
" c[k] = r # remainder = new digit (+ any previous carry)\n",
" push!(c, q) # push quotient as the next digit\n",
" end\n",
" return c\n",
"end"
]
},
{
"cell_type": "markdown",
"id": "0dabf98c",
"metadata": {},
"source": [
"Let's try it:"
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "1a7587ea",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"y = [4, 5, 5, 7, 2, 9, 2, 5, 3, 3, 6, 9, 4, 5, 4, 7, 2, 0, 1, 6, 5, 4, 8, 5, 8, 1, 6, 1, 1, 0, 4, 1, 1, 6, 3, 0, 6, 9, 1, 3]\n",
"y == digits(bignum(a) * bignum(x)) = true\n",
"bignum(y) == bignum(a) * bignum(x) = true\n"
]
},
{
"data": {
"text/plain": [
"true"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"N = Na + Nx - 1\n",
"y = carry(conv(pad(a, N), pad(x, N)))\n",
"\n",
"@show y\n",
"@show y == digits(bignum(a) * bignum(x))\n",
"@show bignum(y) == bignum(a) * bignum(x)"
]
},
{
"cell_type": "markdown",
"id": "5d6e8273",
"metadata": {},
"source": [
"Hooray, it's the correct digits $319603 \\ldots 927554$ (starting with the least signifcant!) and passes all of our tests!\n",
"\n",
"Note that the carried result is one digit longer than $N$ digits:"
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "d4dd3e88",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"N = 39\n",
"length(y) = 40\n"
]
},
{
"data": {
"text/plain": [
"40"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"@show N\n",
"@show length(y)"
]
}
],
"metadata": {
"@webio": {
"lastCommId": null,
"lastKernelId": null
},
"kernelspec": {
"display_name": "Julia 1.8.3",
"language": "julia",
"name": "julia-1.8"
},
"language_info": {
"file_extension": ".jl",
"mimetype": "application/julia",
"name": "julia",
"version": "1.8.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}