rm Adam for now

notes on gradient descent for quadratic functions
tweak
2024-10-31 22:08:16 -04:00 · 2024-10-31 22:06:46 -04:00 · 2024-09-30 10:44:53 -04:00 · 2024-09-30 09:42:46 -04:00 · 2024-09-29 21:26:42 -04:00 · 2024-09-29 21:24:12 -04:00
8 changed files with 6803 additions and 3 deletions
--- a/README.md
+++ b/README.md
@@ -327,13 +327,12 @@ http://dx.doi.org/10.1137/S1052623499362822) — I used the "linear and separabl

 * Non-negative matrix factorization — guest lecture by [Prof. Ankur Moitra](https://people.csail.mit.edu/moitra/).

-**Further reading:** Coming soon.
+**Further reading:** See section 2.1 of Moitra's [*Algorithmic Aspects of Machine Learning*](http://people.csail.mit.edu/moitra/docs/bookexv2.pdf).  An interesting application to image analysis can be found in [this paper](http://www.cs.columbia.edu/~blei/fogm/2020F/readings/LeeSeung1999.pdf).

 ## Lecture 29 (Apr 21)

 * [(Discrete) convolutions](https://en.wikipedia.org/wiki/Convolution#Discrete_convolution), translation-invariance, [circulant matrices](https://en.wikipedia.org/wiki/Circulant_matrix), and convolutional neural networks (CNNs)
 * [pset 5 solutions](psets/pset5sol.ipynb)
-* pset 6: coming soon, due Friday May 5.

 **Further reading:** Strang textbook sections IV.2 (circulant matrices) and VII.2 (CNNs), and [OCW lecture 32](https://ocw.mit.edu/courses/18-065-matrix-methods-in-data-analysis-signal-processing-and-machine-learning-spring-2018/resources/lecture-32-imagenet-is-a-cnn-the-convolution-rule/).  See also these [Stanford lecture slides](http://cs231n.stanford.edu/slides/2016/winter1516_lecture7.pdf) and [MIT lecture slides](https://mit6874.github.io/assets/sp2020/slides/L03_CNNs_MK2.pdf).

@@ -352,6 +351,57 @@ http://dx.doi.org/10.1137/S1052623499362822) — I used the "linear and separabl

 ## Lecture 32 (Apr 25)

+* [pset 6](psets/pset6.ipynb): due Friday May 5.
+
 Fourier series vs. DFT: If we view the DFT as a [Riemann sum](https://en.wikipedia.org/wiki/Riemann_sum) approximation for a [Fourier series](https://en.wikipedia.org/wiki/Fourier_series) coefficient (which turns out to be *exponentially* accurate for smooth periodic f(t)!), then the errors are an instance of [aliasing](https://en.wikipedia.org/wiki/Aliasing) (see e.g. the ["wagon-wheel" effect](https://en.wikipedia.org/wiki/Wagon-wheel_effect)).  For band-limited signals where we sample at a rate > twice the bandwidth, there is no aliasing and no information loss, a result known as the [Nyquist—Shannon sampling theorem](https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem); in the common case where the bandwidth is centered at ω=0, this corresponds to sampling at more than *twice* the highest frequency.

 **Further reading**: For a periodic function, a Riemann sum is equivalent to a trapezoidal rule (since the 0th and Nth samples are identical), and the exponential convergence to the integral is reviewed by [Trefethen and Weideman (2014)](https://epubs.siam.org/doi/pdf/10.1137/130932132); SGJ gave a [simplified review for IAP (2011)](https://math.mit.edu/~stevenj/trap-iap-2011.pdf). The subject of aliasing, sampling, and signal processing leads to the field of [digital signal processing (DSP)](https://en.wikipedia.org/wiki/Digital_signal_processing), on which there are many books and courses.  A classic textbook is [*Discrete-Time Signal Processing*](https://research.iaun.ac.ir/pd/naghsh/pdfs/UploadFile_2230.pdf) by Oppenheim and Schafer, and there are whole courses at MIT (like 6.3000 and 6.7000) on these topics.
+
+## Lecture 33 (May 1)
+
+* The [DTFT](https://en.wikipedia.org/wiki/Discrete-time_Fourier_transform), [windowing](https://en.wikipedia.org/wiki/Window_function) (e.g. relating DTFT to DFT) and spectral leakage.
+* [Filter design](https://en.wikipedia.org/wiki/Filter_design) and [FIR filters](https://en.wikipedia.org/wiki/Finite_impulse_response).
+
+**Further reading**: See the DSP links from lecture 32.
+
+## Lecture 34 (May 3)
+* FIR filter design = polynomial fitting.  Windowing & least-squares, or minimax design by [Parks-McClellan](https://en.wikipedia.org/wiki/Parks%E2%80%93McClellan_filter_design_algorithm) and similar.
+* Changing variables H(z) = ĥ(ω) for z=exp(iω): the [Z transform](https://en.wikipedia.org/wiki/Z-transform)
+* [IIR filters](https://en.wikipedia.org/wiki/Infinite_impulse_response): definition, relation to [rational functions](https://en.wikipedia.org/wiki/Rational_function), first-order IIR filter example, stability.
+
+**Further reading**: See the DSP links from lecture 32.
+
+## Lecture 35 (May 5)
+
+* IIR filter stability: poles zₚ must lie inside the unit circle |zₚ|
+* IIR filter design: non-convex and complicated, lots of algorithms and approximations.
+* [Kronecker products A⊗B](https://en.wikipedia.org/wiki/Kronecker_product) and ["vectorization" vec(A)](https://en.wikipedia.org/wiki/Vectorization_(mathematics)): expressing "multidimensional linear operations" ([multilinear algebra](https://en.wikipedia.org/wiki/Multilinear_algebra)) as ordinary matrix–vector products.
+* [pset 6 solutions](psets/pset6sol.ipynb)
+
+## Lecture 36 (May 6)
+
+* Kronecker products, [Hadamard matrices](https://en.wikipedia.org/wiki/Hadamard_matrix), and the mixed-product property (A⊗B)(C⊗D)=AC⊗BD: Kronecker products of unitary matrices are unitary.
+* Kronecker products and the [fast Walsh–Hadamard transform (FWHT)](https://en.wikipedia.org/wiki/Fast_Walsh%E2%80%93Hadamard_transform): fast algorithms (FWHT, FFT, …) as *sparse factorizations* of dense matrices via Kronecker products, and equivalently as mono-to-multi-dimensional mappings.
+* [Sylvester equations](https://en.wikipedia.org/wiki/Sylvester_equation) and [Lyapunov equations](https://en.wikipedia.org/wiki/Lyapunov_equation): can be solved as ordinary matrix–vector equations via Kronecker products, but this naively requires Θ(n⁶) operations, whereas exploiting the structure gives Θ(n³).  Kronecker products are often a nice way to *think* about things but *not* to explicitly compute with.
+
+## Lecture 37 (May 10)
+
+* [Graphs](https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)) and their application to representing how entities are connected — lots of applications, from data mining of social networks or the web, to bioinformatics, to analyzing sparse-matrix algorithms
+* Representing graphs via matrices: [incidence matrices E](https://en.wikipedia.org/wiki/Incidence_matrix), [adjacency matrices A](https://en.wikipedia.org/wiki/Adjacency_matrix), and [graph Laplacians L=EᵀE=D-A](https://en.wikipedia.org/wiki/Laplacian_matrix)
+* [Graph partitioning](https://en.wikipedia.org/wiki/Graph_partition): spectral partitioning via the [Fiedler vector](https://en.wikipedia.org/wiki/Algebraic_connectivity), corresponding to the **second-smallest** eigenvalue of L
+
+**Further reading:** Textbook sections VI.6–VI.7 and [OCW lecture 35](https://ocw.mit.edu/courses/18-065-matrix-methods-in-data-analysis-signal-processing-and-machine-learning-spring-2018/resources/lecture-35-finding-clusters-in-graphs-second-project-handwriting/).  Using incidence matrices to identify cycles and spanning trees in graphs is also covered in 18.06 and in Strang's *Introduction to Linear Algebra* book (5th ed. section 10.1 or 6th ed. section 3.5), as well as in [this interactive Julia notebook](https://github.com/mitmath/1806/blob/1a9ff5c359b79f28c534bf1e1daeadfdc7aee054/notes/Graphs-Networks.ipynb).  A popular software package for graph and mesh partitioning is [METIS](https://github.com/KarypisLab/METIS).  The Google [PageRank algorithm](https://en.wikipedia.org/wiki/PageRank) is another nice application of linear algebra to graphs, in this case to rank web pages by "importance".  Direct methods (e.g. Gaussian elimination) for sparse-matrix problems turn out to be all about analyzing sparsity patterns via graph theory; see e.g. the [book by Timothy Davis](https://epubs.siam.org/doi/book/10.1137/1.9780898718881).
+
+## Lecture 38 (May 15)
+
+* [Floating-point introduction](notes/Floating-Point-Intro.ipynb)
+
+If you want to continue with "numerical computing" in any form — whether it be for data analysis, machine learning, scientific/engineering modeling, or other areas — at some point you will need to learn more about **computational errors**.  The mathematical field that studies **algorithms + approximations** is called [numerical analysis](https://en.wikipedia.org/wiki/Numerical_analysis), covered by courses such as [18.335](https://github.com/mitmath/18335) at MIT.
+
+One of the most basic sources of computational error is that **computer arithmetic is generally inexact**, leading to [roundoff errors](https://en.wikipedia.org/wiki/Round-off_error).  The reason for this is simple: computers can only work with numbers having a **finite number of digits**, so they **cannot even store** arbitrary real numbers.  Only a _finite subset_ of the real numbers can be represented using a particular number of "bits", and the question becomes _which subset_ to store, how arithmetic on this subset is defined, and how to analyze the errors compared to theoretical exact arithmetic on real numbers.
+
+In **floating-point** arithmetic, we store both an integer coefficient and an exponent in some base: essentially, scientific notation. This allows large dynamic range and fixed _relative_ accuracy: if fl(x) is the closest floating-point number to any real x, then |fl(x)-x| < ε|x| where ε is the _machine precision_. This makes error analysis much easier and makes algorithms mostly insensitive to overall scaling or units, but has the disadvantage that it requires specialized floating-point hardware to be fast. Nowadays, all general-purpose computers, and even many little computers like your cell phones, have floating-point units.
+
+Went through some simple definitions and examples in Julia (see notebook above), illustrating the basic ideas and a few interesting tidbits.  In particular, we looked at **error accumulation** during long calculations (e.g. summation), as well as examples of [catastrophic cancellation](https://en.wikipedia.org/wiki/Loss_of_significance) and how it can sometimes be avoided by rearranging a calculation.
+
+**Further reading:** [Trefethen & Bau's *Numerical Linear Algebra*](https://people.maths.ox.ac.uk/trefethen/text.html), lecture 13. [What Every Computer Scientist Should Know About Floating Point Arithmetic](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.22.6768) (David Goldberg, ACM 1991). William Kahan, [How Java's floating-point hurts everyone everywhere](http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf) (2004): contains a nice discussion of floating-point myths and misconceptions.   A brief but useful summary can be found in [this Julia-focused floating-point overview](https://discourse.julialang.org/t/psa-floating-point-arithmetic/8678) by Prof. John Gibson. Because many programmers never learn how floating-point arithmetic actually works, there are [many common myths](https://github.com/mitmath/18335/blob/spring21/notes/fp-myths.pdf) about its behavior.   (An infamous example is `0.1 + 0.2` giving `0.30000000000000004`, which people are puzzled by so frequently it has led to a web site [https://0.30000000000000004.com/](https://0.30000000000000004.com/)!)
--- a/notes/Eigenproblems.ipynb
+++ b/notes/Eigenproblems.ipynb
--- a/notes/Floating-Point-Intro.ipynb
+++ b/notes/Floating-Point-Intro.ipynb
--- a/notes/Least-Square
+++ b/notes/Least-Square
--- a/notes/Quadratic-Gradient-Descent.ipynb
+++ b/notes/Quadratic-Gradient-Descent.ipynb
--- a/psets/pset4sol.ipynb
+++ b/psets/pset4sol.ipynb
@@ -586,7 +586,7 @@
    "    # approximate gradient from this mini-batch\n",
    "    ∇f̃ = (2/M)*(Ã'*(Ã*x - b̃))\n",
    "\n",
-    "    # Adam update\n",
+    "    # Yogi update\n",
    "    m = β₁*m + (1-β₁)*∇f̃\n",
    "    v = @. β₂*v - (1-β₂) * sign(v - ∇f̃^2) * (∇f̃^2)\n",
    "    m̂ = m / (1 - β₁^t) # de-bias: normalize by total weights\n",
--- a/psets/pset6.ipynb
+++ b/psets/pset6.ipynb
@@ -0,0 +1,131 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "a65bff2e",
+   "metadata": {},
+   "source": [
+    "# 18.065 Problem Set 6\n",
+    "\n",
+    "Due Friday May 5."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "46c6dd3c",
+   "metadata": {},
+   "source": [
+    "## Problem 1 (10+10 points)\n",
+    "\n",
+    "**(a)** In class, we showed that if $\\tilde{y} = \\bar{F} y, \\tilde{a} = \\bar{F} a, \\tilde{x} = \\bar{F} x$ where $F$ is the DFT matrix, then $y = a \\circledast x$ (circular convolution) implies $\\tilde{y} = \\tilde{a} \\, \\mbox{.*} \\, \\tilde{x}$ (element-wise product).   Show that the reverse is also true: show that if $\\tilde{y} = \\tilde{a} \\circledast \\tilde{x}$, then $y = \\alpha (a \\, \\mbox{.*} \\, x) $ for some scale factor $\\alpha = ???$. \n",
+    "\n",
+    "**(b)** Differentiating through convolutions, in class, involved the \"reversal\" permutation $R$:\n",
+    "$$\n",
+    "R\\begin{pmatrix} x_0 \\\\ x_1 \\\\ x_2 \\\\ \\vdots \\\\ x_{N-1} \\end{pmatrix}\n",
+    "=\\begin{pmatrix} x_0 \\\\ x_{N-1} \\\\ x_{N-2} \\\\ \\vdots \\\\ x_1 \\end{pmatrix}\n",
+    "$$\n",
+    "What happens when we combine this with a DFT, e.g. to apply the convolution theorem?  Show that $FRx = \\bar{F}x$.\n",
+    "\n",
+    "(This is illustrated by the code below: `fft(x)` computes $Fx$ and `bfft(x)` computes $\\bar{F}x$ using the [FFTW.jl package](https://github.com/JuliaMath/FFTW.jl).)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e1ce6605",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "using FFTW\n",
+    "R(x) = [x[begin]; x[end:-1:begin+1]] # compute R*x\n",
+    "x = rand(20)\n",
+    "fft(R(x)) ≈ bfft(x)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fc26407d",
+   "metadata": {},
+   "source": [
+    "## Problem 2 (10+10+10 points)\n",
+    "\n",
+    "Think way back to grade school, when you learned to multiply two numbers $a_{N_a-1} \\cdots a_2 a_1 a_0 \\times x_{N_x-1} \\cdots x_2 x_1 x_0$, where $a_k$ and $x_k$ are the **decimal digits** of two numbers with $N_a$ and $N_x$ digits, respectively.  (That is, $x_{N_x-1} \\cdots x_2 x_1 x_0 = \\sum_k {10}^k x_k$.)\n",
+    "\n",
+    "**(a)** Show that, if you ignore carries (i.e. you allow \"digits\" $> 9$) then the *result* $ y_{N_y-1} \\cdots y_2 y_1 y_0$ is in fact a **convolution** of the digits $y = a \\ast x$, but *not* a cyclic convolution — there is no \"wraparound\".  (What is $N_y$?)\n",
+    "\n",
+    "**(b)** Show that you can write your previous answer as a *cyclic* convolution simply by \"padding\" $a,x,y$ with zeros.  That is, if you add some zeros to each of them (in the right place) to extend them to a common length $N$ (what must $N$ be?), then $\\mbox{pad}(y,N) = \\mbox{pad}(a,N) \\circledast \\mbox{pad}(x,N)$.\n",
+    "\n",
+    "**(c)** Write Julia code to compute\n",
+    "$$\n",
+    "\\underbrace{94403147635990179114}_a \\times \\underbrace{33855185913241354461}_x \\\\\n",
+    "= \\underbrace{3196036114011618584561027454963352927554}_y\n",
+    "$$\n",
+    "by:\n",
+    "* Implement the function `pad(x,N)` to zero-pad as in part (b)\n",
+    "* Implement a function `conv(a,x)` to perform a circular convolution using $Fx =$ `fft(x)` and $\\bar{F}x =$ `bfft(x)`.  (You may want to call `round.(Int,real(y))` on the result to discard tiny roundoff errors, since the final values should be integers).\n",
+    "* Implement a function `carry(y)` to perform the carries: any \"digits\" larger than $10$ should have the tens place \"carried\" to the next digit etc.  (You may find the [`divrem` function](https://docs.julialang.org/en/v1/base/math/#Base.divrem) useful to divide by 10 and find a remainder.)\n",
+    "Check that you get the right answer: `bignum(carry(y)) == 3196036114011618584561027454963352927554`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9ac00a3f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "using FFTW\n",
+    "\n",
+    "a = [4, 1, 1, 9, 7, 1, 0, 9, 9, 5, 3, 6, 7, 4, 1, 3, 0, 4, 4, 9]\n",
+    "x = [1, 6, 4, 4, 5, 3, 1, 4, 2, 3, 1, 9, 5, 8, 1, 5, 5, 8, 3, 3]\n",
+    "\n",
+    "bignum(x) = evalpoly(BigInt(10), x)\n",
+    "@show bignum(a)\n",
+    "@show bignum(x)\n",
+    "@show bignum(a) * bignum(x)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "bdcd7ae6",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pad(x, N) = ...\n",
+    "conv(x,y) = ..."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "53f03b79",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "function carry(y)\n",
+    "    ...\n",
+    "end"
+   ]
+  }
+ ],
+ "metadata": {
+  "@webio": {
+   "lastCommId": null,
+   "lastKernelId": null
+  },
+  "kernelspec": {
+   "display_name": "Julia 1.8.3",
+   "language": "julia",
+   "name": "julia-1.8"
+  },
+  "language_info": {
+   "file_extension": ".jl",
+   "mimetype": "application/julia",
+   "name": "julia",
+   "version": "1.8.5"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
--- a/psets/pset6sol.ipynb
+++ b/psets/pset6sol.ipynb
@@ -0,0 +1,439 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "a65bff2e",
+   "metadata": {},
+   "source": [
+    "# 18.065 Problem Set 6 Solutions"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "46c6dd3c",
+   "metadata": {},
+   "source": [
+    "## Problem 1 (10+10 points)\n",
+    "\n",
+    "**(a)** In class, we showed that if $\\tilde{y} = \\bar{F} y, \\tilde{a} = \\bar{F} a, \\tilde{x} = \\bar{F} x$ where $F$ is the DFT matrix, then $y = a \\circledast x$ (circular convolution) implies $\\tilde{y} = \\tilde{a} \\, \\mbox{.*} \\, \\tilde{x}$ (element-wise product).   Show that the reverse is also true: show that if $\\tilde{y} = \\tilde{a} \\circledast \\tilde{x}$, then $y = \\alpha (a \\, \\mbox{.*} \\, x) $ for some scale factor $\\alpha = ???$. \n",
+    "\n",
+    "**(b)** Differentiating through convolutions, in class, involved the \"reversal\" permutation $R$:\n",
+    "$$\n",
+    "R\\begin{pmatrix} x_0 \\\\ x_1 \\\\ x_2 \\\\ \\vdots \\\\ x_{N-1} \\end{pmatrix}\n",
+    "=\\begin{pmatrix} x_0 \\\\ x_{N-1} \\\\ x_{N-2} \\\\ \\vdots \\\\ x_1 \\end{pmatrix}\n",
+    "$$\n",
+    "What happens when we combine this with a DFT, e.g. to apply the convolution theorem?  Show that $FRx = \\bar{F}x$.\n",
+    "\n",
+    "(This is illustrated by the code below: `fft(x)` computes $Fx$ and `bfft(x)` computes $\\bar{F}x$ using the [FFTW.jl package](https://github.com/JuliaMath/FFTW.jl).)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "e1ce6605",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "true"
+      ]
+     },
+     "execution_count": 1,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "using FFTW\n",
+    "R(x) = [x[begin]; x[end:-1:begin+1]] # compute R*x\n",
+    "x = rand(20)\n",
+    "fft(R(x)) ≈ bfft(x)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "be151905",
+   "metadata": {},
+   "source": [
+    "## Solutions:\n",
+    "\n",
+    "**(a)** There are many ways to approach this problem, one approach is simply to re-trace our steps from class.\n",
+    "\n",
+    "In class, we showed that the columns of $F$ are eigenvectors of cyclic convolutions, with the corresponding eigenvalues given by $\\bar{F} a$.  Hence we obtained $a \\circledast x = F (\\bar{F}a \\,\\mbox{.*}) F^{-1} x =  \\frac{1}{N} F (\\bar{F}a \\, \\mbox{.*} \\, \\bar{F} x) = \\bar{F}^{-1} (\\tilde{a} \\,\\mbox{.*}\\, \\tilde{x})$.\n",
+    "\n",
+    "Now, we want to show almost the same thing but we want to transform in the opposite direction.  That is, we want to replace $F$ with $F^{-1} = \\bar{F}/N$ and vice versa.   If we proceed as in class, we can start by showing that the columns of $\\bar{F}$ are eigenvectors of convolutions.   But our proof from class for $F$ only relied on the cyclic property $\\omega_N^N = 1$!  The proof is completely unchanged if we replace $\\omega_N$ with $\\overline{\\omega_N}$ (corresponding to replacing $F$ with $\\bar{F}$)!\n",
+    "\n",
+    "Hence we can simply quote the results above and replace $F$ with $\\bar{F}$:\n",
+    "$$\n",
+    "\\tilde{y} = \\tilde{a} \\circledast \\tilde{x} = \\bar{F} (F\\tilde{a}\\, \\mbox{.*}) \\bar{F}^{-1} \\tilde{x} \n",
+    "=  F^{-1} (F\\tilde{a} \\,\\mbox{.*}\\, F\\tilde{x})\n",
+    "$$\n",
+    "Since $\\tilde{x} = \\bar{F} x$, that means $x = \\bar{F}^{-1} \\tilde{x} = F \\tilde{x} / N$, i.e. $F\\tilde{x} = Nx$.  Similarly for $a$.  Substituting that in, we obtain:\n",
+    "$$\n",
+    "\\tilde{y} = N^2 F^{-1} (a \\,\\mbox{.*}\\, x) \\Longleftrightarrow \\boxed{y = N (a \\,\\mbox{.*}\\, x)}.\n",
+    "$$\n",
+    "which is the desired result with $\\boxed{\\alpha = N}$.\n",
+    "\n",
+    "*Optional*: Since it's easy to get this algebra wrong, let's do a quick check of our result in Julia:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "d365ac92",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "true"
+      ]
+     },
+     "execution_count": 3,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# brute-force circular convolution\n",
+    "function conv(a,x)\n",
+    "    N = length(a); N == length(x) || throw(DimensionMismatch())\n",
+    "    return [sum(i -> a[begin+i] * x[begin+mod(j-i, N)], 0:N-1) for j in 0:N-1]\n",
+    "end\n",
+    "\n",
+    "N = 10\n",
+    "ã = randn(ComplexF64, N)\n",
+    "x̃ = randn(ComplexF64, N)\n",
+    "ỹ = conv(ã, x̃)\n",
+    "a = fft(ã)/N # Fã/N\n",
+    "x = fft(x̃)/N # Fx̃/N\n",
+    "y = fft(ỹ)/N # Fỹ/N\n",
+    "y ≈ N * (a .* x)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b61d45ed",
+   "metadata": {},
+   "source": [
+    "Hooray!  Math works!\n",
+    "\n",
+    "**(b)**\n",
+    "\n",
+    "If $y = Rx$, then $y_n = x_{-n}$ (with indices interpreted $\\mod N$ as usual).  If $z = FRx = Fy$, then we have:\n",
+    "$$\n",
+    "z_m = \\sum_{n=0}^{N-1} \\omega_N^{mn} y_n = \\sum_{n=0}^{N-1} \\omega_N^{mn} x_{-n} = \\sum_{k=0}^{N-1} \\omega_N^{-mk} x_{k} = (\\bar{F} x)_m\n",
+    "$$\n",
+    "where we simply performed a change of variables $k = -n$ in the sum.  Hence $z = FRx = \\bar{F}x$ as desired."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fc26407d",
+   "metadata": {},
+   "source": [
+    "## Problem 2 (10+10+10 points)\n",
+    "\n",
+    "Think way back to grade school, when you learned to multiply two numbers $a_{N_a-1} \\cdots a_2 a_1 a_0 \\times x_{N_x-1} \\cdots x_2 x_1 x_0$, where $a_k$ and $x_k$ are the **decimal digits** of two numbers with $N_a$ and $N_x$ digits, respectively.  (That is, $x_{N_x-1} \\cdots x_2 x_1 x_0 = \\sum_k {10}^k x_k$.)\n",
+    "\n",
+    "**(a)** Show that, if you ignore carries (i.e. you allow \"digits\" $> 9$) then the *result* $ y_{N_y-1} \\cdots y_2 y_1 y_0$ is in fact a **convolution** of the digits $y = a \\ast x$, but *not* a cyclic convolution — there is no \"wraparound\".  (What is $N_y$?)\n",
+    "\n",
+    "**(b)** Show that you can write your previous answer as a *cyclic* convolution simply by \"padding\" $a,x,y$ with zeros.  That is, if you add some zeros to each of them (in the right place) to extend them to a common length $N$ (what must $N$ be?), then $\\mbox{pad}(y,N) = \\mbox{pad}(a,N) \\circledast \\mbox{pad}(x,N)$.\n",
+    "\n",
+    "**(c)** Write Julia code to compute\n",
+    "$$\n",
+    "\\underbrace{94403147635990179114}_a \\times \\underbrace{33855185913241354461}_x \\\\\n",
+    "= \\underbrace{3196036114011618584561027454963352927554}_y\n",
+    "$$\n",
+    "by:\n",
+    "* Implement the function `pad(x,N)` to zero-pad as in part (b)\n",
+    "* Implement a function `conv(a,x)` to perform a circular convolution using $Fx =$ `fft(x)` and $\\bar{F}x =$ `bfft(x)`.  (You may want to call `round.(Int,real(y))` on the result to discard tiny roundoff errors, since the final values should be integers).\n",
+    "* Implement a function `carry(y)` to perform the carries: any \"digits\" larger than $10$ should have the tens place \"carried\" to the next digit etc.  (You may find the [`divrem` function](https://docs.julialang.org/en/v1/base/math/#Base.divrem) useful to divide by 10 and find a remainder.)\n",
+    "Check that you get the right answer: `bignum(carry(y)) == 3196036114011618584561027454963352927554`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "b098b074",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "bignum(a) = 94403147635990179114\n",
+      "bignum(x) = 33855185913241354461\n",
+      "bignum(a) * bignum(x) = 3196036114011618584561027454963352927554\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "3196036114011618584561027454963352927554"
+      ]
+     },
+     "execution_count": 4,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "using FFTW\n",
+    "\n",
+    "a = [4, 1, 1, 9, 7, 1, 0, 9, 9, 5, 3, 6, 7, 4, 1, 3, 0, 4, 4, 9]\n",
+    "x = [1, 6, 4, 4, 5, 3, 1, 4, 2, 3, 1, 9, 5, 8, 1, 5, 5, 8, 3, 3]\n",
+    "\n",
+    "bignum(x) = evalpoly(BigInt(10), x)\n",
+    "@show bignum(a)\n",
+    "@show bignum(x)\n",
+    "@show bignum(a) * bignum(x)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bb0e34b1",
+   "metadata": {},
+   "source": [
+    "## Solutions:\n",
+    "\n",
+    "**(a)** Think about the usual \"long multiplication algorithm, omitting carries.  To get the 1's place of the output, i.e. the \"digit\" $y_0$, we simply multiply the 1's -places of the inputs: $y_0 = a_0 x_0$.   To get the 10's place of the ouptut, i.e. the \"digit\" $y_1$, we multiply the 1's place of $a$ with the 10's place of $x$ and vice versa, and add them, i.e. $y_1 = a_0 x_1 + a_1 x_0$.  And so forth, giving a pattern like:\n",
+    "$$\n",
+    "y_0 = a_0 x_0 \\\\\n",
+    "y_1 = a_0 x_1 + a_1 x_0 \\\\\n",
+    "y_2 = a_0 x_2 + a_1 x_1 + a_2 x_0 \\\\\n",
+    "\\vdots \\\\\n",
+    "y_m = \\sum_{n=0}^m a_n x_{m-n}\n",
+    "$$\n",
+    "which is exactly in the form of a linear convolution!  To make the sums look more uniform, we can equivalently think of each number as having an *infinite* number of digits, but most of the digits are zero (e.g. $x_k = 0$ for $k < 0$ or $k \\ge N_x$, where the former correspond to \"zeros after the decimal point\" and the latter correspond to zeros before the most significant digit).  With this convention, we can simply  write:\n",
+    "$$\n",
+    "y_m = \\sum_{n=-\\infty}^\\infty a_n x_{m-n}\n",
+    "$$\n",
+    "\n",
+    "**(b)** We must pad with zeros to a length $N$ that is big enough that the circular \"wrap around\" of a cyclic convolution does not distort the result — the \"wrapped around\" digits must all be zeros.  Equivalently, since a convolution is a sequence of dot product where we \"slide\" $x$ past $a$ (or vice versa), the overlap should be zero before $x$ slides past the boundary.  The last possible overlap, corresponds to the most significant digit in the output $y$, which is $y_m$ for $m = N_a + N_x - 2$: the output (not including carries) has $N_a + N_x - 1$ digits.\n",
+    "In short, we simply need to pad to a length long enough to hold all the digits of the output, i.e. to any $\\boxed{N \\ge N_a + N_x - 1}$.\n",
+    "\n",
+    "Be careful in part (c), however!  Once we include carries, this can introduce *additional* digits, so the carried result may eventually be a longer length.\n",
+    "\n",
+    "**(c)** Our implementation and tests are as follows"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 22,
+   "id": "bdcd7ae6",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "conv (generic function with 2 methods)"
+      ]
+     },
+     "execution_count": 22,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# pad x with N-length(x) zeros of the same type\n",
+    "pad(x, N) = [x; zeros(eltype(x), N - length(x))]\n",
+    "    \n",
+    "# there are many ways to write this,\n",
+    "# since from problem 1 the convolution theorem takes many forms\n",
+    "conv(x,y) = ifft(fft(x) .* fft(y))\n",
+    "# or: conv(x,y) = bfft(fft(x) .* fft(y)) / length(x)\n",
+    "# or: conv(x,y) = fft(bfft(x) .* bfft(y)) / length(x)\n",
+    "\n",
+    "# for integer vectors, we expect an integer result, but\n",
+    "# FFTs will give us slightly complex/noninteger results due\n",
+    "# to roundoff errors.  Let's correct this as suggested:\n",
+    "conv(x::AbstractVector{<:Integer}, y::AbstractVector{<:Integer}) =\n",
+    "    round.(promote_type(eltype(x),eltype(y)), real.(conv(float.(x), float.(y))))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "792e6fbb",
+   "metadata": {},
+   "source": [
+    "Before we do any carries, let's check that our convolution and zero-padding are correct — evaluating the resulting \"digits\" as an integer using the `bignum` function from above should still work even if we have digits $> 10$."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 23,
+   "id": "8f411509",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "bignum(y) == bignum(a) * bignum(x) = true\n",
+      "bignum(y) == bignum(a) * bignum(x) = true\n",
+      "bignum(y) == bignum(a) * bignum(x) = true\n",
+      "bignum(y) == bignum(a) * bignum(x) = true\n",
+      "bignum(y) == bignum(a) * bignum(x) = true\n",
+      "bignum(y) == bignum(a) * bignum(x) = true\n"
+     ]
+    }
+   ],
+   "source": [
+    "Na = length(a)\n",
+    "Nx = length(x)\n",
+    "\n",
+    "# any length ≥ Na + Nx - 1 should work.  let's try a few:\n",
+    "for N in (Na + Nx - 1) .+ (0:5)\n",
+    "    y = conv(pad(a, N), pad(x, N))\n",
+    "    @show bignum(y) == bignum(a) * bignum(x)\n",
+    "end"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "07051556",
+   "metadata": {},
+   "source": [
+    "Now, let's implement the carry function.  The key step, for a digit $y_k = {}$ `y[k]`, is that we want to find the quotient $q$ and the remainder $r$ after dividing by 10, which can be accomplished in Julia by:\n",
+    "```jl\n",
+    "q, r = divrem(y[k], 10)\n",
+    "```\n",
+    "Then, the remainder is the *new k-th digit* ($y_k \\to r$) and the quotient *gets carried* (added to $y_{k+1}$).   Note, however, that `y[k]` in this expression *includes any previous carries*."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 29,
+   "id": "53f03b79",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "carry (generic function with 1 method)"
+      ]
+     },
+     "execution_count": 29,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "function carry(y)\n",
+    "    c = similar(y, 0) # result array of the same type as y, of zero length\n",
+    "    push!(c, 0) # let's start off with a result of zero.\n",
+    "    for k = 1:length(y)\n",
+    "        q, r = divrem(y[k]+c[k], 10)\n",
+    "        c[k] = r # remainder = new digit (+ any previous carry)\n",
+    "        push!(c, q) # push quotient as the next digit\n",
+    "    end\n",
+    "    return c\n",
+    "end"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0dabf98c",
+   "metadata": {},
+   "source": [
+    "Let's try it:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 32,
+   "id": "1a7587ea",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "y = [4, 5, 5, 7, 2, 9, 2, 5, 3, 3, 6, 9, 4, 5, 4, 7, 2, 0, 1, 6, 5, 4, 8, 5, 8, 1, 6, 1, 1, 0, 4, 1, 1, 6, 3, 0, 6, 9, 1, 3]\n",
+      "y == digits(bignum(a) * bignum(x)) = true\n",
+      "bignum(y) == bignum(a) * bignum(x) = true\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "true"
+      ]
+     },
+     "execution_count": 32,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "N = Na + Nx - 1\n",
+    "y = carry(conv(pad(a, N), pad(x, N)))\n",
+    "\n",
+    "@show y\n",
+    "@show y == digits(bignum(a) * bignum(x))\n",
+    "@show bignum(y) == bignum(a) * bignum(x)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5d6e8273",
+   "metadata": {},
+   "source": [
+    "Hooray, it's the correct digits $319603 \\ldots 927554$ (starting with the least signifcant!) and passes all of our tests!\n",
+    "\n",
+    "Note that the carried result is one digit longer than $N$ digits:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 33,
+   "id": "d4dd3e88",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "N = 39\n",
+      "length(y) = 40\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "40"
+      ]
+     },
+     "execution_count": 33,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "@show N\n",
+    "@show length(y)"
+   ]
+  }
+ ],
+ "metadata": {
+  "@webio": {
+   "lastCommId": null,
+   "lastKernelId": null
+  },
+  "kernelspec": {
+   "display_name": "Julia 1.8.3",
+   "language": "julia",
+   "name": "julia-1.8"
+  },
+  "language_info": {
+   "file_extension": ".jl",
+   "mimetype": "application/julia",
+   "name": "julia",
+   "version": "1.8.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
Author	SHA1	Message	Date
Steven G. Johnson	e4ff01607e	rm Adam for now	2024-10-31 22:08:16 -04:00
Steven G. Johnson	9ed7e009b2	notes on gradient descent for quadratic functions	2024-10-31 22:06:46 -04:00
Steven G. Johnson	92d1615dfc	tweak	2024-09-30 10:44:53 -04:00
Steven G. Johnson	1898b763a6	link	2024-09-30 09:42:46 -04:00
Steven G. Johnson	0a0e166616	link	2024-09-29 21:26:42 -04:00
Steven G. Johnson	d94c8aeb09	typo	2024-09-29 21:24:12 -04:00
Steven G. Johnson	547b7c5739	some LS fit notes	2024-09-29 21:19:47 -04:00
Steven G. Johnson	7800272033	add eigenproblem examples	2024-02-15 14:42:47 -05:00
Steven G. Johnson	21d4885e64	Update pset4sol.ipynb: fix code comment	2023-09-01 13:49:02 -04:00
Steven G. Johnson	245bc6befd	link 0.30000000000000004 example	2023-05-15 14:10:06 -04:00
Steven G. Johnson	8d14eb5120	lecture 38 notes	2023-05-15 09:44:14 -04:00
Steven G. Johnson	5d63fb8438	lecture summary	2023-05-10 16:23:51 -04:00
Steven G. Johnson	f5b44d43e7	lecture 36 notes	2023-05-08 12:49:12 -04:00
Steven G. Johnson	e1c3af4f37	pset 6 solutions	2023-05-08 11:45:16 -04:00
Steven G. Johnson	76957f16dc	lecture 35 notes	2023-05-05 14:43:28 -04:00
Steven G. Johnson	adc374586b	lecture 34 notes	2023-05-03 14:36:01 -04:00
Steven G. Johnson	33481fffd0	lecture 33 notes	2023-05-01 17:40:09 -04:00
Steven G. Johnson	58931ea10a	missing scale factor	2023-05-01 08:48:02 -04:00
Steven G. Johnson	b4dcc299cd	pset 6	2023-04-28 20:13:13 -04:00