From c920f093e8be5de3d9b59056d24bcfec1e261007 Mon Sep 17 00:00:00 2001
From: "Steven G. Johnson"
Date: Thu, 9 Feb 2023 22:08:44 -0500
Subject: [PATCH] pset 1

---
 README.md         |   1 +
 psets/pset1.ipynb | 237 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 238 insertions(+)
 create mode 100644 psets/pset1.ipynb

diff --git a/README.md b/README.md
index 9528ccf..e297025 100644
--- a/README.md
+++ b/README.md
@@ -43,6 +43,7 @@ useful study guide.
 * Matrix multiplication by blocks and columns-times-rows. Complexity: standard algorithm for (m×p)⋅(p×n) is [Θ(mnp): roughly proportional](https://en.wikipedia.org/wiki/Big_O_notation) to mnp for large m,n,p, regardless of how we rearrange it into blocks. (There also [exist theoretically better](https://en.wikipedia.org/wiki/Computational_complexity_of_matrix_multiplication), but highly impractical, algorithms.)
 * Briefly reviewed the "famous four" matrix factorizations: [LU](https://en.wikipedia.org/wiki/LU_decomposition), [diagonalization XΛX⁻¹ or QΛQᵀ](https://en.wikipedia.org/wiki/Eigendecomposition_of_a_matrix), [QR](https://en.wikipedia.org/wiki/QR_decomposition), and the [SVD UΣVᵀ](https://en.wikipedia.org/wiki/Singular_value_decomposition). QR and QΛQᵀ in the columns-times-rows picture, especially QΛQᵀ (diagonalization for real A=Aᵀ) as a sum of symmetric rank-1 projections.
 * The [four fundamental subspaces](https://web.mit.edu/18.06/www/Essays/newpaper_ver3.pdf) for an m×n matrix A of rank r, mapping "inputs" x∈ℝⁿ to "outputs" Ax∈ℝᵐ: the "input" subspaces C(Aᵀ) (row space, dimension r) and its [orthogonal complement](https://en.wikipedia.org/wiki/Orthogonal_complement) N(A) (nullspace, dimension n–r); and the "output" subspaces C(A) (column space, dimension r) and its orthogonal complement N(Aᵀ) (left nullspace, dimension m–r).
+* [pset 1](psets/pset1.ipynb), due Friday Feb 17
 
 **Further reading**: Textbook 1.3–1.6. [OCW lecture 2](https://ocw.mit.edu/courses/18-065-matrix-methods-in-data-analysis-signal-processing-and-machine-learning-spring-2018/resources/lecture-2-multiplying-and-factoring-matrices/). If you haven't seen [matrix multiplication by blocks](https://www.math.ucdavis.edu/~linear/old/notes9.pdf) before, [here is a nice video](https://www.youtube.com/watch?v=KCUgWj5nhYc).

diff --git a/psets/pset1.ipynb b/psets/pset1.ipynb
new file mode 100644
index 0000000..ffdcf81
--- /dev/null
+++ b/psets/pset1.ipynb
@@ -0,0 +1,237 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "08cd437b",
+   "metadata": {},
+   "source": [
+    "# 18.065 Pset 1\n",
+    "\n",
+    "Due Friday 2/17 at 1pm. Submit in PDF format: a decent-quality scan/image of any handwritten solutions (e.g. get a scanner app on your phone or use a tablet), and a PDF printout of your Jupyter notebook showing your code and (clearly labeled) results."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0d53b289",
+   "metadata": {},
+   "source": [
+    "## Problem 1 (4+4+4+4+4 points)\n",
+    "\n",
+    "Recall from class that multiplying an $m \times p$ by a $p \times n$ matrix costs $mnp$ scalar multiplications (and a similar number of additions) by the standard (practical) algorithms.\n",
+    "\n",
+    "Matrix multiplication is **not commutative** ($AB \ne BA$ in general), but it **is associative**: $(AB)C=A(BC)$. It turns out that where you put the parentheses (i.e. in *what order* you do the multiplications) can make a *huge* difference in computational cost.\n",
+    "\n",
+    "**(a)** If $x \in \mathbb{R}^n$ and $A,B$ are $n \times n$ matrices, compare the scalar multiplication counts of $(AB)x$ vs. $A(Bx)$, i.e. if we do the multiplications in the order indicated by the parentheses.\n",
+    "\n",
+    "**(b)** If $x, b \in \mathbb{R}^n$, **how many scalar multiplications** does the computation $$p = (I - (xx^T)/(x^T x)) b$$ take if we *do it in the order indicated by the parentheses*? (Note that dividing by a scalar $\alpha$ is equivalent to multiplying by $\alpha^{-1}$ at the negligible cost of one scalar division.)\n",
+    "\n",
+    "**(c)** Explain how to compute the *same* $p$ as in part (b) using as *few multiplications as possible*. Outline the sequence of computational steps, and give the count of multiplications.\n",
+    "\n",
+    "**(d)** $p^T x = $ what?\n",
+    "\n",
+    "**(e)** Implement your algorithm from (c) in Julia, filling in the code below, and time it for $n=1000$ using the `@btime` macro from the [BenchmarkTools package](https://github.com/JuliaCI/BenchmarkTools.jl), along with the algorithm from part (b), following the outline below. How does the ratio of the two times compare to your ratio of multiplication counts?"
+   ]
+  },
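+  {
+   "cell_type": "markdown",
+   "id": "3f0a1b2c",
+   "metadata": {},
+   "source": [
+    "*Aside (not graded):* if you haven't used BenchmarkTools before, the next cell is a minimal sketch of installing it and calling `@btime` on a simple function. The `$` prefix interpolates a global variable into the benchmark, so that you time the operation itself rather than global-variable access."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4d5e6f70",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Aside (not graded): a minimal sketch of @btime usage.\n",
+    "# If BenchmarkTools is not installed yet, uncomment the next line:\n",
+    "# import Pkg; Pkg.add(\"BenchmarkTools\")\n",
+    "using BenchmarkTools\n",
+    "\n",
+    "v = rand(1000)\n",
+    "@btime sum($v)  # `$v` interpolates the global `v`, so only `sum` is timed"
+   ]
+  },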
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b149871c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "using LinearAlgebra, BenchmarkTools\n",
+    "\n",
+    "# algorithm from part (b)\n",
+    "function part_b(x, b)\n",
+    "    return (I - (x*x')*(x'*x)^-1) * b\n",
+    "end\n",
+    "\n",
+    "# algorithm from part (c)\n",
+    "function part_c(x, b)\n",
+    "    # CHANGE THIS:\n",
+    "    return x + b\n",
+    "end\n",
+    "\n",
+    "# test and benchmark on random vectors:\n",
+    "n = 1000\n",
+    "x, b = rand(n), rand(n)\n",
+    "\n",
+    "# test it first: it should give the same answer up to roundoff error\n",
+    "if part_c(x, b) ≈ part_b(x, b)\n",
+    "    println(\"Hooray, part (c) and part (b) agree!\")\n",
+    "else\n",
+    "    error(\"You made a mistake: part (c) and part (b) do not agree!\")\n",
+    "end\n",
+    "\n",
+    "# benchmark it:\n",
+    "\n",
+    "println(\"\\npart (b): \")\n",
+    "@btime part_b($x, $b);\n",
+    "\n",
+    "println(\"\\npart (c): \")\n",
+    "@btime part_c($x, $b);"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bfa41bf1",
+   "metadata": {},
+   "source": [
+    "## Problem 2 (8+4 points)\n",
+    "\n",
+    "**(a)** Describe the four fundamental subspaces of the **rank-1 matrix** $A = uv^T$, where $u \in \mathbb{R}^m$ and $v \in \mathbb{R}^n$.\n",
+    "\n",
+    "**(b)** For any column vectors $u,v \in \mathbb{R}^3$, the matrix $uv^T$ is rank 1, except when \_\_\_\_\_\_\_\_, in which case $uv^T$ has rank \_\_\_\_."
+   ]
+  },
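+  {
+   "cell_type": "markdown",
+   "id": "9a8b7c6d",
+   "metadata": {},
+   "source": [
+    "*Aside (not graded):* a quick numerical sketch of the rank-1 fact above: build $uv^T$ as an outer product of random vectors and ask Julia for its rank with `rank` from the LinearAlgebra standard library."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5e4d3c2b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "using LinearAlgebra\n",
+    "\n",
+    "u, v = randn(5), randn(3)  # random u ∈ ℝ⁵ and v ∈ ℝ³\n",
+    "A = u * v'                 # the 5×3 outer product uvᵀ\n",
+    "rank(A)                    # almost certainly 1 for random u and v"
+   ]
+  },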
+  {
+   "cell_type": "markdown",
+   "id": "82b2c169",
+   "metadata": {},
+   "source": [
+    "## Problem 3 (5+4+4+4 points)\n",
+    "\n",
+    "**(a)** Pick the choices that make this statement correct for arbitrary matrices $A$ and $B$: $C(AB)$ (*contains / is contained in*) the column space of (*A / B*). Briefly justify your answer.\n",
+    "\n",
+    "**(b)** Suppose that $A$ is a $1000\times 1000$ matrix of rank $< 10$. Suppose we multiply it by 10 random vectors $x_1, x_2, \ldots, x_{10}$, e.g. generated by `randn(1000)`. How could we use the results to get a $10\times 10$ matrix $C$ whose rank (almost certainly) matches $A$'s?\n",
+    "\n",
+    "**(c)** Suppose we instead form the $1000\times 10$ matrix $X$ whose columns are $x_1, x_2, \ldots, x_{10}$. Give a formula for the *same* matrix $C$ in terms of matrix products involving $A$ and $X$.\n",
+    "\n",
+    "**(d)** Fill in the code for $C$ below, and compare the biggest 10 singular values of $A$ (chosen to have rank 4 in this case) to the corresponding 10 singular values of $C$. Does it match what you expect?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e7b32901",
+   "metadata": {
+    "scrolled": false
+   },
+   "outputs": [],
+   "source": [
+    "using LinearAlgebra\n",
+    "\n",
+    "# random 1000×1000 matrix of rank 4: an outer product of random 1000×4 and 4×1000 factors\n",
+    "A = randn(1000, 4) * randn(4, 1000)\n",
+    "@show svdvals(A)[1:10]\n",
+    "\n",
+    "X = randn(1000, 10)\n",
+    "C = ??? # fill in your formula from part (c)\n",
+    "@show svdvals(C)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ee67e16b",
+   "metadata": {},
+   "source": [
+    "## Problem 4 (4+5+5 points)\n",
+    "\n",
+    "The famous Hadamard matrices are filled with ±1 and have orthogonal columns (orthonormal if we scale $H_n$ by $1/\sqrt{n}$). The first few are:\n",
+    "\begin{align}\n",
+    "    H_1 &= \begin{pmatrix} 1 \end{pmatrix}, \\\n",
+    "    H_2 &= \begin{pmatrix}\n",
+    "    1 & 1 \\\n",
+    "    1 & -1\n",
+    "    \end{pmatrix}, \\\n",
+    "    H_4 &= \begin{pmatrix} H_2 & H_2 \\ H_2 & -H_2 \end{pmatrix} = \begin{pmatrix}\n",
+    "    1 & 1 & 1 & 1 \\\n",
+    "    1 & -1 & 1 & -1 \\\n",
+    "    1 & 1 & -1 & -1 \\\n",
+    "    1 & -1 & -1 & 1\n",
+    "    \end{pmatrix} \, .\n",
+    "\end{align}\n",
+    "Notice that (for power-of-2 sizes) they are built up \"recursively\" out of smaller Hadamard matrices. Multiplying a vector by a Hadamard matrix requires no multiplications at all, only additions/subtractions. (A small construction sketch in Julia follows this problem, if you want to experiment.)\n",
+    "\n",
+    "**(a)** If you multiply $H_4 x$ for some $x \in \mathbb{R}^4$ by the normal \"rows-times-columns\" method (*without* exploiting any special patterns), exactly how many scalar additions/subtractions are required?\n",
+    "\n",
+    "**(b)** Let's break $x$ into two blocks: $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ for $x_1, x_2 \in \mathbb{R}^2$. Write out $H_4 x$ in terms of a sequence of $2\times 2$ block multiplications with $\pm H_2$. You'll notice that some of these $2\times 2$ multiplications are repeated. If we re-use these repeated multiplications rather than doing them twice, we can save a bunch of arithmetic. What is the new count of scalar additions/subtractions if you do this?\n",
+    "\n",
+    "**(c)** Similarly, the $8\times 8$ Hadamard matrix $H_8 = \begin{pmatrix} H_4 & H_4 \\ H_4 & -H_4 \end{pmatrix}$ is made out of $H_4$ matrices. To multiply it by a vector $y \in \mathbb{R}^8$, the naive rows-times-columns method would require \_\_\_\_ scalar additions/subtractions, whereas if you broke them up first into blocks of 4, used your solution from (b), and then re-used any repeated $H_4$ products, it would only require \_\_\_\_ scalar additions/subtractions."
+   ]
+  },
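+  {
+   "cell_type": "markdown",
+   "id": "aa11bb22",
+   "metadata": {},
+   "source": [
+    "*Aside (not graded):* a minimal sketch of the recursive construction above. Since $H_{2n} = \begin{pmatrix} H_n & H_n \\ H_n & -H_n \end{pmatrix}$ is the [Kronecker product](https://en.wikipedia.org/wiki/Kronecker_product) $H_2 \otimes H_n$, you can build these matrices in Julia with the `kron` function:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cc33dd44",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "using LinearAlgebra\n",
+    "\n",
+    "H2 = [1 1; 1 -1]\n",
+    "H4 = kron(H2, H2)  # = [H2 H2; H2 -H2]\n",
+    "H8 = kron(H2, H4)  # = [H4 H4; H4 -H4]\n",
+    "\n",
+    "H8' * H8           # = 8I, confirming that the columns of H8 are orthogonal"
+   ]
+  },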
+  {
+   "cell_type": "markdown",
+   "id": "c2a2b8eb",
+   "metadata": {},
+   "source": [
+    "## Problem 5 (5+5 points)\n",
+    "\n",
+    "The famous \"discrete Fourier transform\" matrix $F$ has columns that are actually eigenvectors of the (unitary) cyclic-shift permutation matrix:\n",
+    "$$\n",
+    "P = \begin{pmatrix} & 1 & & \\ & & 1 & \\ & & & 1 \\ 1 & & & \end{pmatrix}\n",
+    "$$\n",
+    "for the $4\times 4$ case, and similarly for larger matrices.\n",
+    "\n",
+    "**(a)** One way of saying why Fourier transforms are practically important is that the Fourier matrix $F$ *diagonalizes* (its columns are eigenvectors of) any matrix that *commutes* with $P$. If $A$ is a $4\times 4$ matrix whose first row is $(a\, b\, c\, d)$,\n",
+    "$$\n",
+    "A = \begin{pmatrix} a & b & c & d \\ ? & ? & ? & ? \\ ? & ? & ? & ? \\ ? & ? & ? & ? \end{pmatrix} \, ,\n",
+    "$$\n",
+    "and $A$ commutes with $P$ (i.e. $AP=PA$), what must be true of the other (\"?\") entries of $A$?\n",
+    "\n",
+    "**(b)** Fill in the matrix `A` in Julia below, then fill in and run the code to check that it commutes with $P$ and is diagonalized by $F$:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e7222137",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "a, b, c, d = 1, 7, 3, 2 # 4 arbitrarily chosen values\n",
+    "\n",
+    "P = [0 1 0 0\n",
+    "     0 0 1 0\n",
+    "     0 0 0 1\n",
+    "     1 0 0 0]\n",
+    "\n",
+    "F = im .^ ((0:3) .* (0:3)') # the 4×4 Fourier matrix: F[j+1,k+1] = i^(jk)\n",
+    "\n",
+    "# fill in:\n",
+    "A = [a b c d\n",
+    "     ? ? ? ?\n",
+    "     ? ? ? ?\n",
+    "     ? ? ? ?]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "25dc7673",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# check:\n",
+    "P * A == A * P"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0ae85697",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# check that F diagonalizes A. (How?)\n",
+    "\n",
+    "????"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Julia 1.8.3",
+   "language": "julia",
+   "name": "julia-1.8"
+  },
+  "language_info": {
+   "file_extension": ".jl",
+   "mimetype": "application/julia",
+   "name": "julia",
+   "version": "1.8.5"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}