pset 1

2023-02-09 22:08:44 -05:00 · 2023-02-09 22:08:44 -05:00 · c920f093e8
commit c920f093e8
parent 3c31b09cf0
2 changed files with 238 additions and 0 deletions
--- a/README.md
+++ b/README.md
@ -43,6 +43,7 @@ useful study guide.
 * Matrix multiplication by blocks and columns-times-rows.  Complexity: standard algorithm for (m×p)⋅(p×n) is [Θ(mnp): roughly proportional](https://en.wikipedia.org/wiki/Big_O_notation) to mnp for large m,n,p, regardless of how we rearrange it into blocks. (There also [exist theoretically better](https://en.wikipedia.org/wiki/Computational_complexity_of_matrix_multiplication), but highly impractical, algorithms.)
 * Briefly reviewed the "famous four" matrix factorizations: [LU](https://en.wikipedia.org/wiki/LU_decomposition), [diagonalization XΛX⁻¹ or QΛQᵀ](https://en.wikipedia.org/wiki/Eigendecomposition_of_a_matrix), [QR](https://en.wikipedia.org/wiki/QR_decomposition), and the [SVD UΣVᵀ](https://en.wikipedia.org/wiki/Singular_value_decomposition).  QR and QΛQᵀ in the columns-times-rows picture, especially QΛQᵀ (diagonalization for real A=Aᵀ) as a sum of symmetric rank-1 projections.
 * The [four fundamental subspaces](https://web.mit.edu/18.06/www/Essays/newpaper_ver3.pdf) for an m×n matrix A of rank r, mapping "inputs" x∈ℝⁿ to "outputs" Ax∈ℝᵐ: the "input" subspaces C(Aᵀ) (row space, dimension r) and its [orthogonal complement](https://en.wikipedia.org/wiki/Orthogonal_complement) N(A) (nullspace, dimension n–r); and the "output" subspaces C(A) (column space, dimension r) and its orthogonal complement N(Aᵀ) (left nullspace, dimension m–r).
+* [pset 1](psets/pset1.ipynb), due Friday Feb 17

 **Further reading**: Textbook 1.3–1.6. [OCW lecture 2](https://ocw.mit.edu/courses/18-065-matrix-methods-in-data-analysis-signal-processing-and-machine-learning-spring-2018/resources/lecture-2-multiplying-and-factoring-matrices/).  If you haven't seen [matrix multiplication by blocks](https://www.math.ucdavis.edu/~linear/old/notes9.pdf) before, [here is a nice video](https://www.youtube.com/watch?v=KCUgWj5nhYc).

--- a/psets/pset1.ipynb
+++ b/psets/pset1.ipynb
@ -0,0 +1,237 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "08cd437b",
+   "metadata": {},
+   "source": [
+    "# 18.065 Pset 1\n",
+    "\n",
+    "Due Friday 2/17 at 1pm.   Submit in PDF format: a decent-quality scan/image of any handwritten solutions (e.g. get a scanner app on your phone or use a tablet), and a PDF printout of your Jupyter notebook showing your code and (clearly labeled) results."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0d53b289",
+   "metadata": {},
+   "source": [
+    "## Problem 1 (4+4+4+4+4 points)\n",
+    "\n",
+    "Recall from class that multiplying an $m \\times p$ by a $p \\times n$ matrix costs $mnp$ scalar multiplications (and a similar number of additions) by the standard (practical) algorithms.\n",
+    "\n",
+    "Matrix multiplication is **not commutative** ($AB \\ne BA$ in general), but it **is associative**: $(AB)C=A(BC)$.   It turns out that where you put the parentheses (i.e. in *what order* you do the multiplications) can make a *huge* difference in computational cost.\n",
+    "\n",
+    "**(a)** If $x \\in \\mathbb{R}^n$ and $A,B$ are $n \\times n$ matrices, compare the scalar multiplication counts of $(AB)x$ vs. $A(Bx)$, i.e. if we do the multiplications in the order indicated by the parentheses.\n",
+    "\n",
+    "**(b)** If $x, b\\in \\mathbb{R}^n$, **how many scalar multiplications** does the computation $$p = (I - (xx^T)/(x^T x)) b$$ take if we *do it in the order indicated by the parentheses*?  (Note that dividing by a scalar $\\alpha$ is equivalent to multiplying by $\\alpha^{-1}$ at the negligible cost of one scalar division.)\n",
+    "\n",
+    "**(c)** Explain how to compute the *same* $p$ as in part (b) using as *few multiplications as possible*.  Outline the sequence of computational steps, and give the count of multiplications.\n",
+    "\n",
+    "**(d)** $p^T x = $ what?\n",
+    "\n",
+    "**(e)** Implement your algorithm from (c) in Julia, filling in the code below, and time it for $n=1000$ using the `@btime` macro from the [BenchmarkTools package](https://github.com/JuliaCI/BenchmarkTools.jl), along with the algorithm from part (b), following the outline below.  How does the ratio of the two times compare to your ratio of multiplication counts?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b149871c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "using LinearAlgebra, BenchmarkTools\n",
+    "\n",
+    "# algorithm from part (b)\n",
+    "function part_b(x, b)\n",
+    "    return (I - (x*x')*(x'*x)^-1) * b\n",
+    "end\n",
+    "\n",
+    "# algorithm from part (c)\n",
+    "function part_c(x, b)\n",
+    "    # CHANGE THIS:\n",
+    "    return x + b\n",
+    "end\n",
+    "\n",
+    "# test and benchmark on random vectors:\n",
+    "n = 1000\n",
+    "x, b = rand(n), rand(n)\n",
+    "\n",
+    "# test it first — should give same answer up to roundoff error\n",
+    "if part_c(x, b) ≈ part_b(x, b)\n",
+    "    println(\"Hooray, part (c) and part (b) agree!\")\n",
+    "else\n",
+    "    error(\"You made a mistake: part (c) and part (b) do not agree!\")\n",
+    "end\n",
+    "\n",
+    "# benchmark it:\n",
+    "\n",
+    "println(\"\\npart (b): \")\n",
+    "@btime part_b($x, $b);\n",
+    "\n",
+    "println(\"\\npart (c): \")\n",
+    "@btime part_c($x, $b);"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bfa41bf1",
+   "metadata": {},
+   "source": [
+    "## Problem 2 (8+4 points)\n",
+    "\n",
+    "**(a)** Describe the four fundamental subspaces of the **rank-1 matrix** $A = uv^T$ where $u \\in \\mathbb{R}^m$ and $n \\in \\mathbb{R}^n$.\n",
+    "\n",
+    "**(b)** For any column vectors $u,v \\in \\mathbb{R}^3$, the matrix $uv^T$ is rank 1, except when \\_\\_\\_\\_\\_\\_\\_\\_, in which case $uv^T$ has rank \\_\\_\\_\\_."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "82b2c169",
+   "metadata": {},
+   "source": [
+    "## Problem 3 (5+4+4+4 points)\n",
+    "\n",
+    "**(a)** Pick the choices that makes this statement correct for arbitrary matrices A and B:  $C(AB)$ (*contains / is contained in*) the column space of (*A / B*).  Briefly justify your answer.\n",
+    "\n",
+    "**(b)** Suppose that $A$ is a $1000\\times 1000$ matrix of rank $< 10$.  Suppose we multiply it by 10 random vectors $x_1, x_2, \\ldots, x_{10}$, e.g. generated by `randn(1000)`.  How could we use the results to get a $10\\times 10$ matrix $C$ whose rank (almost certainly) matches $A$'s?\n",
+    "\n",
+    "**(c)** Suppose we instead make $1000\\times 10$ matrix $X$ whose columns are $x_1, x_2, \\ldots, x_{10}$.  Give a formula for the *same* matrix $C$ in terms of matrix products involving $A$ and $X$.\n",
+    "\n",
+    "**(d)** Fill in the code for $C$ below, and compare the biggest 10 singular values of $A$ (chosen to be rank ≈ 4 in this case) to the corresponding 10 singular values of $C$.  Does it match what you expect?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e7b32901",
+   "metadata": {
+    "scrolled": false
+   },
+   "outputs": [],
+   "source": [
+    "using LinearAlgebra\n",
+    "\n",
+    "# random 1000x1000 matrix of rank 4\n",
+    "A = randn(1000, 4) * randn(4, 1000)\n",
+    "@show svdvals(A)[1:10]\n",
+    "\n",
+    "X = randn(1000, 10)\n",
+    "C = ???\n",
+    "@show svdvals(C)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ee67e16b",
+   "metadata": {},
+   "source": [
+    "## Problem 4 (4+5+5 points)\n",
+    "\n",
+    "The famous Hadamard matrices are filled with ±1 and have orthogonal columns (orthonormal if we divide $H_n$ by $1/\\sqrt{n}$).  The first few are:\n",
+    "\\begin{align}\n",
+    " H_1 &= \\begin{pmatrix} 1 \\end{pmatrix}, \\\\\n",
+    "  H_2 &= \\begin{pmatrix}\n",
+    "    1 &  1 \\\\\n",
+    "    1 & -1\n",
+    "  \\end{pmatrix}, \\\\\n",
+    "  H_4 &= \\begin{pmatrix} H_2 & H_2 \\\\ H_2 & -H_2 \\end{pmatrix} = \\begin{pmatrix}\n",
+    "    1 &  1 &  1 &  1\\\\\n",
+    "    1 & -1 &  1 & -1\\\\\n",
+    "    1 &  1 & -1 & -1\\\\\n",
+    "    1 & -1 & -1 &  1\n",
+    "  \\end{pmatrix} \\, .\n",
+    "\\end{align}\n",
+    "Notice that (for power-of-2 sizes), they are built up \"recursively\" out of smaller Hadamard matrices.  Multiplying a vector by a Hadamard matrix requires no multiplications at all, only additions/subtractions.\n",
+    "\n",
+    "**(a)** If you multiply $H_4 x$ for some $x \\in \\mathbb{R}^4$ by the normal \"rows-times-columns\" method (*without* exploiting any special patterns), exactly how many scalar additions/subtractions are required?\n",
+    "\n",
+    "**(b)** Let's break $x$ into two blocks: $x = \\begin{pmatrix} x_1 \\\\ x_2 \\end{pmatrix}$ for $x_1, x_2 \\in \\mathbb{R}^2$.  Write out $H_4 x$ in terms of a sequence of $2\\times 2$ block multiplications with $\\pm H_2$.  You'll notice that some of these $2\\times 2$ multiplications are repeated.  If we re-use these repeated multiplications rather than doing them twice, we can save a bunch of arithmetic — what is the new count of scalar additions/subtractions if you do this?\n",
+    "\n",
+    "**(c)** Similarly, the $8\\times 8$ Hadamard matrix $H_8 = \\begin{pmatrix} H_4 & H_4 \\\\ H_4 & -H_4 \\end{pmatrix}$ is made out of $H_4$ matrices.   To multiply it by a vector $y \\in \\mathbb{R}^8$, the naive rows-times columns method would require \\_\\_\\_\\_ scalar additions/subtractions, whereas if you broke them up first into blocks of 4, used your solution from (b), and then re-used any repeated $H_4$ products, it would only require \\_\\_\\_\\_ scalar additions/subtractions."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c2a2b8eb",
+   "metadata": {},
+   "source": [
+    "## Problem 5 (5+5 points)\n",
+    "\n",
+    "The famous \"discrete Fourier transform\" matrix $F$ has columns that are actually eigenvectors of the (unitary) permutation matrix:\n",
+    "$$\n",
+    "P = \\begin{pmatrix}  & 1 & & \\\\ & &1 &  \\\\ & &  &1 \\\\ 1 &  & & \\end{pmatrix}\n",
+    "$$\n",
+    "for the $4\\times 4$ case, and similarly for larger matrices.\n",
+    "\n",
+    "**(a)** One way of saying way Fourier transforms are practically important is that they *diagonalize* (are eigenvectors of) matrices that *commute* with $P$.  If $A$ is a $4\\times 4$ matrix whose first row is $(a\\, b\\, c\\, d)$\n",
+    "$$\n",
+    "A = \\begin{pmatrix} a & b & c & d \\\\ ? & ? &? & ? \\\\ ? &? & ? &? \\\\ ? & ? & ? &? \\end{pmatrix}\n",
+    "$$\n",
+    "that commutes with $P$ (i.e. $AP=PA$), what must be true of the other (\"?\") entries of $A$?\n",
+    "\n",
+    "**(b)** Fill in the matrix `A` in Julia below and fill in and run the code to check that it commutes with $P$ and is diagonalized by $F$:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e7222137",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "a, b, c, d = 1, 7, 3, 2    # 4 arbitrarily chosen values\n",
+    "\n",
+    "P = [0 1 0 0\n",
+    "     0 0 1 0\n",
+    "     0 0 0 1\n",
+    "     1 0 0 0]\n",
+    "\n",
+    "F = im .^ ((0:3) .* (0:3)') # the 4×4 Fourier matrix\n",
+    "\n",
+    "# fill in:\n",
+    "A = [a b c d\n",
+    "     ? ? ? ?\n",
+    "     ? ? ? ?\n",
+    "     ? ? ? ?]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "25dc7673",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# check:\n",
+    "P * A == A * P"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0ae85697",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# check that F diagonalizes A.  (How?)\n",
+    "\n",
+    "????"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Julia 1.8.3",
+   "language": "julia",
+   "name": "julia-1.8"
+  },
+  "language_info": {
+   "file_extension": ".jl",
+   "mimetype": "application/julia",
+   "name": "julia",
+   "version": "1.8.5"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}