Steven G. Johnson
2023-03-06 17:00:54 -05:00
parent 6f45471a73
commit 0ac08c1459
2 changed files with 170 additions and 1 deletion


@@ -174,7 +174,7 @@ The *nice* case of diagonalization is when you have **orthonormal eigenvectors**
- Randomized linear algebra: by multiplying A on the left/right by small random wide/thin matrices, carefully chosen, we can construct an approximate "sketch" of A that can be used to estimate the SVD, solutions to least-squares, etcetera, and can also accelerate iterative solvers.
- Tricks for special cases: there are various specialized techniques for convolution/circulant matrices (via FFTs), [banded matrices](https://en.wikipedia.org/wiki/Band_matrix) (linear-time methods), and low-rank updates ([Sherman–Morrison formula](https://en.wikipedia.org/wiki/Sherman%E2%80%93Morrison_formula)); a short Julia sketch of a Sherman–Morrison solve appears below.
* [pset 2 solutions](psets/pset2sol.ipynb)
- * pset 3 (due 3/17): coming soon
+ * [pset 3](psets/pset3.ipynb) (due 3/17)
**Further reading:** For Gram–Schmidt and QR, see further reading for lecture 9. See also textbook section II.1 and [OCW video lecture 10](https://ocw.mit.edu/courses/18-065-matrix-methods-in-data-analysis-signal-processing-and-machine-learning-spring-2018/resources/lecture-10-survey-of-difficulties-with-ax-b/). Sparse-direct solvers are described in detail by the book *Direct Methods for Sparse Linear Systems* by Davis. Iterative methods: more advanced treatments include the book *Numerical Linear Algebra* by Trefethen and Bau, and surveys of algorithms can be found in the *Templates* books for [Ax=b](http://www.netlib.org/linalg/html_templates/Templates.html) and [Ax=λx](http://web.cs.ucdavis.edu/~bai/ET/contents.html). [Some crude rules of thumb](https://github.com/mitmath/18335/blob/spring20/notes/solver-options.pdf) for solving linear systems (from 18.335 spring 2020).
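
A minimal Julia sketch of the Sherman–Morrison idea (illustrative variable names, not code from the lecture): after factorizing A once, a rank-1 update A + uvᵀ can be solved with only O(m²) extra work, assuming 1 + vᵀA⁻¹u ≠ 0.

```julia
using LinearAlgebra

M = randn(100, 100)           # the original matrix A
F = lu(M)                     # factorize once: O(m³)
u, v, b = randn(100), randn(100), randn(100)
y = F \ b                     # each solve with the existing F costs only O(m²)
z = F \ u
x = y - z * (v' * y) / (1 + v' * z)  # Sherman–Morrison: x solves (M + u*v')x = b
@assert (M + u * v') * x ≈ b         # sanity check
```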

psets/pset3.ipynb Normal file

@@ -0,0 +1,169 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "21a9eae2",
"metadata": {},
"source": [
"# 18.065 Problem Set 3\n",
"\n",
"Due Friday, March 17 at 1pm."
]
},
{
"cell_type": "markdown",
"id": "a19ccce3",
"metadata": {},
"source": [
"## Problem 1 (4+4+4+4 points)\n",
"\n",
"Suppose $x = A^{-1} b$ for an invertible $m \\times m$ square matrix $A$. If we perturb the right hand-side $b$ to $b + \\Delta b$, the new solution is $x+\\Delta x$ where $\\Delta x = A^{-1} \\Delta b$.\n",
"\n",
"In class, I claimed that\n",
"$$\n",
"\\frac{\\Vert \\Delta x \\Vert_2}{\\Vert x \\Vert_2} \\le \\kappa(A) \\frac{\\Vert \\Delta b \\Vert_2}{\\Vert b \\Vert_2}\n",
"$$\n",
"where $\\kappa{A} = \\sigma_1 / \\sigma_m$ is the condition number of $A$. In this problem, you will prove that fact.\n",
"\n",
"**(a)** Show that $\\Vert \\Delta x \\Vert_2 \\le \\Vert A^{-1} \\Vert_2 \\, \\Vert \\Delta b \\Vert_2$. (Hint: this should follow easily from the definition of the induced norm, which you studied in pset 2.) Using the same proof, also show that $\\Vert b \\Vert_2 \\le \\Vert A \\Vert_2 \\, \\Vert x \\Vert_2$\n",
"\n",
"**(b)** Why does (a) tell you that $\\frac{\\Vert \\Delta x \\Vert_2}{\\Vert x \\Vert_2} \\le \\Vert A \\Vert_2 \\, \\Vert A^{-1} \\Vert_2 \\frac{\\Vert \\Delta b \\Vert_2}{\\Vert b \\Vert_2}$?\n",
"\n",
"**(c)** Why does $\\Vert A \\Vert_2 \\, \\Vert A^{-1} \\Vert_2 = \\sigma_1 / \\sigma_m = \\kappa(A)$, the ratio of the largest to the smallest singular values of $A$? Why does $\\kappa(A) = \\kappa(A^{-1})$?\n",
"\n",
"**(d)** The following code generates a random 1000x1000 matrix with a condition number of $10^6$. How close does $\\frac{\\Vert \\Delta x \\Vert_2 \\, \\Vert b \\Vert_2}{\\Vert x \\Vert_2\\, \\Vert \\Delta b \\Vert_2}$ get to the upper bound above if you try a random $\\Delta b$ vector for a random $b$? Change the `????` lines in the code below to choose a *particular* $b$ and $\\Delta b$ such that $\\frac{\\Vert \\Delta x \\Vert_2 \\, \\Vert b \\Vert_2}{\\Vert x \\Vert_2\\, \\Vert \\Delta b \\Vert_2}$ reaches the upper bound."
]
},
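{
"cell_type": "markdown",
"id": "aa10f3b1",
"metadata": {},
"source": [
"*Aside: the next cell is an optional numerical sanity check (a check, not a proof) of the claimed bound on a small random system; the variable names are illustrative, not part of the assignment.*"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bb21e4c2",
"metadata": {},
"outputs": [],
"source": [
"# optional sanity check (not a proof): ‖Δx‖₂/‖x‖₂ ≤ κ(A) ‖Δb‖₂/‖b‖₂\n",
"using LinearAlgebra\n",
"Asmall = randn(5, 5) # small random matrix (invertible with probability 1)\n",
"bsmall, Δbsmall = randn(5), 1e-8 * randn(5)\n",
"xsmall = Asmall \\\\ bsmall\n",
"Δxsmall = Asmall \\\\ Δbsmall\n",
"norm(Δxsmall) / norm(xsmall) ≤ cond(Asmall) * norm(Δbsmall) / norm(bsmall) # should print true"
]
},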
{
"cell_type": "code",
"execution_count": null,
"id": "f0c203c3",
"metadata": {},
"outputs": [],
"source": [
"using LinearAlgebra\n",
"\n",
"m = 1000\n",
"U = qr(randn(1000,1000)).Q # random unitary U\n",
"V = qr(randn(1000,1000)).Q # random unitary V\n",
"σ = exp10.(range(0, -6, length=m)) # log-spaced singular vals from 1 to 10⁻⁶\n",
"A = U * Diagonal(σ) * V' # construct A from its SVD\n",
"@show cond(A) # condition number ≈ 10⁶\n",
"\n",
"b = randn(m) # a random b\n",
"x = A \\ b\n",
"\n",
"Δb = randn(m) * 1e-3 # random perturbation\n",
"Δx = A \\ Δb # change in solution\n",
"\n",
"(norm(Δx) * norm(b)) / (norm(x) * norm(Δb)) # ‖Δx‖₂‖b‖₂ / ‖x‖₂‖Δb‖₂"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e3631ff5",
"metadata": {},
"outputs": [],
"source": [
"# change this code so that ‖Δx‖₂‖b‖₂ / ‖x‖₂‖Δb‖₂ attains its upper bound.\n",
"b = ????\n",
"x = A \\ b\n",
"Δb = ????\n",
"Δx = A \\ Δb\n",
"(norm(Δx) * norm(b)) / (norm(x) * norm(Δb)) # ‖Δx‖₂‖b‖₂ / ‖x‖₂‖Δb‖₂"
]
},
{
"cell_type": "markdown",
"id": "b094adc3",
"metadata": {},
"source": [
"## Problem 2 (10 points)\n",
"\n",
"In class, I said that solving least-square problems via $\\hat{R}\\hat{x} = \\hat{Q}^* b$ using the thin QR factorization ($A = \\hat{Q} \\hat{R}$ where both $A$ and $\\hat{Q}$ are $m \\times n$ with rank $n$) is good because it avoids squaring the condition number.\n",
"\n",
"In particular, I said $\\kappa(A^T A) = \\kappa(A)^2$ but $\\kappa(\\hat{R}) = \\kappa(A)$. Derive this. "
]
},
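{
"cell_type": "markdown",
"id": "cc32d5a3",
"metadata": {},
"source": [
"*Aside: an optional numerical illustration (not a derivation) of both identities; the variable names are illustrative.*"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dd43c6b4",
"metadata": {},
"outputs": [],
"source": [
"# check κ(AᵀA) = κ(A)² and κ(R̂) = κ(A) numerically, up to roundoff\n",
"using LinearAlgebra\n",
"Atall = randn(50, 10) # a random matrix with full column rank\n",
"Rhat = qr(Atall).R # the 10×10 R̂ factor from the thin QR factorization\n",
"@show cond(Atall' * Atall) ≈ cond(Atall)^2\n",
"@show cond(Rhat) ≈ cond(Atall)"
]
},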
{
"cell_type": "markdown",
"id": "bffaf19f",
"metadata": {},
"source": [
"## Problem 3 (10 points)\n",
"\n",
"Another form of regularized least-squares problem is to assume you know something about what the solution $x$ should look like, i.e. you have a \"prior\" $x_p$ guess of the solution. Suppose you then want to solve\n",
"$$\n",
"\\min_x \\left(\\Vert b - Ax \\Vert_2 + \\delta^2 \\Vert x - x_p \\Vert_2 \\right)\n",
"$$\n",
"where $\\delta \\ne 0$ is a regularization strength (larger δ means that the solution will be closer to $x_p$).\n",
"\n",
"Show how this is equivalent to solving an equation of the form $(A^T A + ?)\\hat{x} = ?$. (Very similar to how we handled the Tikhonov regularization in class, which corresponds to the special case $x_p = 0$!)"
]
},
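{
"cell_type": "markdown",
"id": "ee54b7c5",
"metadata": {},
"source": [
"*Aside: as a reference point, the next cell numerically checks the in-class special case $x_p = 0$ (ordinary Tikhonov regularization), which this problem asks you to generalize; the variable names are illustrative.*"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ff65a8d6",
"metadata": {},
"outputs": [],
"source": [
"# the x_p = 0 (Tikhonov) case from class: normal equations vs. stacked least-squares\n",
"using LinearAlgebra\n",
"Areg = randn(30, 10); breg = randn(30); δ = 0.1\n",
"x_normal = (Areg' * Areg + δ^2 * I) \\\\ (Areg' * breg) # (AᵀA + δ²I)x̂ = Aᵀb\n",
"x_stacked = [Areg; δ * I(10)] \\\\ [breg; zeros(10)] # same minimizer via stacking\n",
"@show x_normal ≈ x_stacked"
]
},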
{
"cell_type": "markdown",
"id": "5d59e8a0",
"metadata": {},
"source": [
"## Problem 4 (10 points)\n",
"\n",
"In pset 1, problem 3, you took a $1000 \\times 1000$ matrix $A$ of rank 4 and multiplied it by a random $1000 \\times 10$ matrix $X$, and we argued that the rank of $A$ almost certainly equalled the rank of $AX$ or $(AX)^T (AX)$ or $X^T AX$. However, the *values* of the singular values were not the same.\n",
"\n",
"In this, problem, you'll see what happens if you orthonormalize the outputs $AX$. We can orthonormalize $AX$ using `Q = Matrix(qr(A*X).Q)` in Julia, which forms the thin QR factorization.\n",
"\n",
"For the same rank-4 `A` and random `X` as in pset 1, problem 3 (or at least, generated randomly by the same code, reproduced below), compute `Q` and use it to obtain a *much smaller matrix* (e.g. 10x10, 1000x10, or 10x1000) whose 4 biggest singular values *nearly match* those of `A`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "74797236",
"metadata": {},
"outputs": [],
"source": [
"using LinearAlgebra\n",
"A = randn(1000, 4) * randn(4, 1000) # random 1000x1000 matrix of rank 4\n",
"@show svdvals(A)[1:6]\n",
"\n",
"X = randn(1000, 10)\n",
"Q = Matrix(qr(A*X).Q)\n",
"\n",
"S = ???? # much smaller matrix using Q and/or A and/or X\n",
"@show svdvals(S)[1:6]"
]
},
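{
"cell_type": "markdown",
"id": "90768c1a",
"metadata": {},
"source": [
"*Aside: an optional, self-contained check (with its own illustrative `Atest`, `Xtest`) of what `Matrix(qr(A*X).Q)` returns: the thin Q factor, a 1000×10 matrix with orthonormal columns.*"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a1879d2b",
"metadata": {},
"outputs": [],
"source": [
"# check the shape and orthonormality of the thin-QR Q factor\n",
"using LinearAlgebra\n",
"Atest = randn(1000, 4) * randn(4, 1000) # rank-4 matrix, as in the problem\n",
"Xtest = randn(1000, 10)\n",
"Qtest = Matrix(qr(Atest * Xtest).Q) # thin Q: 1000×10\n",
"@show size(Qtest)\n",
"@show Qtest' * Qtest ≈ Matrix(I, 10, 10)"
]
},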
{
"cell_type": "markdown",
"id": "88ab5b5e",
"metadata": {},
"source": [
"## Problems 5, 6, etc: coming soon"
]
},
{
"cell_type": "markdown",
"id": "85e81fbe",
"metadata": {},
"source": [
"More problems will be posted by Saturday, March 11."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Julia 1.8.3",
"language": "julia",
"name": "julia-1.8"
},
"language_info": {
"file_extension": ".jl",
"mimetype": "application/julia",
"name": "julia",
"version": "1.8.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}