* Krylov methods: defined [Krylov subspaces](https://en.wikipedia.org/wiki/Krylov_subspace) reachable by iterative algorithms, and defined a Krylov algorithm (loosely) as an iterative algorithm that finds the "best" solution in the whole Krylov space (possibly approximately) on the n-th step. Gave [power iteration](https://en.wikipedia.org/wiki/Power_iteration) for the largest |λ| as an example of something that is *not* a Krylov method. Explained why the basis (b, Ab, A²b, …) is a poor (ill-conditioned) choice, and instead explained the [Arnoldi iteration](https://en.wikipedia.org/wiki/Arnoldi_iteration), which finds an orthonormal basis Qₙ by (essentially) Gram–Schmidt, leading to the [GMRES algorithm](https://en.wikipedia.org/wiki/Generalized_minimal_residual_method) for Ax=b.
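
As a rough illustration (not the code from lecture), here is a minimal NumPy sketch of the Arnoldi iteration: it builds Qₙ₊₁ and the (n+1)×n upper-Hessenberg matrix H̃ₙ with AQₙ = Qₙ₊₁H̃ₙ via modified Gram–Schmidt (GMRES then minimizes ‖H̃ₙy − ‖b‖e₁‖ over y and sets xₙ = Qₙy). The test matrix and sizes below are arbitrary.

```python
import numpy as np

def arnoldi(A, b, n):
    """n steps of the Arnoldi iteration: returns Q (m × (n+1)) with orthonormal
    columns spanning span{b, Ab, ..., Aⁿb} and the (n+1) × n upper-Hessenberg
    matrix H satisfying A @ Q[:, :n] = Q @ H."""
    m = len(b)
    Q = np.zeros((m, n + 1))
    H = np.zeros((n + 1, n))
    Q[:, 0] = b / np.linalg.norm(b)
    for k in range(n):
        v = A @ Q[:, k]                  # next Krylov direction
        for j in range(k + 1):           # modified Gram–Schmidt vs. previous qⱼ
            H[j, k] = Q[:, j] @ v
            v = v - H[j, k] * Q[:, j]
        H[k + 1, k] = np.linalg.norm(v)
        if H[k + 1, k] == 0:             # breakdown: the Krylov space is A-invariant
            break
        Q[:, k + 1] = v / H[k + 1, k]
    return Q, H

# tiny usage example (illustrative): Q should be orthonormal, H upper-Hessenberg
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
b = rng.standard_normal(50)
Q, H = arnoldi(A, b, 10)
print(np.linalg.norm(Q.T @ Q - np.eye(11)))        # ≈ 0
print(np.linalg.norm(A @ Q[:, :10] - Q @ H))       # ≈ 0
```
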
**Further reading:** Arnoldi iterations and GMRES are covered in the Strang textbook section II.1, and briefly in [OCW lecture 12](https://ocw.mit.edu/courses/18-065-matrix-methods-in-data-analysis-signal-processing-and-machine-learning-spring-2018/resources/lecture-12-computing-eigenvalues-and-singular-values/); much more detail can be found in other sources (Trefethen, etc.) noted in the further reading for Lecture 12. A review of randomized linear algebra can be found in the Strang textbook section II.4, and also in [Halko, Martinsson, and Tropp (2011)](https://epubs.siam.org/doi/10.1137/090771806). A recent paper on a variety of new randomized algorithms, e.g. for "sketched" least-squares problems or to accelerate iterative algorithms like GMRES, is [Nakatsukasa and Tropp (2022)](https://arxiv.org/pdf/2111.00113.pdf). A nice review of the randomized SVD can be found in a blog post by [Gregory Gundersen (2019)](https://gregorygundersen.com/blog/2019/01/17/randomized-svd/).
## Lecture 14 (Mar 8)
* Krylov wrap-up:
  - GMRES caveats: storage is Θ(mn) after n steps, and the cost of orthogonalization is Θ(mn²), so in practice one is often limited to n ≲ 100. Workarounds include "[restarted GMRES](https://personal.math.vt.edu/embree/39961.pdf)", randomized "[sketched GMRES](https://arxiv.org/pdf/2111.00113.pdf)", and approximate Krylov methods such as BiCGSTAB(ℓ), QMR, or DQGMRES. If A is Hermitian positive-definite, however, then there is an ideal Krylov method called [conjugate gradient (CG)](https://en.wikipedia.org/wiki/Conjugate_gradient_method) that "magically" searches the whole Krylov space using only the two most recent search directions on each iteration (a minimal CG sketch appears after this list); CG is [closely related](https://www.sciencedirect.com/science/article/abs/pii/S0893608003001709) to the "momentum" terms used in stochastic gradient descent / machine learning (covered later in 18.065).
  - GMRES convergence theory is complicated (see e.g. lecture 35 in [Trefethen & Bau](https://people.maths.ox.ac.uk/trefethen/text.html)), but basically it converges faster if the eigenvalues are mostly "clustered" (making A more like I). You can therefore accelerate convergence by finding a [preconditioner](https://en.wikipedia.org/wiki/Preconditioner) matrix M such that MA has more-clustered eigenvalues (M is a "crude inverse" of A), and then solving MAx=Mb instead of Ax=b. This can accelerate convergence by orders of magnitude, but finding a good preconditioner is a tricky, problem-dependent task (the CG sketch after this list illustrates the idea with a simple diagonal preconditioner).
  - Krylov methods also exist for eigenproblems (e.g. Arnoldi and Jacobi–Davidson methods, or Lanczos and LOBPCG for Hermitian problems), for the SVD (requiring both Ax and Aᵀy operations), for least-squares problems, and so on.
* Randomized linear algebra:
  - Randomized SVD, as covered in the blog post by [Gregory Gundersen (2019)](https://gregorygundersen.com/blog/2019/01/17/randomized-svd/) (a short NumPy sketch appears after this list).
  - Randomized matrix multiplication: estimate AB as an average of s randomly sampled rank-one terms (col j of A)(row j of B)/(s pⱼ), drawing index j with probability pⱼ. Choosing pⱼ proportional to ‖col j of A‖⋅‖row j of B‖ minimizes the variance of this estimate! See the Strang textbook section II.4, [OCW lecture 13](https://ocw.mit.edu/courses/18-065-matrix-methods-in-data-analysis-signal-processing-and-machine-learning-spring-2018/resources/lecture-13-randomized-matrix-multiplication/), and the sampling sketch after this list.
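
Below is a minimal NumPy sketch of (preconditioned) conjugate gradients, not the lecture's code: it shows how each step keeps only the previous search direction, and how even a crude "inverse" like a Jacobi (diagonal) preconditioner can cut the iteration count when the diagonal varies wildly. The test matrix, tolerance, and iteration cap are illustrative assumptions.

```python
import numpy as np

def conjugate_gradient(A, b, M_inv=None, tol=1e-8, maxiter=1000):
    """Preconditioned CG for Hermitian positive-definite A.
    M_inv(r) applies a crude approximation of A⁻¹ to a vector
    (the preconditioner); if None, this is plain CG."""
    if M_inv is None:
        M_inv = lambda r: r
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x                    # residual
    z = M_inv(r)                     # preconditioned residual
    p = z.copy()                     # search direction
    rz = r @ z
    for k in range(maxiter):
        Ap = A @ p
        alpha = rz / (p @ Ap)        # exact line minimization along p
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k + 1
        z = M_inv(r)
        rz_new = r @ z
        beta = rz_new / rz           # "momentum"-like coefficient
        p = z + beta * p             # only the previous direction is kept
        rz = rz_new
    return x, maxiter

# illustrative SPD test matrix: 1d Laplacian plus a wildly varying diagonal
n = 400
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1) + np.diag(np.logspace(0, 3, n))
b = np.random.default_rng(0).standard_normal(n)
x1, iters_plain = conjugate_gradient(A, b)
x2, iters_jacobi = conjugate_gradient(A, b, M_inv=lambda r: r / np.diag(A))
print(iters_plain, iters_jacobi)     # Jacobi preconditioning typically needs far fewer steps
```
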
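Here is a short sketch of the randomized SVD in the Halko–Martinsson–Tropp style described in the Gundersen post; the oversampling parameter p and the rank-40 test matrix are illustrative choices, not from the lecture.

```python
import numpy as np

def randomized_svd(A, k, p=10, rng=None):
    """Rank-k truncated SVD from a random sketch: project A onto the range
    of A @ G for a Gaussian G with k+p columns (p = oversampling), then take
    an exact SVD of the small projected matrix B = Qᵀ A."""
    rng = np.random.default_rng() if rng is None else rng
    G = rng.standard_normal((A.shape[1], k + p))   # random test matrix
    Q, _ = np.linalg.qr(A @ G)                     # orthonormal basis ≈ range of A
    U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ U_small)[:, :k], s[:k], Vt[:k, :]

# quick check on a matrix of exact rank 40 (illustrative)
rng = np.random.default_rng(1)
A = rng.standard_normal((500, 40)) @ rng.standard_normal((40, 300))
U, s, Vt = randomized_svd(A, k=40, rng=rng)
print(np.linalg.norm(A - U @ (s[:, None] * Vt)) / np.linalg.norm(A))   # ≈ 0
```
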
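Finally, a sketch of sampled matrix multiplication with the variance-minimizing probabilities pⱼ ∝ ‖col j of A‖⋅‖row j of B‖; the sample counts and test matrices are arbitrary, and the relative error of such a Monte-Carlo estimate decays only like 1/√s.

```python
import numpy as np

def sampled_matmul(A, B, s, rng=None):
    """Estimate A @ B as an average of s sampled rank-one terms
    (col j of A)(row j of B) / (s pⱼ), with j drawn with probability pⱼ."""
    rng = np.random.default_rng() if rng is None else rng
    norms = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = norms / norms.sum()          # variance-minimizing sampling probabilities
    idx = rng.choice(A.shape[1], size=s, p=p)
    return sum(np.outer(A[:, j], B[j, :]) / (s * p[j]) for j in idx)

rng = np.random.default_rng(2)
A = rng.standard_normal((100, 2000))
B = rng.standard_normal((2000, 100))
exact = A @ B
for s in (200, 2000, 20000):         # relative error shrinks roughly like 1/√s
    approx = sampled_matmul(A, B, s, rng=rng)
    print(s, np.linalg.norm(approx - exact) / np.linalg.norm(exact))
```
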
**Further reading:** See links above, and further reading from lecture 13.