add pset 4

Steven G. Johnson 2023-03-20 16:29:10 -04:00
parent 26c3a4eb4d
commit 03cdaaf8db
2 changed files with 170 additions and 1 deletion


@@ -234,7 +234,7 @@ Derivatives viewed as linear approximations have many important applications in
* d(A⁻¹), adjoint problems, chain rules, and forward vs reverse mode (backpropagation) derivatives
* [pset 3 solutions](psets/pset3sol.ipynb)
* pset 4: coming soon, due 4/7
* [pset 4](psets/pset4.ipynb), due 4/7
Showed that d(A⁻¹)=-A⁻¹ dA A⁻¹. (For example, if you have a matrix A(t) that depends on a scalar t∈ℝ, this tells you that d(A⁻¹)/dt = -A⁻¹ (dA/dt) A⁻¹.) Applied this to computing ∇ₚf for scalar functions f(x(p)) of the solution x(p)=A(p)⁻¹b, where A(p) depends on some parameters p∈ℝⁿ, and showed that ∂f/∂pᵢ = -(∇ₓf)ᵀA⁻¹(∂A/∂pᵢ)x. Moreover, showed that it is critically important to evaluate this expression from *left to right*, i.e. first computing vᵀ=-(∇ₓf)ᵀA⁻¹, which corresponds to solving the "adjoint (transposed) equation" Aᵀv=-∇ₓf. By doing so, you can compute both f and ∇ₚf at the cost of only "two solves" with matrices A and Aᵀ. This is critically important in enabling engineering/physics optimization, where you often want to optimize a property of the solution A⁻¹b of a physical model A depending on many design variables p (e.g. the distribution of materials in space).
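
For example, the d(A⁻¹) formula can be sanity-checked numerically in Julia. A minimal sketch (assuming a random 5×5 matrix A and a small random perturbation dA, purely for illustration):

```julia
using LinearAlgebra

A  = randn(5,5)
dA = randn(5,5) * 1e-8   # a small random perturbation of A

# finite difference of the inverse vs. the linearization -A⁻¹ dA A⁻¹:
finite_diff = inv(A + dA) - inv(A)
linearized  = -inv(A) * dA * inv(A)

norm(finite_diff - linearized) / norm(linearized)   # typically ≈ 1e-8, i.e. agreement to ~8 digits
```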

psets/pset4.ipynb Normal file

@@ -0,0 +1,169 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "73fdabd6",
"metadata": {},
"source": [
"# 18.065 Problem Set 4\n",
"\n",
"Due Friday 4/7 at 1pm."
]
},
{
"cell_type": "markdown",
"id": "0fb69bf7",
"metadata": {},
"source": [
"## Problem 1\n",
"\n",
"A \"convex set\" $S$ is one such that for any two points $x,y$ in $S$, the straight line connecting them is also in $S$. That is, $\\alpha x + (1-\\alpha)y \\in S$ for all $\\alpha \\in [0,1]$.\n",
"\n",
"**(a)** If $f_i(x)$ is a [convex function](https://en.wikipedia.org/wiki/Convex_function) (as defined in e.g. lecture 16), explain why the constraint $f_i(x) \\le 0$ defines a convex set of feasible points.\n",
"\n",
"**(b)** Explain why the intersection of two convex sets is a convex set. (Hence, the feasible set for many convex constraints is a convex set.)"
]
},
{
"cell_type": "markdown",
"id": "188e8b83",
"metadata": {},
"source": [
"## Problem 2\n",
"\n",
"Suppose we are solving the $\\ell^1$-regularized least-square problem:\n",
"$$\n",
"\\min_{x\\in \\mathbb{R}^n} \\left( \\Vert b - Ax \\Vert_2^2 + \\lambda \\Vert x \\Vert_1 \\right)\n",
"$$\n",
"where $\\lambda > 0$ is some regularization parameter and $ \\Vert x \\Vert_1 = \\sum_i |x_i|$ is the $\\ell^1$ norm.\n",
"\n",
"Similar to the examples in 16, show that this can be converted into a an equivalent [quadratic programming (QP)](https://en.wikipedia.org/wiki/Quadratic_programming) problem — a convex quadratic objective with affine constraints — by introducing one or more dummy variables (an \"epigraph\" formulation). This replaces the non-differentiable $\\ell^1$ problem with an equivalent *differentiable* problem with *differentiable constraints*."
]
},
{
"cell_type": "markdown",
"id": "73427817",
"metadata": {},
"source": [
"## Problem 3\n",
"\n",
"Let\n",
"$$\n",
"A(p) = A_0 + \\underbrace{\\begin{pmatrix} p_1 & & & \\\\ & p_2 & & \\\\ & & \\ddots & \\\\ & & & p_m \\end{pmatrix}}_{\\text{diagm(p)}}\n",
"$$\n",
"where $A_0$ is some $m \\times m$ matrix and $p \\in \\mathbb{R}^m$ are $m$ parameters.\n",
"\n",
"Now, suppose we compute\n",
"$$\n",
"f(p) = g(\\underbrace{A(p)^{-1} b}_{x(p)})\n",
"$$\n",
"where $g(x) = x^T G x$, $b \\in \\mathbb{R}^m$ is some vector, and $G = G^T$ is some symmetric $m \\times m$ matrix.\n",
"\n",
"**(a)** What is $\\nabla f$? Explain how you can compute *all m* components of $\\nabla f$ by solving only *two* $m \\times m$ systems followed by $\\Theta(m)$ additional work in *total*.\n",
"\n",
"**(b)** Implement your solution from (a) by filling in the function `∇f(p)` below, and check that it correctly predicts $df = f(p + dp) - f(p)$ (*approximately*) for a random small $dp$."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "ac419822",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"-25.41770831865196"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"using LinearAlgebra\n",
"\n",
"# some A₀, b, G for m=5\n",
"m = 5\n",
"A₀ = randn(m,m)\n",
"b = randn(m)\n",
"G = randn(m,m); G = G' + G\n",
"\n",
"# our functions\n",
"g(x) = x' * G * x\n",
"f(p) = g((A₀ + Diagonal(p)) \\ b)\n",
"\n",
"p = randn(m) # some random parameters\n",
"f(p) # make sure it gives a number out"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "12609b9b",
"metadata": {},
"outputs": [],
"source": [
"# your solution to (b):\n",
"\n",
"function ∇f(p)\n",
" ????\n",
"end\n",
"\n",
"dp = randn(m) * 1e-8 # a random small dp\n",
"df = f(p + dp) - f(p)\n",
"\n",
"# check: ∇f approximately predicts df\n",
"# ???"
]
},
{
"cell_type": "markdown",
"id": "84c4326e",
"metadata": {},
"source": [
"## Problem 4\n",
"\n",
"In class, we considered steepest descent for $f(x) = \\kappa x_1^2 + x_2^2$ in $\\mathbb{R}^2$, and argued that for an arbitrary $x = [x_1,x_2]$ starting point and $\\kappa \\gg 1$, the steepest-descent step $x \\leftarrow x - sz$ is *approximately* $sz \\approx [x_1, x_2/\\kappa]$.\n",
"\n",
"**(a)** If $sz = [x_1, x_2/\\kappa]$ exactly, then on the next step we would have the new $x \\leftarrow x - sx = [0, (1-\\frac{1}{\\kappa} x_2)]$. However, explain why a more careful calculation shows that the new $x - sx \\approx [O(1/\\kappa), (1-\\frac{1}{\\kappa} x_2)]$, i.e. the first component is proportional to $1/\\kappa$ to leading order in $1/\\kappa$.\n",
"\n",
"**(b)** If you start with an $x = [\\#/\\kappa, x_2]$, i.e. where the first component is proportional to $1/\\kappa$ and $\\#$ is some number of the same order of magnitude as $x_2$, show that after one steepest-descent step (for $\\kappa \\gg 1$) the $x_1$ component is *still* roughly order $1/\\kappa$ but of the opposite sign, and $x_2$ again subtracts a term roughly proportional to $x_2/\\kappa$.\n",
"\n",
"**(c)** Implement this steepest-descent process numerically for $\\kappa = 100$ and a starting point $x_1 = 0.01234567$ and $x_2 = 0.8910$. Plot $100x_1$ and $x_2$ for 100 iterations of steepest descent. A more careful analysis would show a convergence proportional to $\\left( \\frac{\\kappa - 1}{\\kappa + 1} \\right)^k$, where $k$ is the iteration number, following equation (4) in the Strang book — include this function for comparison on your plot."
]
},
{
"cell_type": "markdown",
"id": "9d19b287",
"metadata": {},
"source": [
"## Problems 5, 6, … coming soon"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6ac54afe",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Julia 1.8.3",
"language": "julia",
"name": "julia-1.8"
},
"language_info": {
"file_extension": ".jl",
"mimetype": "application/julia",
"name": "julia",
"version": "1.8.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}