add pset 4

Steven G. Johnson 2023-03-20 16:29:10 -04:00
parent 26c3a4eb4d
commit 03cdaaf8db
2 changed files with 170 additions and 1 deletion


@@ -234,7 +234,7 @@ Derivatives viewed as linear approximations have many important applications in
* d(A⁻¹), adjoint problems, chain rules, and forward vs reverse mode (backpropagation) derivatives
* [pset 3 solutions](psets/pset3sol.ipynb)
* pset 4: coming soon, due 4/7
* [pset 4](psets/pset4.ipynb), due 4/7
Showed that d(A⁻¹)=-A⁻¹ dA A⁻¹. (For example, if you have a matrix A(t) that depends on a scalar t∈ℝ, this tells you that d(A⁻¹)/dt = -A⁻¹ (dA/dt) A⁻¹.) Applied this to computing ∇ₚf for scalar functions f(x(p)) of the solution x(p)=A(p)⁻¹b, where A(p) depends on some parameters p∈ℝⁿ, and showed that ∂f/∂pᵢ = -(∇ₓf)ᵀA⁻¹(∂A/∂pᵢ)x. Moreover, showed that it is critically important to evaluate this expression from *left to right*, i.e. first computing vᵀ=-(∇ₓf)ᵀA⁻¹, which corresponds to solving the "adjoint (transposed) equation" Aᵀv=-∇ₓf. By doing so, you can compute both f and ∇ₚf at the cost of only "two solves" with matrices A and Aᵀ. This is critically important in enabling engineering/physics optimization, where you often want to optimize a property of the solution A⁻¹b of a physical model A depending on many design variables p (e.g. the distribution of materials in space).
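
For example, the d(A⁻¹) formula can be sanity-checked numerically in Julia. A minimal sketch (assuming a random 5×5 matrix A and a small random perturbation dA, purely for illustration):

```julia
using LinearAlgebra

A  = randn(5,5)
dA = randn(5,5) * 1e-8   # a small random perturbation of A

# finite difference of the inverse vs. the linearization -A⁻¹ dA A⁻¹:
finite_diff = inv(A + dA) - inv(A)
linearized  = -inv(A) * dA * inv(A)

norm(finite_diff - linearized) / norm(linearized)   # typically ≈ 1e-8, i.e. agreement to ~8 digits
```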

psets/pset4.ipynb Normal file

@@ -0,0 +1,169 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "73fdabd6",
"metadata": {},
"source": [
"# 18.065 Problem Set 4\n",
"\n",
"Due Friday 4/7 at 1pm."
]
},
{
"cell_type": "markdown",
"id": "0fb69bf7",
"metadata": {},
"source": [
"## Problem 1\n",
"\n",
"A \"convex set\" $S$ is one such that for any two points $x,y$ in $S$, the straight line connecting them is also in $S$. That is, $\\alpha x + (1-\\alpha)y \\in S$ for all $\\alpha \\in [0,1]$.\n",
"\n",
"**(a)** If $f_i(x)$ is a [convex function](https://en.wikipedia.org/wiki/Convex_function) (as defined in e.g. lecture 16), explain why the constraint $f_i(x) \\le 0$ defines a convex set of feasible points.\n",
"\n",
"**(b)** Explain why the intersection of two convex sets is a convex set. (Hence, the feasible set for many convex constraints is a convex set.)"
]
},
{
"cell_type": "markdown",
"id": "188e8b83",
"metadata": {},
"source": [
"## Problem 2\n",
"\n",
"Suppose we are solving the $\\ell^1$-regularized least-square problem:\n",
"$$\n",
"\\min_{x\\in \\mathbb{R}^n} \\left( \\Vert b - Ax \\Vert_2^2 + \\lambda \\Vert x \\Vert_1 \\right)\n",
"$$\n",
"where $\\lambda > 0$ is some regularization parameter and $ \\Vert x \\Vert_1 = \\sum_i |x_i|$ is the $\\ell^1$ norm.\n",
"\n",
"Similar to the examples in 16, show that this can be converted into a an equivalent [quadratic programming (QP)](https://en.wikipedia.org/wiki/Quadratic_programming) problem — a convex quadratic objective with affine constraints — by introducing one or more dummy variables (an \"epigraph\" formulation). This replaces the non-differentiable $\\ell^1$ problem with an equivalent *differentiable* problem with *differentiable constraints*."
]
},
{
"cell_type": "markdown",
"id": "73427817",
"metadata": {},
"source": [
"## Problem 3\n",
"\n",
"Let\n",
"$$\n",
"A(p) = A_0 + \\underbrace{\\begin{pmatrix} p_1 & & & \\\\ & p_2 & & \\\\ & & \\ddots & \\\\ & & & p_m \\end{pmatrix}}_{\\text{diagm(p)}}\n",
"$$\n",
"where $A_0$ is some $m \\times m$ matrix and $p \\in \\mathbb{R}^m$ are $m$ parameters.\n",
"\n",
"Now, suppose we compute\n",
"$$\n",
"f(p) = g(\\underbrace{A(p)^{-1} b}_{x(p)})\n",
"$$\n",
"where $g(x) = x^T G x$, $b \\in \\mathbb{R}^m$ is some vector, and $G = G^T$ is some symmetric $m \\times m$ matrix.\n",
"\n",
"**(a)** What is $\\nabla f$? Explain how you can compute *all m* components of $\\nabla f$ by solving only *two* $m \\times m$ systems followed by $\\Theta(m)$ additional work in *total*.\n",
"\n",
"**(b)** Implement your solution from (a) by filling in the function `∇f(p)` below, and check that it correctly predicts $df = f(p + dp) - f(p)$ (*approximately*) for a random small $dp$."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "ac419822",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"-25.41770831865196"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"using LinearAlgebra\n",
"\n",
"# some A₀, b, G for m=5\n",
"m = 5\n",
"A₀ = randn(m,m)\n",
"b = randn(m)\n",
"G = randn(m,m); G = G' + G\n",
"\n",
"# our functions\n",
"g(x) = x' * G * x\n",
"f(p) = g((A₀ + Diagonal(p)) \\ b)\n",
"\n",
"p = randn(m) # some random parameters\n",
"f(p) # make sure it gives a number out"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "12609b9b",
"metadata": {},
"outputs": [],
"source": [
"# your solution to (b):\n",
"\n",
"function ∇f(p)\n",
" ????\n",
"end\n",
"\n",
"dp = randn(m) * 1e-8 # a random small dp\n",
"df = f(p + dp) - f(p)\n",
"\n",
"# check: ∇f approximately predicts df\n",
"# ???"
]
},
{
"cell_type": "markdown",
"id": "84c4326e",
"metadata": {},
"source": [
"## Problem 4\n",
"\n",
"In class, we considered steepest descent for $f(x) = \\kappa x_1^2 + x_2^2$ in $\\mathbb{R}^2$, and argued that for an arbitrary $x = [x_1,x_2]$ starting point and $\\kappa \\gg 1$, the steepest-descent step $x \\leftarrow x - sz$ is *approximately* $sz \\approx [x_1, x_2/\\kappa]$.\n",
"\n",
"**(a)** If $sz = [x_1, x_2/\\kappa]$ exactly, then on the next step we would have the new $x \\leftarrow x - sx = [0, (1-\\frac{1}{\\kappa} x_2)]$. However, explain why a more careful calculation shows that the new $x - sx \\approx [O(1/\\kappa), (1-\\frac{1}{\\kappa} x_2)]$, i.e. the first component is proportional to $1/\\kappa$ to leading order in $1/\\kappa$.\n",
"\n",
"**(b)** If you start with an $x = [\\#/\\kappa, x_2]$, i.e. where the first component is proportional to $1/\\kappa$ and $\\#$ is some number of the same order of magnitude as $x_2$, show that after one steepest-descent step (for $\\kappa \\gg 1$) the $x_1$ component is *still* roughly order $1/\\kappa$ but of the opposite sign, and $x_2$ again subtracts a term roughly proportional to $x_2/\\kappa$.\n",
"\n",
"**(c)** Implement this steepest-descent process numerically for $\\kappa = 100$ and a starting point $x_1 = 0.01234567$ and $x_2 = 0.8910$. Plot $100x_1$ and $x_2$ for 100 iterations of steepest descent. A more careful analysis would show a convergence proportional to $\\left( \\frac{\\kappa - 1}{\\kappa + 1} \\right)^k$, where $k$ is the iteration number, following equation (4) in the Strang book — include this function for comparison on your plot."
]
},
{
"cell_type": "markdown",
"id": "9d19b287",
"metadata": {},
"source": [
"## Problems 5, 6, … coming soon"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6ac54afe",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Julia 1.8.3",
"language": "julia",
"name": "julia-1.8"
},
"language_info": {
"file_extension": ".jl",
"mimetype": "application/julia",
"name": "julia",
"version": "1.8.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}