Steven G. Johnson 2023-04-09 16:44:24 -04:00
parent 950b335116
commit 3f10441047
2 changed files with 161 additions and 1 deletion


@@ -291,7 +291,7 @@ Described the KKT conditions for a (local) optimum/extremum (Boyd, section 5.5.3
* Quick review of augmented Lagrangians and ADMM from last lecture. Indicator function example from section III.4.
* CCSA interior-point algorithm
* [pset 4 solutions](psets/pset4sol.ipynb)
- * pset 5: coming soon, due 4/21
+ * [pset 5](psets/pset5.ipynb): due 4/21
Went over a very different example of a nonlinear optimization scheme, solving a fairly general inequality-constrained nonlinear-programming problem: the CCSA algorithm(s), as described by Svanberg (2002). This is a surprisingly simple algorithm (the [NLopt](http://ab-initio.mit.edu/nlopt) implementation is only 300 lines of C code), but it is robust and provably convergent, and it illustrates a number of important ideas in optimization: optimizing an approximation to update the parameters **x**, guarding the approximation with trust regions and penalty terms, and optimizing via the dual function (Lagrange multipliers). Like many optimization algorithms, the general ideas are very straightforward, but getting the details right can be delicate!
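For concreteness, here is a minimal sketch of calling this algorithm from Julia via the NLopt.jl package (the `LD_CCSAQ` variant); the two-variable objective, constraint, starting point, and tolerances below are made-up illustrations, not an example from the lecture:

```jl
using NLopt

# toy objective: minimize x₂ (the gradient is written into `grad` in-place)
function myfunc(x::Vector, grad::Vector)
    if length(grad) > 0
        grad[1] = 0.0
        grad[2] = 1.0
    end
    return x[2]
end

# toy nonlinear inequality constraint: x₁² − x₂ ≤ 0
function myconstraint(x::Vector, grad::Vector)
    if length(grad) > 0
        grad[1] = 2x[1]
        grad[2] = -1.0
    end
    return x[1]^2 - x[2]
end

opt = Opt(:LD_CCSAQ, 2)   # Svanberg's CCSA with quadratic approximations
opt.xtol_rel = 1e-8
opt.min_objective = myfunc
inequality_constraint!(opt, myconstraint, 1e-8)

minf, minx, ret = optimize(opt, [1.0, 5.0])  # should approach x = (0, 0)
```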

psets/pset5.ipynb (new file, 160 lines)

@@ -0,0 +1,160 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "fd786915",
"metadata": {},
"source": [
"# 18.065 Problem Set 5\n",
"\n",
"Due Friday, April 21 at 1pm."
]
},
{
"cell_type": "markdown",
"id": "ff18eab8",
"metadata": {},
"source": [
"## Problem 1 (5+6 points)\n",
"\n",
"Consider the following optimization problem:\n",
"$$\n",
"\\min_{x \\in \\mathbb{R}^2} x_1 \\\\\n",
"\\mbox{subject to } x_2 \\le x_1^3 \\mbox{ and } x_2 \\ge 0 \\, .\n",
"$$\n",
"\n",
"**(a)** Draw a sketch of the feasible set in the $x_1, x_2$ plane and indicate the optimum $x_*$.\n",
"\n",
"**(b)** Show that the optimum $x_*$ does *not* satisfy the KKT conditions, but explain why this is possible because the LICQ conditions are violated (see the last slide of lecture 22).\n",
"\n",
"(Most problems have local minima that satisfy KKT, but you can see from the picture in (a) that this is a weird case!)"
]
},
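{
"cell_type": "code",
"execution_count": null,
"id": "a3f1c2d0",
"metadata": {},
"outputs": [],
"source": [
"# (Illustrative sketch, not part of the assigned work.) One possible way to\n",
"# visualize the feasible set for part (a), assuming Plots.jl is installed:\n",
"using Plots\n",
"x1 = range(0, 1.5, length=200)\n",
"plot(x1, x1.^3, fillrange=0, fillalpha=0.3,\n",
"     label=\"x₂ ≤ x₁³ and x₂ ≥ 0\", xlabel=\"x₁\", ylabel=\"x₂\")"
]
},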
{
"cell_type": "markdown",
"id": "87b4dd84",
"metadata": {},
"source": [
"## Problem 2 (5+6+6 points)\n",
"\n",
"Consider the convex problem:\n",
"$$\n",
"\\min_{x\\in \\mathbb{R}^n} \\Vert b - Ax \\Vert_2^2 \\\\\n",
"\\mbox{subject to } \\Vert x \\Vert_2^2 \\le r^2\n",
"$$\n",
"for some $r > 0$, $m \\times n$ matrix $A$ (of rank $n$), and $b \\in \\mathbb{R}^m$ — that is, least-squares optimization with the solution constrained to lie inside a sphere of radius $r$.\n",
"\n",
"**(a)** What is the Lagrange dual function $g(\\lambda)$? (You can give a closed-form expression. Hint: review Tikhonov-regularized least-squares.) Define a corresponding Julia function `g(λ; r=1.0)` for the sample parameters given below (this syntax defines an optional keyword argument `r` that defaults to $r=1$).\n",
"\n",
"**(b)** If the unconstrained least-square solution $\\hat{x} = (A^T A)^{-1} A^T b$ satisfies $\\Vert \\hat{x} \\Vert_2 < r$, then what must be true of the derivative $g'(0)$? What if $\\Vert \\hat{x} \\Vert_2 > r$?\n",
"\n",
"Check in Julia that $g'(0)$ matches your expectations by computing the derivative using automatic differentiation:\n",
"```jl\n",
"using ForwardDiff\n",
"dgdλ(λ; r=1.0) = ForwardDiff.derivative(λ -> g(λ; r=r), λ)\n",
"```\n",
"and evaluating it at `dgdλ(0; r=???)` for two values of `r`: one $r > \\Vert \\hat{x} \\Vert_2$ (so that the constraint is inactive) and one $r < \\Vert \\hat{x} \\Vert_2$ (so that the constraint is active).\n",
"\n",
"(In principle, you could take this derivative by hand using matrix calculus, but it's pretty error-prone.)\n",
"\n",
"**(c)** You can take the *second* derivative of $g(\\lambda)$ via AD by:\n",
"```jl\n",
"d²gdλ²(λ; r=0.5) = ForwardDiff.derivative(λ -> dgdλ(λ; r=0.5), λ)\n",
"```\n",
"Use this to implement a Newton iteration to maximize $g(\\lambda)$ by finding a root of $g'(\\lambda)$, starting with an initial guess of $\\lambda=0$, for $r = 0.5$. (It should converge in only a few iterations.) To at least 8 significant digits, give the resulting dual optimum $\\lambda_*$ and the primal optimum $x_*$ (strong duality holds in this convex problem!), and check that $x_*$ is feasible."
]
},
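{
"cell_type": "markdown",
"id": "b4e2d1f9",
"metadata": {},
"source": [
"(Reminder, for the hint in (a): Tikhonov-regularized least squares $\\min_x \\Vert b - Ax \\Vert_2^2 + \\mu \\Vert x \\Vert_2^2$, with $\\mu > 0$ and $A$ of full column rank, has the closed-form minimizer $\\hat{x}(\\mu) = (A^T A + \\mu I)^{-1} A^T b$.)"
]
},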
{
"cell_type": "code",
"execution_count": 38,
"id": "e95087fb",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.5"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"using LinearAlgebra\n",
"\n",
"m = 5\n",
"n = 4\n",
"A = [ -9 2 -2 3\n",
" -5 -3 9 3\n",
" -1 -6 9 -2\n",
" -3 -4 5 4\n",
" -8 9 -6 4 ]\n",
"b = [1,2,3,4,5]"
]
},
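{
"cell_type": "code",
"execution_count": null,
"id": "c5d6e7f8",
"metadata": {},
"outputs": [],
"source": [
"# (Illustrative sketch.) For part (b) you need to compare r against ‖x̂‖₂;\n",
"# the unconstrained least-squares solution can be computed by, e.g.,\n",
"xhat = A \\ b   # same as (AᵀA)⁻¹Aᵀb here, since A has full column rank\n",
"norm(xhat)"
]
},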
{
"cell_type": "code",
"execution_count": 52,
"id": "cb117722",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"g (generic function with 2 methods)"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"g(λ; r=1) = ???"
]
},
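{
"cell_type": "code",
"execution_count": null,
"id": "d7e8f9a0",
"metadata": {},
"outputs": [],
"source": [
"# (Illustrative sketch for part (c), assuming your g, dgdλ, and d²gdλ² above\n",
"# work: a Newton iteration λ ← λ - g'(λ)/g''(λ) to find a root of g'.)\n",
"λ = 0.0\n",
"for k = 1:20\n",
"    λ -= dgdλ(λ; r=0.5) / d²gdλ²(λ; r=0.5)\n",
"end\n",
"λ"
]
},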
{
"cell_type": "markdown",
"id": "ba18e276",
"metadata": {},
"source": [
"## Problem 3 (5+5+6 points)\n",
"\n",
"Solve the optimization problem from 2c above, with the same parameters, by implementing ADMM for the problem\n",
"$$\n",
"\\min_{x \\in \\mathbb{R}^n} \\left( \\Vert b - Ax \\Vert_2^2 + \\begin{cases} 0 & \\Vert x \\Vert_2 \\le r \\\\ \\infty & \\mbox{otherwise} \\end{cases} \\right)\n",
"$$\n",
"where the second term is the \"indicator\" function of the feasible set (the radius-$r$ ball) as in lecture 24 and section III.4 of the text.\n",
"\n",
"A basic iteration of ADMM consists of 3 steps, as described in class and in the textbook:\n",
"\n",
"1. $x^{(k+1)} = \\mbox{arg }\\min_x \\Vert b - Ax \\Vert_2^2 + \\frac{\\rho}{2} \\Vert x - z^{(k)} + s^{(k)} \\Vert_2^2$ (for some penalty parameter $\\rho > 0$)\n",
"2. $z^{(k+1)}$ is the projection of $x^{(k+1)} + s^{(k)}$ onto the *closest* point in the feasible set.\n",
"3. $s^{(k+1)} = s^{(k)} + x^{(k+1)} - z^{(k+1)}$\n",
"\n",
"**(a)** Give a closed-form solution for step 1. (Hint: a problem from pset 3 should be helpful.)\n",
"\n",
"**(b)** Give a closed-form solution for step 2.\n",
"\n",
"**(c)** Implement this iteration in Julia to solve problem 2c, starting from $x = z = s = \\vec{0}$. Make a (semi-log) plot of the error $\\Vert x^{(k)} - x_* \\Vert_2$ versus $k$, where $x_*$ is your solution from 2c, for $\\rho = 1$ and $\\rho = 10$. (The error should converge to zero!)"
]
}
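,
{
"cell_type": "code",
"execution_count": null,
"id": "e9f0a1b2",
"metadata": {},
"outputs": [],
"source": [
"# (Illustrative skeleton, not a solution: the two ??? lines are exactly the\n",
"# closed forms that parts (a) and (b) ask you to derive; x_star denotes your\n",
"# solution from problem 2c.)\n",
"ρ = 1.0\n",
"x = z = s = zeros(n)\n",
"err = Float64[]\n",
"for k = 1:100\n",
"    x = ???          # step 1: argmin from part (a)\n",
"    z = ???          # step 2: projection of x + s onto the ball ‖z‖₂ ≤ r, from part (b)\n",
"    s = s + x - z\n",
"    push!(err, norm(x - x_star))\n",
"end"
]
}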
],
"metadata": {
"kernelspec": {
"display_name": "Julia 1.8.0",
"language": "julia",
"name": "julia-1.8"
},
"language_info": {
"file_extension": ".jl",
"mimetype": "application/julia",
"name": "julia",
"version": "1.8.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}