add pset 4
This commit is contained in:
parent
26c3a4eb4d
commit
03cdaaf8db
@ -234,7 +234,7 @@ Derivatives viewed as linear approximations have many important applications in
|
||||
|
||||
* d(A⁻¹), adjoint problems, chain rules, and forward vs reverse mode (backpropagation) derivatives
|
||||
* [pset 3 solutions](psets/pset3sol.ipynb)
|
||||
* pset 4: coming soon, due 4/7
|
||||
* [pset 4](psets/pset4.ipynb), due 4/7
|
||||
|
||||
Showed that d(A⁻¹)=–A⁻¹ dA A⁻¹. (For example, if you have a matrix A(t) that depends on a scalar t∈ℝ, this tells you that d(A⁻¹)/dt = –A⁻¹ dA/dt A⁻¹.) Applied this to computing ∇ₚf for scalar functions f(A⁻¹b) where A(p) depends on some parameters p∈ℝⁿ, and showed that ∂f/∂pᵢ = -(∇ₓf)ᵀA⁻¹(∂A/∂pᵢ)x. Moreover, showed that it is critically important to evaluate this expression from *left to right*, i.e. first computing vᵀ=-(∇ₓf)ᵀA⁻¹m, which corresponds to solving the "adjoint (transposed) equation" Aᵀv=–∇ₓf. By doing so, you can compute both f and ∇ₚf at the cost of only "two solves" with matrices A and Aᵀ. This is critically important in enabling engineering/physics optimization, where you often want to optimize a property of the solution A⁻¹b of a physical model A depending on many design variables p (e.g. the distribution of materials in space).
|
||||
|
||||
|
169
psets/pset4.ipynb
Normal file
169
psets/pset4.ipynb
Normal file
@ -0,0 +1,169 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "73fdabd6",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# 18.065 Problem Set 4\n",
|
||||
"\n",
|
||||
"Due Friday 4/7 at 1pm."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0fb69bf7",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Problem 1\n",
|
||||
"\n",
|
||||
"A \"convex set\" $S$ is one such that for any two points $x,y$ in $S$, the straight line connecting them is also in $S$. That is, $\\alpha x + (1-\\alpha)y \\in S$ for all $\\alpha \\in [0,1]$.\n",
|
||||
"\n",
|
||||
"**(a)** If $f_i(x)$ is a [convex function](https://en.wikipedia.org/wiki/Convex_function) (as defined in e.g. lecture 16), explain why the constraint $f_i(x) \\le 0$ defines a convex set of feasible points.\n",
|
||||
"\n",
|
||||
"**(b)** Explain why the intersection of two convex sets is a convex set. (Hence, the feasible set for many convex constraints is a convex set.)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "188e8b83",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Problem 2\n",
|
||||
"\n",
|
||||
"Suppose we are solving the $\\ell^1$-regularized least-square problem:\n",
|
||||
"$$\n",
|
||||
"\\min_{x\\in \\mathbb{R}^n} \\left( \\Vert b - Ax \\Vert_2^2 + \\lambda \\Vert x \\Vert_1 \\right)\n",
|
||||
"$$\n",
|
||||
"where $\\lambda > 0$ is some regularization parameter and $ \\Vert x \\Vert_1 = \\sum_i |x_i|$ is the $\\ell^1$ norm.\n",
|
||||
"\n",
|
||||
"Similar to the examples in 16, show that this can be converted into a an equivalent [quadratic programming (QP)](https://en.wikipedia.org/wiki/Quadratic_programming) problem — a convex quadratic objective with affine constraints — by introducing one or more dummy variables (an \"epigraph\" formulation). This replaces the non-differentiable $\\ell^1$ problem with an equivalent *differentiable* problem with *differentiable constraints*."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "73427817",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Problem 3\n",
|
||||
"\n",
|
||||
"Let\n",
|
||||
"$$\n",
|
||||
"A(p) = A_0 + \\underbrace{\\begin{pmatrix} p_1 & & & \\\\ & p_2 & & \\\\ & & \\ddots & \\\\ & & & p_m \\end{pmatrix}}_{\\text{diagm(p)}}\n",
|
||||
"$$\n",
|
||||
"where $A_0$ is some $m \\times m$ matrix and $p \\in \\mathbb{R}^m$ are $m$ parameters.\n",
|
||||
"\n",
|
||||
"Now, suppose we compute\n",
|
||||
"$$\n",
|
||||
"f(p) = g(\\underbrace{A(p)^{-1} b}_{x(p)})\n",
|
||||
"$$\n",
|
||||
"where $g(x) = x^T G x$, $b \\in \\mathbb{R}^m$ is some vector, and $G = G^T$ is some symmetric $m \\times m$ matrix.\n",
|
||||
"\n",
|
||||
"**(a)** What is $\\nabla f$? Explain how you can compute *all m* components of $\\nabla f$ by solving only *two* $m \\times m$ systems followed by $\\Theta(m)$ additional work in *total*.\n",
|
||||
"\n",
|
||||
"**(b)** Implement your solution from (a) by filling in the function `∇f(p)` below, and check that it correctly predicts $df = f(p + dp) - f(p)$ (*approximately*) for a random small $dp$."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "ac419822",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"-25.41770831865196"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"using LinearAlgebra\n",
|
||||
"\n",
|
||||
"# some A₀, b, G for m=5\n",
|
||||
"m = 5\n",
|
||||
"A₀ = randn(m,m)\n",
|
||||
"b = randn(m)\n",
|
||||
"G = randn(m,m); G = G' + G\n",
|
||||
"\n",
|
||||
"# our functions\n",
|
||||
"g(x) = x' * G * x\n",
|
||||
"f(p) = g((A₀ + Diagonal(p)) \\ b)\n",
|
||||
"\n",
|
||||
"p = randn(m) # some random parameters\n",
|
||||
"f(p) # make sure it gives a number out"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "12609b9b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# your solution to (b):\n",
|
||||
"\n",
|
||||
"function ∇f(p)\n",
|
||||
" ????\n",
|
||||
"end\n",
|
||||
"\n",
|
||||
"dp = randn(m) * 1e-8 # a random small dp\n",
|
||||
"df = f(p + dp) - f(p)\n",
|
||||
"\n",
|
||||
"# check: ∇f approximately predicts df\n",
|
||||
"# ???"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "84c4326e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Problem 4\n",
|
||||
"\n",
|
||||
"In class, we considered steepest descent for $f(x) = \\kappa x_1^2 + x_2^2$ in $\\mathbb{R}^2$, and argued that for an arbitrary $x = [x_1,x_2]$ starting point and $\\kappa \\gg 1$, the steepest-descent step $x \\leftarrow x - sz$ is *approximately* $sz \\approx [x_1, x_2/\\kappa]$.\n",
|
||||
"\n",
|
||||
"**(a)** If $sz = [x_1, x_2/\\kappa]$ exactly, then on the next step we would have the new $x \\leftarrow x - sx = [0, (1-\\frac{1}{\\kappa} x_2)]$. However, explain why a more careful calculation shows that the new $x - sx \\approx [O(1/\\kappa), (1-\\frac{1}{\\kappa} x_2)]$, i.e. the first component is proportional to $1/\\kappa$ to leading order in $1/\\kappa$.\n",
|
||||
"\n",
|
||||
"**(b)** If you start with an $x = [\\#/\\kappa, x_2]$, i.e. where the first component is proportional to $1/\\kappa$ and $\\#$ is some number of the same order of magnitude as $x_2$, show that after one steepest-descent step (for $\\kappa \\gg 1$) the $x_1$ component is *still* roughly order $1/\\kappa$ but of the opposite sign, and $x_2$ again subtracts a term roughly proportional to $x_2/\\kappa$.\n",
|
||||
"\n",
|
||||
"**(c)** Implement this steepest-descent process numerically for $\\kappa = 100$ and a starting point $x_1 = 0.01234567$ and $x_2 = 0.8910$. Plot $100x_1$ and $x_2$ for 100 iterations of steepest descent. A more careful analysis would show a convergence proportional to $\\left( \\frac{\\kappa - 1}{\\kappa + 1} \\right)^k$, where $k$ is the iteration number, following equation (4) in the Strang book — include this function for comparison on your plot."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "9d19b287",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Problems 5, 6, … coming soon"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "6ac54afe",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Julia 1.8.3",
|
||||
"language": "julia",
|
||||
"name": "julia-1.8"
|
||||
},
|
||||
"language_info": {
|
||||
"file_extension": ".jl",
|
||||
"mimetype": "application/julia",
|
||||
"name": "julia",
|
||||
"version": "1.8.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
Loading…
x
Reference in New Issue
Block a user