# Newton's method

This section uses these add-on packages:

``` {.julia .cell-code}
using CalculusWithJulia
using Plots
using SymPy
using Roots
```

---

The Babylonian method is an algorithm to find an approximate value for $\sqrt{k}$. It was described by the first-century Greek mathematician [Hero of Alexandria](http://en.wikipedia.org/wiki/Babylonian_method).

The method starts with some initial guess, called $x_0$. It then applies a formula to produce an improved guess. This is repeated until the improved guess is accurate enough or it is clear the algorithm fails to work.

For the Babylonian method, the next guess, $x_{i+1}$, is derived from the current guess, $x_i$. In mathematical notation, this is the updating step:

$$
x_{i+1} = \frac{1}{2}\left(x_i + \frac{k}{x_i}\right)
$$

We use this algorithm to approximate the square root of $2$, a value known to the Babylonians.

For $k=2$ the update step simplifies: start with a guess $x$, form $x/2 + 1/x$, apply the same step to the result, and repeat.

We represent this step using a function

::: {.cell execution_count=4}
``` {.julia .cell-code}
babylon(x) = x/2 + 1/x
```

::: {.cell-output .cell-output-display execution_count=5}
```
babylon (generic function with 1 method)
```
:::
:::

Let's start with $x = 2$, as a rational number:

::: {.cell hold='true' execution_count=5}
``` {.julia .cell-code}
x₁ = babylon(2//1)
x₁, x₁^2.0
```

::: {.cell-output .cell-output-display execution_count=6}
```
(3//2, 2.25)
```
:::
:::

Our estimate improved from something whose square is $4$ to something whose square is $2.25$. A big improvement, but there is still more to come.
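As a quick aside, the same update step works for any $k$. A sketch - with `babylon_step` a hypothetical helper of our own naming, not used elsewhere in this section - applied to $\sqrt{3}$:

```julia
# General Babylonian update for approximating √k
# (hypothetical helper, not used elsewhere in this section)
babylon_step(x, k) = (x + k / x) / 2

x = babylon_step(3.0, 3)   # one step toward √3 from the guess 3.0
x = babylon_step(x, 3)
x = babylon_step(x, 3)
x, x^2                     # after three steps, already close to (√3, 3)
```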
Had we done one more step:

::: {.cell execution_count=6}
``` {.julia .cell-code}
x₂ = (babylon ∘ babylon)(2//1)
x₂, x₂^2.0
```

::: {.cell-output .cell-output-display execution_count=7}
```
(17//12, 2.0069444444444446)
```
:::
:::

The error now shows up only in the third decimal place.

::: {.cell execution_count=7}
``` {.julia .cell-code}
x₃ = (babylon ∘ babylon ∘ babylon)(2//1)
x₃, x₃^2.0
```

::: {.cell-output .cell-output-display execution_count=8}
```
(577//408, 2.000006007304883)
```
:::
:::

The error is now pushed to the sixth decimal place. That is about as far as we, or the Babylonians, would want to go by hand. With rational numbers, the numerators and denominators quickly grow out of hand. The next step shows the explosion.

::: {.cell execution_count=8}
``` {.julia .cell-code}
reduce((x,step) -> babylon(x), 1:4, init=2//1)
```

::: {.cell-output .cell-output-display execution_count=9}
```
665857//470832
```
:::
:::

(In the above, we used `reduce` to repeat a function call $4$ times, as an alternative to the composition operation. In this section we show a few styles to do this repetition before introducing a packaged function.)

However, with floating point numbers, the method stays quite manageable:

::: {.cell hold='true' execution_count=9}
``` {.julia .cell-code}
xₙ = reduce((x, step) -> babylon(x), 1:6, init=2.0)
xₙ, xₙ^2
```

::: {.cell-output .cell-output-display execution_count=10}
```
(1.414213562373095, 1.9999999999999996)
```
:::
:::

We can see that the algorithm - to the precision offered by floating point numbers - has resulted in an answer `1.414213562373095`. This answer is an *approximation* to the actual answer. Approximation is necessary, as $\sqrt{2}$ is an irrational number and so can never be exactly represented in floating point.
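The repetition need not be a fixed number of steps, either. As one more style - a sketch only, with names of our own choosing - a `while` loop can iterate until successive guesses agree to a tolerance:

```julia
babylon(x) = x/2 + 1/x   # the update step from above

# Iterate until two successive guesses agree to a tolerance.
# A sketch only; a serious implementation would also cap the number of steps.
function iterate_babylon(x0; tol = 1e-15)
    x, xold = float(x0), Inf
    while abs(x - xold) > tol
        xold, x = x, babylon(x)
    end
    x
end

xₙ = iterate_babylon(2)
xₙ, xₙ^2
```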
That being said, we can see that the square of our approximation agrees with $2$ through the last decimal place shown, so the approximation is very close and is achieved in just a few steps.

## Newton's generalization

Let $f(x) = x^3 - 2x - 5$. The value of $2$ is almost a zero, but not quite, as $f(2) = -1$. We can check that there are no *rational* roots. Though there is a formula to solve a cubic exactly, it is cumbersome to apply and not as generally applicable as an iterative algorithm, like the Babylonian method, for producing an approximate answer.

Is there some generalization of the Babylonian method?

We know that the tangent line is a good approximation to the function near the point of tangency. Looking at this graph gives a hint as to an algorithm:

::: {.cell hold='true' execution_count=10}

::: {.cell-output .cell-output-display execution_count=11}
(Figure: graph of $f(x) = x^3 - 2x - 5$ together with its tangent line at $x = 2$.)
:::
:::

The tangent line and the function nearly agree near $2$. So much so, that the intersection point of the tangent line with the $x$ axis nearly hides the actual zero of $f(x)$ near $2.1$.

That is, it seems that the intersection of the tangent line and the $x$ axis should be an improved approximation for the zero of the function.

Let $x_0$ be $2$, and $x_1$ be the intersection point of the tangent line at $(x_0, f(x_0))$ with the $x$ axis. Then by the definition of the tangent line:

$$
f'(x_0) = \frac{\Delta y}{\Delta x} = \frac{f(x_0)}{x_0 - x_1}.
$$

This can be solved for $x_1$ to give $x_1 = x_0 - f(x_0)/f'(x_0)$.
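As a sanity check on this algebra - using a hand-written derivative here so the snippet is self-contained - the tangent line at $x_0 = 2$ is indeed $0$ at the point $x_1$ given by this formula:

```julia
# Check the tangent-line algebra for f(x) = x^3 - 2x - 5 at x₀ = 2.
f(x) = x^3 - 2x - 5
fp(x) = 3x^2 - 2             # derivative, written out by hand here
x0 = 2.0
x1 = x0 - f(x0) / fp(x0)     # the formula just derived

# tangent line at x₀: y = f(x₀) + f'(x₀)(x - x₀)
tangent(x) = f(x0) + fp(x0) * (x - x0)
x1, tangent(x1)              # x1 == 2.1; tangent(x1) ≈ 0 up to roundoff
```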
In general, if we had $x_i$ and used the intersection point of the tangent line to produce $x_{i+1}$, we would have Newton's method:

$$
x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}.
$$

Using automatic derivatives, as brought in with the `CalculusWithJulia` package, we can implement this algorithm.

Starting the algorithm at $2$, the first step is:

::: {.cell execution_count=11}
``` {.julia .cell-code}
f(x) = x^3 - 2x - 5
x0 = 2.0
x1 = x0 - f(x0) / f'(x0)
```

::: {.cell-output .cell-output-display execution_count=12}
```
2.1
```
:::
:::

We can see we are closer to a zero:

::: {.cell execution_count=12}
``` {.julia .cell-code}
f(x0), f(x1)
```

::: {.cell-output .cell-output-display execution_count=13}
```
(-1.0, 0.06100000000000083)
```
:::
:::

Trying again, we have

::: {.cell execution_count=13}
``` {.julia .cell-code}
x2 = x1 - f(x1) / f'(x1)
x2, f(x2), f(x1)
```

::: {.cell-output .cell-output-display execution_count=14}
```
(2.094568121104185, 0.00018572317327247845, 0.06100000000000083)
```
:::
:::

And again:

::: {.cell execution_count=14}
``` {.julia .cell-code}
x3 = x2 - f(x2) / f'(x2)
x3, f(x3), f(x2)
```

::: {.cell-output .cell-output-display execution_count=15}
```
(2.094551481698199, 1.7397612239733462e-9, 0.00018572317327247845)
```
:::
:::

::: {.cell execution_count=15}
``` {.julia .cell-code}
x4 = x3 - f(x3) / f'(x3)
x4, f(x4), f(x3)
```

::: {.cell-output .cell-output-display execution_count=16}
```
(2.0945514815423265, -8.881784197001252e-16, 1.7397612239733462e-9)
```
:::
:::

We see now that $f(x_4)$ is within machine tolerance of $0$, so we call $x_4$ an *approximate zero* of $f(x)$.

> **Newton's method:** Let $x_0$ be an initial guess for a zero of $f(x)$.
> Iteratively define $x_{i+1}$ in terms of the just-generated $x_i$ by:
>
> $$
> x_{i+1} = x_i - f(x_i) / f'(x_i).
> $$
>
> Then for reasonable functions and reasonable initial guesses, the sequence of points converges to a zero of $f$.

On the computer, exact convergence will likely never occur, but accuracy to a certain tolerance can often be achieved.

In the example above, we kept track of the previous values. This is unnecessary if only the answer is sought. In that case, the update step could reuse the same variable. Here we use `reduce`:

::: {.cell hold='true' execution_count=16}
``` {.julia .cell-code}
xₙ = reduce((x, step) -> x - f(x)/f'(x), 1:4, init=2)
xₙ, f(xₙ)
```

::: {.cell-output .cell-output-display execution_count=17}
```
(2.0945514815423265, -8.881784197001252e-16)
```
:::
:::

In practice, the algorithm is implemented not by repeating the update step a fixed number of times, but rather by repeating the step until either we converge or it is clear we won't. For good guesses and most functions, convergence happens quickly.

:::{.callout-note}
## Note
Newton looked at this same example in 1699 (B.T. Polyak, *Newton's method and its use in optimization*, European Journal of Operational Research. 02/2007; 181(3):1086-1096.), though his technique was slightly different: he did not use the derivative, *per se*, but rather an approximation based on the fact that his function was a polynomial (though it was identical to the derivative). Raphson (1690) proposed the general form, hence the usual name of the Newton-Raphson method.

:::

#### Examples

##### Example: visualizing convergence

This graphic demonstrates the method and the rapid convergence:

::: {.cell cache='true' hold='true' execution_count=18}

::: {.cell-output .cell-output-display execution_count=19}
(Animation of Newton's method omitted from this rendering; captions follow.)
:::
:::
Illustration of Newton's method converging to a zero of a function.

Illustration of Newton's method converging, but slowly, as the initial guess is very poor and not close to the zero. The algorithm does converge in this illustration, but not quickly and not to the nearest root from the initial guess.

Illustration of Newton's method failing to converge, as for some $x_i$, $f'(x_i)$ is too close to $0$. In this instance, after a few steps the algorithm just cycles around the local minimum near $0.66$. The values of $x_i$ repeat in the pattern $1.0002, 0.7503, -0.0833, 1.0002, \dots$. This is also an illustration of a poor initial guess. If there is a local minimum or maximum between the guess and the zero, such cycles can occur.

Illustration of Newton's method not converging. Here the second derivative is too big near the zero - it blows up near $0$ - and convergence does not occur. Rather, the iterates increase in their distance from the zero.

The function $f(x) = x^{20} - 1$ has two bad behaviours for Newton's method: for $x < 1$ the derivative is nearly $0$, and for $x > 1$ the second derivative is very big. In this illustration, we have an initial guess of $x_0 = 8/9$. As the tangent line is fairly flat, the next approximation is far away, $x_1 = 1.313\dots$. As this guess is much bigger than $1$, the ratio $f(x)/f'(x) \approx x^{20}/(20x^{19}) = x/20$, so $x_{i+1} \approx x_i - x_i/20 = (19/20)x_i$, yielding slow, linear convergence until $f''(x_i)$ is moderate. For this function, starting at $x_0 = 8/9$ takes $11$ steps, at $x_0 = 7/8$ takes $13$ steps, at $x_0 = 3/4$ takes $55$ steps, and at $x_0 = 1/2$ it takes $204$ steps.
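These failure modes suggest that an implementation should cap the number of iterations rather than loop until convergence. A minimal sketch along those lines - with a hand-written derivative and names of our own choosing, not the interface of the `Roots` package - might look like:

```julia
# Newton's method with a step cap to guard against the non-convergence
# illustrated above. A sketch, not the `Roots` package interface.
function newton_sketch(f, fp, x0; atol = 1e-14, maxsteps = 100)
    x = float(x0)
    for _ in 1:maxsteps
        Δ = f(x) / fp(x)
        x -= Δ
        abs(Δ) <= atol && return x     # step is tiny: call it converged
    end
    error("no convergence in $maxsteps steps")
end

f(x) = x^3 - 2x - 5
newton_sketch(f, x -> 3x^2 - 2, 2.0)   # ≈ 2.0945514815423265
```

For a good guess, such as $x_0 = 2$ here, the cap is never reached; for the poor guesses illustrated above, it turns an infinite cycle or divergence into an informative error.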