From 1898b763a636f9d9b89a31400019e79720941dc7 Mon Sep 17 00:00:00 2001
From: "Steven G. Johnson"
Date: Mon, 30 Sep 2024 09:42:46 -0400
Subject: [PATCH] link

---
 notes/Least-Square Fitting.ipynb | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/notes/Least-Square Fitting.ipynb b/notes/Least-Square Fitting.ipynb
index a100ea4..fa520eb 100644
--- a/notes/Least-Square Fitting.ipynb
+++ b/notes/Least-Square Fitting.ipynb
@@ -1386,6 +1386,8 @@
     "* **Training** data is what we use to form our fit/model — it goes into $A$ and $b$ in a least-square fit. This is usually *most* of the data\n",
     "* **Test** data is a subset of the data used to *check* whether the fit is actually doing a good job on the underlying problem. Sometimes, this is further subdivided into \"validation\" data that is used *during* training while tuning hyperparameters like the degree of polynomial being fitted, versus \"test\" data that is used as a final check after everything is done.\n",
     "\n",
+    "Similar concepts are known by many different names. The ability of a model to predict previously unseen test data is sometimes called [\"generalizability\"](https://en.wikipedia.org/wiki/Generalization_error). Splitting the data into training and test/validation data is also called [\"cross-validation\"](https://en.wikipedia.org/wiki/Cross-validation_(statistics)).\n",
+    "\n",
     "Here, let's consider the same problem as above: 50 data points from a degree-3 polynomial $1 + 2a + 3a^2 + 4a^3$ plus noise, but\n",
     "* We'll use a random subset of **20% of the data for testing**, with the remaining 80% for training/fitting.\n",
     "* For each degree $n$, we'll do a least-squares fit on the training data, and then evaluate the **root-mean-square error on the test data**."
@@ -1483,7 +1485,7 @@
     "\n",
     "* As we increase the degree (i.e. add more fit parameters), the **error on the training data decreases**. We can \"fit all the wiggles\" — if you have enough parameters [\"you can fit an elephant\"](https://en.wikipedia.org/wiki/Von_Neumann%27s_elephant).\n",
     "* However, as the degree increases beyond 3 (the \"ground truth\" underlying model), the **error on the test data stops decreasing** and soon **begins to increase**. By \"fitting the wiggles\" of the training data, we have begun to \"overfit\" the problem, and the fit actually gets *worse* in between the training points.\n",
-    "* The test error is rather fragile, and susceptible to large random fluctuations if we repeat our experiment. (Though these fluctuations get smaller if we generate more data.) **Precisely detecting when overfitting begins can be difficult**.\n",
+    "* The test error is rather fragile, and susceptible to large random fluctuations if we repeat our experiment. (Though these fluctuations get smaller if we generate more data. Many cross-validation methods proceed through multiple \"rounds\" in which different subsets are used as training and test data.) **Precisely detecting when overfitting begins can be difficult**.\n",
     "* Overfitting is especially likely **when the number of parameters becomes comparable to the size of the training set**. Here, we see significant overfitting even when the number of parameters is 1/4 the size of the training set! (This is a huge challenge for neural-network models, which often have a vast number of parameters.)"
    ]
   },
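
For reference, here is a minimal sketch of the train/test experiment the patched cell describes: fit polynomials of increasing degree to 50 noisy samples of 1 + 2a + 3a^2 + 4a^3, using a random 80/20 train/test split, and report the root-mean-square error on the held-out points. This is not the notebook's actual code; it assumes Julia (the notebook's language), and the sample interval, noise level, random seed, and degree range are illustrative choices, not values from the notes.

using Random

Random.seed!(1234)                        # assumed seed, for reproducibility

a = collect(range(-1, 1, length=50))      # 50 sample points (interval is an assumed choice)
b = 1 .+ 2 .* a .+ 3 .* a.^2 .+ 4 .* a.^3 .+ 0.1 .* randn(50)  # degree-3 polynomial plus noise

perm   = randperm(50)
itest  = perm[1:10]                       # random 20% of the data for testing
itrain = perm[11:end]                     # remaining 80% for training/fitting

# matrix with columns 1, a, a^2, ..., a^n (Vandermonde-style), as in the fits above
polymatrix(a, n) = [ai^j for ai in a, j in 0:n]

rms(x) = sqrt(sum(abs2, x) / length(x))

for n in 0:15
    c = polymatrix(a[itrain], n) \ b[itrain]                   # least-squares fit on training data only
    trainerr = rms(polymatrix(a[itrain], n) * c - b[itrain])   # RMS error on the training data
    testerr  = rms(polymatrix(a[itest],  n) * c - b[itest])    # RMS error on the held-out test data
    println("degree $n: train RMS = $trainerr, test RMS = $testerr")
end

With such a sketch, the training RMS keeps shrinking as the degree grows, while the test RMS typically levels off and then rises past degree 3, illustrating the overfitting behavior discussed in the second hunk.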