Steven G. Johnson 2024-09-30 09:42:46 -04:00
parent 0a0e166616
commit 1898b763a6

@@ -1386,6 +1386,8 @@
"* **Training** data is what we use to form our fit/model — it goes into $A$ and $b$ in a least-square fit. This is usually *most* of the data\n",
"* **Test** data is a subset of the data used to *check* whether the fit is actually doing a good job on the underlying problem. Sometimes, this is further subdivided into \"validation\" data that is used *during* training while tuning hyperparameters like the degree of polynomial being fitted, versus \"test\" data that is used as a final check after everything is done.\n",
"\n",
"Similar concepts are known by many different names. The ability of a model to predict previously unseen test data is sometimes called [\"generalizability\"](https://en.wikipedia.org/wiki/Generalization_error). Splitting the data into training and test/validation data is also called [\"cross-validation\"](https://en.wikipedia.org/wiki/Cross-validation_(statistics)).\n",
"\n",
"Here, let's consider the same problem as above: 50 data points from a degree-3 polynomial $1 + 2a + 3a^2 + 4a^3$ plus noise, but\n",
"* We'll use a random subset of **20% of the data for testing**, with the remaining 80% for training/fitting.\n",
"* For each degree $n$, we'll do a least-squares fit on the training data, and then evaluate the **root-mean-square error on the test data**."
@@ -1483,7 +1485,7 @@
"\n",
"* As we increase the degree (i.e. add more fit parameters), the **error on the training data decreases**. We can \"fit all the wiggles\" — if you have enough parameters [\"you can fit an elephant\"](https://en.wikipedia.org/wiki/Von_Neumann%27s_elephant).\n",
"* However, as the degree increases beyond 3 (the \"ground truth\" underlying model), the **error on the training data stops decreasing** and soon **begins to increase**. By \"fitting the wiggles\" of the training data, we have begun to \"overfit\" the problem, and the fit actually gets *worse* in between the training points.\n",
"* The test error is rather fragile, and susceptible to large random fluctuations if we repeat our experiment. (Though these fluctuations get smaller if we generate more data.) **Precisely detecting when overfitting begins can be difficult**.\n",
"* The test error is rather fragile, and susceptible to large random fluctuations if we repeat our experiment. (Though these fluctuations get smaller if we generate more data. Many cross-validation methods proceed through multiple \"rounds\" in which different subsets are used as training and test data.) **Precisely detecting when overfitting begins can be difficult**.\n",
"* Overfitting is especially likely **when the number of parameters becomes comparable to the size of the training set**. Here, we see significant overfitting even when the number of parameters is 1/4 the size of the training set! (This is a huge challenge for neural-network models, which often have a vast number of parameters.)"
]
},