diff --git a/09_tabular.ipynb b/09_tabular.ipynb
index e34984b..4352039 100644
--- a/09_tabular.ipynb
+++ b/09_tabular.ipynb
@@ -6,7 +6,15 @@
"metadata": {
"hide_input": true
},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /home/sgugger/.kaggle/kaggle.json'\n"
+ ]
+ }
+ ],
"source": [
"#hide\n",
"from utils import *\n",
@@ -40,7 +48,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "Tabular modelling takes data in the form of a table (like a spreadsheet or CSV--comma separated values). The objective is to predict the value in one column, based on the values in the other columns."
+ "Tabular modelling takes data in the form of a table (like a spreadsheet or CSV--comma separated values). The objective is to predict the value in one column, based on the values in the other columns. In this chapter we will not only look at deep learning but also more general machine learning techniques like random forests, as they can give better results depending on your problem.\n",
+ "\n",
+ "We will look at how we should preprocess and clean the data, how to interpret the result of our models after training, but first, we will see how we can feed columns that contain categories into a model that espects numbers by using embeddings."
]
},
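+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a quick preview of that last idea (a minimal sketch in plain PyTorch, not the code we will build on later), an embedding is just a lookup table that maps each category's integer code to a trainable vector of floats:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import torch\n",
+ "\n",
+ "# A column with 10 possible categories becomes a table of 10 trainable\n",
+ "# 4-dimensional vectors: integer codes in, float vectors out.\n",
+ "emb = torch.nn.Embedding(num_embeddings=10, embedding_dim=4)\n",
+ "emb(torch.tensor([3, 7])).shape  # torch.Size([2, 4])"
+ ]
+ },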
{
@@ -93,7 +103,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "
"
+ "
"
]
},
{
@@ -111,7 +121,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "
"
+ "
"
]
},
{
@@ -127,7 +137,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "
"
+ "
"
]
},
{
@@ -141,7 +151,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "
"
+ "
"
]
},
{
@@ -314,7 +324,7 @@
{
"data": {
"text/plain": [
- "Path('/home/jhoward/.fastai/archive/bluebook')"
+ "Path('/home/sgugger/.fastai/archive/bluebook')"
]
},
"execution_count": null,
@@ -352,7 +362,7 @@
{
"data": {
"text/plain": [
- "(#7) [Path('TrainAndValid.csv'),Path('Machine_Appendix.csv'),Path('random_forest_benchmark_test.csv'),Path('Test.csv'),Path('median_benchmark.csv'),Path('ValidSolution.csv'),Path('Valid.csv')]"
+ "(#7) [Path('Valid.csv'),Path('Machine_Appendix.csv'),Path('ValidSolution.csv'),Path('TrainAndValid.csv'),Path('random_forest_benchmark_test.csv'),Path('Test.csv'),Path('median_benchmark.csv')]"
]
},
"execution_count": null,
@@ -539,10 +549,20 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "Decision tree ensembles, as the name suggests, rely on decision trees. So let's start there! A decision tree asks a series of binary (that is, yes or no) questions about the data. After each question the data at that part of the tree is split between a \"yes\" and a \"no\" branch. After one or more questions, either a prediction can be made on the basis of all previous answers or another question is required.\n",
- "\n",
- "TK: Adding a figure here might be useful\n",
- "\n",
+ "Decision tree ensembles, as the name suggests, rely on decision trees. So let's start there! A decision tree asks a series of binary (that is, yes or no) questions about the data. After each question the data at that part of the tree is split between a \"yes\" and a \"no\" branch as shown in <>. After one or more questions, either a prediction can be made on the basis of all previous answers or another question is required."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "
"
+ ]
+ },
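+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To make this concrete, here is a minimal sketch with made-up numbers (not the Blue Book data we'll use shortly): a single binary question splits the items into two groups, and each group's prediction is the mean of its targets."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "# Four toy auction records (illustrative values only).\n",
+ "df_toy = pd.DataFrame({'YearMade': [1990, 1995, 2002, 2006],\n",
+ "                       'SalePrice': [9.1, 9.3, 10.2, 10.6]})\n",
+ "answer = df_toy['YearMade'] <= 1998  # one binary question\n",
+ "# Each branch predicts the mean target of the items that landed in it.\n",
+ "df_toy.groupby(answer)['SalePrice'].mean()"
+ ]
+ },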
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
"This sequence of questions is now a procedure for taking any data item, whether an item from the training set or a new one, and assigning that item to a group. Namely, after asking and answering the questions, we can say the item belongs to the group of all the other training data items which yielded the same set of answers to the questions. But what good is this? the goal of our model is to predict values for items, not to assign them into groups from the training dataset. The value of this is that we can now assign a prediction value for each of these groups--for regression, we take the target mean of the items in the group.\n",
"\n",
"Let's consider how we find the right questions to ask. Of course, we wouldn't want to have to create all these questions ourselves — that's what computers are for! The basic steps to train a decision tree can be written down very easily:\n",
@@ -763,7 +783,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "We can see that the data still is displayed as strings for categories..."
+ "We can see that the data still is displayed as strings for categories (we only show a few columns because the fulltable is too big to fit on a page)..."
]
},
{
@@ -828,20 +848,8 @@
" saleIs_quarter_start | \n",
" saleIs_year_end | \n",
" saleIs_year_start | \n",
- " SalesID_na | \n",
- " MachineID_na | \n",
- " ModelID_na | \n",
- " datasource_na | \n",
" auctioneerID_na | \n",
- " YearMade_na | \n",
" MachineHoursCurrentMeter_na | \n",
- " saleYear_na | \n",
- " saleMonth_na | \n",
- " saleWeek_na | \n",
- " saleDay_na | \n",
- " saleDayofweek_na | \n",
- " saleDayofyear_na | \n",
- " saleElapsed_na | \n",
" SalesID | \n",
" MachineID | \n",
" ModelID | \n",
@@ -914,32 +922,20 @@
" False | \n",
" False | \n",
" False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " 1139246 | \n",
- " 999089 | \n",
- " 3157 | \n",
- " 121 | \n",
+ " 1139246.0 | \n",
+ " 999089.0 | \n",
+ " 3157.0 | \n",
+ " 121.0 | \n",
" 3.0 | \n",
- " 2004 | \n",
+ " 2004.0 | \n",
" 68.0 | \n",
- " 2006 | \n",
- " 11 | \n",
- " 46 | \n",
- " 16 | \n",
- " 3 | \n",
- " 320 | \n",
- " 1163635200 | \n",
+ " 2006.0 | \n",
+ " 11.0 | \n",
+ " 46.0 | \n",
+ " 16.0 | \n",
+ " 3.0 | \n",
+ " 320.0 | \n",
+ " 1.163635e+09 | \n",
" 11.097410 | \n",
" \n",
" \n",
@@ -996,32 +992,20 @@
" False | \n",
" False | \n",
" False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " 1139248 | \n",
- " 117657 | \n",
- " 77 | \n",
- " 121 | \n",
+ " 1139248.0 | \n",
+ " 117657.0 | \n",
+ " 77.0 | \n",
+ " 121.0 | \n",
" 3.0 | \n",
- " 1996 | \n",
+ " 1996.0 | \n",
" 4640.0 | \n",
- " 2004 | \n",
- " 3 | \n",
- " 13 | \n",
- " 26 | \n",
- " 4 | \n",
- " 86 | \n",
- " 1080259200 | \n",
+ " 2004.0 | \n",
+ " 3.0 | \n",
+ " 13.0 | \n",
+ " 26.0 | \n",
+ " 4.0 | \n",
+ " 86.0 | \n",
+ " 1.080259e+09 | \n",
" 10.950807 | \n",
"
\n",
" \n",
@@ -1078,32 +1062,20 @@
" False | \n",
" False | \n",
" False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " 1139249 | \n",
- " 434808 | \n",
- " 7009 | \n",
- " 121 | \n",
+ " 1139249.0 | \n",
+ " 434808.0 | \n",
+ " 7009.0 | \n",
+ " 121.0 | \n",
" 3.0 | \n",
- " 2001 | \n",
+ " 2001.0 | \n",
" 2838.0 | \n",
- " 2004 | \n",
- " 2 | \n",
- " 9 | \n",
- " 26 | \n",
- " 3 | \n",
- " 57 | \n",
- " 1077753600 | \n",
+ " 2004.0 | \n",
+ " 2.0 | \n",
+ " 9.0 | \n",
+ " 26.0 | \n",
+ " 3.0 | \n",
+ " 57.0 | \n",
+ " 1.077754e+09 | \n",
" 9.210340 | \n",
"
\n",
" \n",
@@ -1118,14 +1090,69 @@
}
],
"source": [
+ "#hide_output\n",
"to.show(3)"
]
},
{
- "cell_type": "markdown",
+ "cell_type": "code",
+ "execution_count": null,
"metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ " \n",
+ " \n",
+ " | \n",
+ " state | \n",
+ " ProductGroup | \n",
+ " Drive_System | \n",
+ " Enclosure | \n",
+ " SalePrice | \n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " 0 | \n",
+ " Alabama | \n",
+ " WL | \n",
+ " #na# | \n",
+ " EROPS w AC | \n",
+ " 11.097410 | \n",
+ "
\n",
+ " \n",
+ " 1 | \n",
+ " North Carolina | \n",
+ " WL | \n",
+ " #na# | \n",
+ " EROPS w AC | \n",
+ " 10.950807 | \n",
+ "
\n",
+ " \n",
+ " 2 | \n",
+ " New York | \n",
+ " SSL | \n",
+ " #na# | \n",
+ " OROPS | \n",
+ " 9.210340 | \n",
+ "
\n",
+ " \n",
+ "
"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
"source": [
- "TK too big to fit"
+ "#hide_input\n",
+ "to1 = TabularPandas(df, procs, ['state', 'ProductGroup', 'Drive_System', 'Enclosure'], [], y_names=dep_var, splits=splits)\n",
+ "to1.show(3)"
]
},
{
@@ -1234,9 +1261,85 @@
}
],
"source": [
+ "#hide_output\n",
"to.items.head(3)"
]
},
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " | \n",
+ " state | \n",
+ " ProductGroup | \n",
+ " Drive_System | \n",
+ " Enclosure | \n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " 0 | \n",
+ " 1 | \n",
+ " 6 | \n",
+ " 0 | \n",
+ " 3 | \n",
+ "
\n",
+ " \n",
+ " 1 | \n",
+ " 33 | \n",
+ " 6 | \n",
+ " 0 | \n",
+ " 3 | \n",
+ "
\n",
+ " \n",
+ " 2 | \n",
+ " 32 | \n",
+ " 3 | \n",
+ " 0 | \n",
+ " 6 | \n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " state ProductGroup Drive_System Enclosure\n",
+ "0 1 6 0 3\n",
+ "1 33 6 0 3\n",
+ "2 32 3 0 6"
+ ]
+ },
+ "execution_count": null,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "#hide_input\n",
+ "to1.items[['state', 'ProductGroup', 'Drive_System', 'Enclosure']].head(3)"
+ ]
+ },
{
"cell_type": "markdown",
"metadata": {},
@@ -1502,13 +1605,6 @@
"Returning back to the top node after the first decision point, we can see that a second binary decision split has been made, based on asking whether `YearMade` is less than or equal to 1991.5. For the group where this is true (remember, this is now following two binary decisions, both `coupler_system`, and `YearMade`) the average value is 9.97, and there are 155,724 auction records in this group. For the group of auctions where this decision is false, the average value is 10.4, and there are 205,123 records. So again, we can see that the decision tree algorithm has successfully split our more expensive auction records into two more groups which differ in value significantly."
]
},
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "**(TK AG: I think it would be useful here to have a figure which showed a circle or blob shape, which is carved by bisecting lines first into two groups, then into three, then four, as new bisections are introduced. This is a valuable intuition which we have not depicted.)**"
- ]
- },
{
"cell_type": "markdown",
"metadata": {},
@@ -8465,14 +8561,9 @@
"- Look for important predictors which don't make sense in practice\n",
"- Look for partial dependence plot results which don't make sense in practice.\n",
"\n",
- "Thinking back to our bear detector, this mirrors the advice that we also provided there — it is often a good idea to build a model first, and then do your data cleaning, rather than vice versa. The model can help you identify potentially problematic data issues."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "TK Add transition"
+ "Thinking back to our bear detector, this mirrors the advice that we also provided there — it is often a good idea to build a model first, and then do your data cleaning, rather than vice versa. The model can help you identify potentially problematic data issues.\n",
+ "\n",
+ "It can also help you interpret which factors influences specific predictions, with tree interpreters."
]
},
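+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a hedged sketch of that idea (using the `treeinterpreter` package on random toy data; the names and data here are illustrative, not the chapter's actual variables): each prediction is decomposed into a bias term plus one contribution per column."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "from sklearn.ensemble import RandomForestRegressor\n",
+ "from treeinterpreter import treeinterpreter as ti\n",
+ "\n",
+ "# Toy stand-ins for our data and model (illustrative only).\n",
+ "X = np.random.rand(100, 3)\n",
+ "y = X @ np.array([1.0, 2.0, 0.5])\n",
+ "m_toy = RandomForestRegressor(n_estimators=10).fit(X, y)\n",
+ "\n",
+ "prediction, bias, contributions = ti.predict(m_toy, X[:1])\n",
+ "# prediction equals bias plus the sum of the per-column contributions\n",
+ "prediction[0], bias[0] + contributions[0].sum()"
+ ]
+ },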
{
@@ -8618,7 +8709,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "TK add a transition"
+ "Now that we covered some classic machine learning to solve this problem, let's see how deep learning can help!"
]
},
{
@@ -8632,7 +8723,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "TK add an introduction here before stacking header"
+ "A problem with random forests, like all machine learning or deep learning algorithms, is that they don't always generalize well to new data. Random forests can help us identify out-of-domain data, and we will see in which situations neural network generalize better, but first, let's look at the extrapolation problem that random forests have."
]
},
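+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To see that problem in isolation, here is a minimal sketch on synthetic data (not the auction dataset): a random forest trained on a linear trend cannot predict values beyond the range of targets it saw during training."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "from sklearn.ensemble import RandomForestRegressor\n",
+ "\n",
+ "x = np.linspace(0, 1, 40)[:, None]\n",
+ "y = 2 * x.ravel()  # a simple linear relationship\n",
+ "rf = RandomForestRegressor(n_estimators=40).fit(x, y)\n",
+ "# A regression tree can only predict averages of training targets, so the\n",
+ "# forest plateaus near 2.0 (the largest target seen) instead of 3.0.\n",
+ "rf.predict(np.array([[1.5]]))"
+ ]
+ },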
{
@@ -9469,14 +9560,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "TK add transition of make this an aside"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### fastai's Tabular classes"
+ "### Sidebar: fastai's Tabular classes"
]
},
{
@@ -9494,7 +9578,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "Tk add transition"
+ "### End sidebar"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Another thing that can help with generalization is to use several models and average their predictions, a technique known as ensembling."
]
},
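+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "At its simplest, that is just an average; here is a sketch with stand-in arrays rather than real model predictions:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "# Stand-in validation predictions from two different models.\n",
+ "rf_preds = np.array([10.1, 9.8, 11.0])\n",
+ "nn_preds = np.array([10.3, 9.6, 10.8])\n",
+ "(rf_preds + nn_preds) / 2  # the ensemble's prediction"
+ ]
+ },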
{
@@ -9602,14 +9693,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "TK add transition. Or maybe make this an aside?"
+ "A last technique that has gotten great results is to use embeddings learned by a neural net in a machine learning model."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "## Combining embeddings with other methods"
+ "### Combining embeddings with other methods"
]
},
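+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Mechanically, that means replacing each category code with the corresponding row of a trained embedding matrix before handing the table to, say, a random forest. Here is a sketch with illustrative names (not the chapter's actual variables):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "# Stand-in for a trained embedding matrix: 5 categories, 3-dim vectors.\n",
+ "emb_weight = np.random.randn(5, 3)\n",
+ "codes = np.array([0, 2, 2, 4])    # a categorical column as integer codes\n",
+ "emb_features = emb_weight[codes]  # one learned vector per row\n",
+ "emb_features.shape                # (4, 3): ready to use as model features"
+ ]
+ },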
{
diff --git a/clean/09_tabular.ipynb b/clean/09_tabular.ipynb
index 7825e04..b96b4e9 100644
--- a/clean/09_tabular.ipynb
+++ b/clean/09_tabular.ipynb
@@ -6,7 +6,15 @@
"metadata": {
"hide_input": true
},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /home/sgugger/.kaggle/kaggle.json'\n"
+ ]
+ }
+ ],
"source": [
"#hide\n",
"from utils import *\n",
@@ -87,7 +95,7 @@
{
"data": {
"text/plain": [
- "Path('/home/jhoward/.fastai/archive/bluebook')"
+ "Path('/home/sgugger/.fastai/archive/bluebook')"
]
},
"execution_count": null,
@@ -118,7 +126,7 @@
{
"data": {
"text/plain": [
- "(#7) [Path('TrainAndValid.csv'),Path('Machine_Appendix.csv'),Path('random_forest_benchmark_test.csv'),Path('Test.csv'),Path('median_benchmark.csv'),Path('ValidSolution.csv'),Path('Valid.csv')]"
+ "(#7) [Path('Valid.csv'),Path('Machine_Appendix.csv'),Path('ValidSolution.csv'),Path('TrainAndValid.csv'),Path('random_forest_benchmark_test.csv'),Path('Test.csv'),Path('median_benchmark.csv')]"
]
},
"execution_count": null,
@@ -423,20 +431,8 @@
" saleIs_quarter_start | \n",
" saleIs_year_end | \n",
" saleIs_year_start | \n",
- " SalesID_na | \n",
- " MachineID_na | \n",
- " ModelID_na | \n",
- " datasource_na | \n",
" auctioneerID_na | \n",
- " YearMade_na | \n",
" MachineHoursCurrentMeter_na | \n",
- " saleYear_na | \n",
- " saleMonth_na | \n",
- " saleWeek_na | \n",
- " saleDay_na | \n",
- " saleDayofweek_na | \n",
- " saleDayofyear_na | \n",
- " saleElapsed_na | \n",
" SalesID | \n",
" MachineID | \n",
" ModelID | \n",
@@ -509,32 +505,20 @@
" False | \n",
" False | \n",
" False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " 1139246 | \n",
- " 999089 | \n",
- " 3157 | \n",
- " 121 | \n",
+ " 1139246.0 | \n",
+ " 999089.0 | \n",
+ " 3157.0 | \n",
+ " 121.0 | \n",
" 3.0 | \n",
- " 2004 | \n",
+ " 2004.0 | \n",
" 68.0 | \n",
- " 2006 | \n",
- " 11 | \n",
- " 46 | \n",
- " 16 | \n",
- " 3 | \n",
- " 320 | \n",
- " 1163635200 | \n",
+ " 2006.0 | \n",
+ " 11.0 | \n",
+ " 46.0 | \n",
+ " 16.0 | \n",
+ " 3.0 | \n",
+ " 320.0 | \n",
+ " 1.163635e+09 | \n",
" 11.097410 | \n",
" \n",
" \n",
@@ -591,32 +575,20 @@
" False | \n",
" False | \n",
" False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " 1139248 | \n",
- " 117657 | \n",
- " 77 | \n",
- " 121 | \n",
+ " 1139248.0 | \n",
+ " 117657.0 | \n",
+ " 77.0 | \n",
+ " 121.0 | \n",
" 3.0 | \n",
- " 1996 | \n",
+ " 1996.0 | \n",
" 4640.0 | \n",
- " 2004 | \n",
- " 3 | \n",
- " 13 | \n",
- " 26 | \n",
- " 4 | \n",
- " 86 | \n",
- " 1080259200 | \n",
+ " 2004.0 | \n",
+ " 3.0 | \n",
+ " 13.0 | \n",
+ " 26.0 | \n",
+ " 4.0 | \n",
+ " 86.0 | \n",
+ " 1.080259e+09 | \n",
" 10.950807 | \n",
"
\n",
" \n",
@@ -673,32 +645,20 @@
" False | \n",
" False | \n",
" False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " False | \n",
- " 1139249 | \n",
- " 434808 | \n",
- " 7009 | \n",
- " 121 | \n",
+ " 1139249.0 | \n",
+ " 434808.0 | \n",
+ " 7009.0 | \n",
+ " 121.0 | \n",
" 3.0 | \n",
- " 2001 | \n",
+ " 2001.0 | \n",
" 2838.0 | \n",
- " 2004 | \n",
- " 2 | \n",
- " 9 | \n",
- " 26 | \n",
- " 3 | \n",
- " 57 | \n",
- " 1077753600 | \n",
+ " 2004.0 | \n",
+ " 2.0 | \n",
+ " 9.0 | \n",
+ " 26.0 | \n",
+ " 3.0 | \n",
+ " 57.0 | \n",
+ " 1.077754e+09 | \n",
" 9.210340 | \n",
"
\n",
" \n",
@@ -716,6 +676,66 @@
"to.show(3)"
]
},
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ " \n",
+ " \n",
+ " | \n",
+ " state | \n",
+ " ProductGroup | \n",
+ " Drive_System | \n",
+ " Enclosure | \n",
+ " SalePrice | \n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " 0 | \n",
+ " Alabama | \n",
+ " WL | \n",
+ " #na# | \n",
+ " EROPS w AC | \n",
+ " 11.097410 | \n",
+ "
\n",
+ " \n",
+ " 1 | \n",
+ " North Carolina | \n",
+ " WL | \n",
+ " #na# | \n",
+ " EROPS w AC | \n",
+ " 10.950807 | \n",
+ "
\n",
+ " \n",
+ " 2 | \n",
+ " New York | \n",
+ " SSL | \n",
+ " #na# | \n",
+ " OROPS | \n",
+ " 9.210340 | \n",
+ "
\n",
+ " \n",
+ "
"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "to1 = TabularPandas(df, procs, ['state', 'ProductGroup', 'Drive_System', 'Enclosure'], [], y_names=dep_var, splits=splits)\n",
+ "to1.show(3)"
+ ]
+ },
{
"cell_type": "code",
"execution_count": null,
@@ -818,6 +838,80 @@
"to.items.head(3)"
]
},
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " | \n",
+ " state | \n",
+ " ProductGroup | \n",
+ " Drive_System | \n",
+ " Enclosure | \n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " 0 | \n",
+ " 1 | \n",
+ " 6 | \n",
+ " 0 | \n",
+ " 3 | \n",
+ "
\n",
+ " \n",
+ " 1 | \n",
+ " 33 | \n",
+ " 6 | \n",
+ " 0 | \n",
+ " 3 | \n",
+ "
\n",
+ " \n",
+ " 2 | \n",
+ " 32 | \n",
+ " 3 | \n",
+ " 0 | \n",
+ " 6 | \n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " state ProductGroup Drive_System Enclosure\n",
+ "0 1 6 0 3\n",
+ "1 33 6 0 3\n",
+ "2 32 3 0 6"
+ ]
+ },
+ "execution_count": null,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "to1.items[['state', 'ProductGroup', 'Drive_System', 'Enclosure']].head(3)"
+ ]
+ },
{
"cell_type": "code",
"execution_count": null,
@@ -8203,7 +8297,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "### fastai's Tabular classes"
+ "### Sidebar: fastai's Tabular classes"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### End sidebar"
]
},
{
@@ -8254,7 +8355,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "## Combining embeddings with other methods"
+ "### Combining embeddings with other methods"
]
},
{
diff --git a/images/decision_tree.PNG b/images/decision_tree.PNG
new file mode 100644
index 0000000..5cef2c7
Binary files /dev/null and b/images/decision_tree.PNG differ