Revised with 1.6.1

2021-06-27 16:34:46 +02:00 · 2021-06-27 16:34:46 +02:00 · 85271e0ea6
commit 85271e0ea6
parent a3b5cd7deb
3 changed files with 3169 additions and 0 deletions
--- a/DataFrames/04__Grouping_data_frames.ipynb
+++ b/DataFrames/04__Grouping_data_frames.ipynb
--- a/DataFrames/05__Collecting_experiments_data_in_a_data_frame.ipynb
+++ b/DataFrames/05__Collecting_experiments_data_in_a_data_frame.ipynb
--- a/DataFrames/06__Next_steps.ipynb
+++ b/DataFrames/06__Next_steps.ipynb
@ -0,0 +1,479 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Final examples\n",
+    "\n",
+    "### Bogumił Kamiński"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Let us wrap up our tutorial with examples of joining and reshaping data."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Joining and reshaping data frames"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "using DataFrames\n",
+    "using CSV\n",
+    "using Pipe\n",
+    "using Unitful\n",
+    "using Dates"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Load the weather forecast data from two cities from Poland."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<table class=\"data-frame\"><thead><tr><th></th><th>city</th><th>date</th><th>rainfall</th></tr><tr><th></th><th>String</th><th>Date</th><th>Float64</th></tr></thead><tbody><p>10 rows × 3 columns</p><tr><th>1</th><td>Olecko</td><td>2020-11-16</td><td>2.9</td></tr><tr><th>2</th><td>Olecko</td><td>2020-11-17</td><td>4.1</td></tr><tr><th>3</th><td>Olecko</td><td>2020-11-19</td><td>4.3</td></tr><tr><th>4</th><td>Olecko</td><td>2020-11-20</td><td>2.0</td></tr><tr><th>5</th><td>Olecko</td><td>2020-11-21</td><td>0.6</td></tr><tr><th>6</th><td>Olecko</td><td>2020-11-22</td><td>1.0</td></tr><tr><th>7</th><td>Ełk</td><td>2020-11-16</td><td>3.9</td></tr><tr><th>8</th><td>Ełk</td><td>2020-11-19</td><td>1.2</td></tr><tr><th>9</th><td>Ełk</td><td>2020-11-20</td><td>2.0</td></tr><tr><th>10</th><td>Ełk</td><td>2020-11-22</td><td>2.0</td></tr></tbody></table>"
+      ],
+      "text/latex": [
+       "\\begin{tabular}{r|ccc}\n",
+       "\t& city & date & rainfall\\\\\n",
+       "\t\\hline\n",
+       "\t& String & Date & Float64\\\\\n",
+       "\t\\hline\n",
+       "\t1 & Olecko & 2020-11-16 & 2.9 \\\\\n",
+       "\t2 & Olecko & 2020-11-17 & 4.1 \\\\\n",
+       "\t3 & Olecko & 2020-11-19 & 4.3 \\\\\n",
+       "\t4 & Olecko & 2020-11-20 & 2.0 \\\\\n",
+       "\t5 & Olecko & 2020-11-21 & 0.6 \\\\\n",
+       "\t6 & Olecko & 2020-11-22 & 1.0 \\\\\n",
+       "\t7 & Ełk & 2020-11-16 & 3.9 \\\\\n",
+       "\t8 & Ełk & 2020-11-19 & 1.2 \\\\\n",
+       "\t9 & Ełk & 2020-11-20 & 2.0 \\\\\n",
+       "\t10 & Ełk & 2020-11-22 & 2.0 \\\\\n",
+       "\\end{tabular}\n"
+      ],
+      "text/plain": [
+       "\u001b[1m10×3 DataFrame\u001b[0m\n",
+       "\u001b[1m Row \u001b[0m│\u001b[1m city   \u001b[0m\u001b[1m date       \u001b[0m\u001b[1m rainfall \u001b[0m\n",
+       "\u001b[1m     \u001b[0m│\u001b[90m String \u001b[0m\u001b[90m Date       \u001b[0m\u001b[90m Float64  \u001b[0m\n",
+       "─────┼──────────────────────────────\n",
+       "   1 │ Olecko  2020-11-16       2.9\n",
+       "   2 │ Olecko  2020-11-17       4.1\n",
+       "   3 │ Olecko  2020-11-19       4.3\n",
+       "   4 │ Olecko  2020-11-20       2.0\n",
+       "   5 │ Olecko  2020-11-21       0.6\n",
+       "   6 │ Olecko  2020-11-22       1.0\n",
+       "   7 │ Ełk     2020-11-16       3.9\n",
+       "   8 │ Ełk     2020-11-19       1.2\n",
+       "   9 │ Ełk     2020-11-20       2.0\n",
+       "  10 │ Ełk     2020-11-22       2.0"
+      ]
+     },
+     "execution_count": 2,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "rainfall_long = CSV.File(\"data/rainfall_forecast.csv\") |> DataFrame"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Note that we collect rainfall information, so it would be nice to add units to the measured values. This is not a problem with `Unitful.jl`. We take advantage of the fact that `DataFrame` can store vectors of any Julia objects."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<table class=\"data-frame\"><thead><tr><th></th><th>city</th><th>date</th><th>rainfall</th></tr><tr><th></th><th>String</th><th>Date</th><th>Quantit…</th></tr></thead><tbody><p>10 rows × 3 columns</p><tr><th>1</th><td>Olecko</td><td>2020-11-16</td><td>2.9 mm</td></tr><tr><th>2</th><td>Olecko</td><td>2020-11-17</td><td>4.1 mm</td></tr><tr><th>3</th><td>Olecko</td><td>2020-11-19</td><td>4.3 mm</td></tr><tr><th>4</th><td>Olecko</td><td>2020-11-20</td><td>2.0 mm</td></tr><tr><th>5</th><td>Olecko</td><td>2020-11-21</td><td>0.6 mm</td></tr><tr><th>6</th><td>Olecko</td><td>2020-11-22</td><td>1.0 mm</td></tr><tr><th>7</th><td>Ełk</td><td>2020-11-16</td><td>3.9 mm</td></tr><tr><th>8</th><td>Ełk</td><td>2020-11-19</td><td>1.2 mm</td></tr><tr><th>9</th><td>Ełk</td><td>2020-11-20</td><td>2.0 mm</td></tr><tr><th>10</th><td>Ełk</td><td>2020-11-22</td><td>2.0 mm</td></tr></tbody></table>"
+      ],
+      "text/latex": [
+       "\\begin{tabular}{r|ccc}\n",
+       "\t& city & date & rainfall\\\\\n",
+       "\t\\hline\n",
+       "\t& String & Date & Quantit…\\\\\n",
+       "\t\\hline\n",
+       "\t1 & Olecko & 2020-11-16 & 2.9 mm \\\\\n",
+       "\t2 & Olecko & 2020-11-17 & 4.1 mm \\\\\n",
+       "\t3 & Olecko & 2020-11-19 & 4.3 mm \\\\\n",
+       "\t4 & Olecko & 2020-11-20 & 2.0 mm \\\\\n",
+       "\t5 & Olecko & 2020-11-21 & 0.6 mm \\\\\n",
+       "\t6 & Olecko & 2020-11-22 & 1.0 mm \\\\\n",
+       "\t7 & Ełk & 2020-11-16 & 3.9 mm \\\\\n",
+       "\t8 & Ełk & 2020-11-19 & 1.2 mm \\\\\n",
+       "\t9 & Ełk & 2020-11-20 & 2.0 mm \\\\\n",
+       "\t10 & Ełk & 2020-11-22 & 2.0 mm \\\\\n",
+       "\\end{tabular}\n"
+      ],
+      "text/plain": [
+       "\u001b[1m10×3 DataFrame\u001b[0m\n",
+       "\u001b[1m Row \u001b[0m│\u001b[1m city   \u001b[0m\u001b[1m date       \u001b[0m\u001b[1m rainfall  \u001b[0m\n",
+       "\u001b[1m     \u001b[0m│\u001b[90m String \u001b[0m\u001b[90m Date       \u001b[0m\u001b[90m Quantity… \u001b[0m\n",
+       "─────┼───────────────────────────────\n",
+       "   1 │ Olecko  2020-11-16     2.9 mm\n",
+       "   2 │ Olecko  2020-11-17     4.1 mm\n",
+       "   3 │ Olecko  2020-11-19     4.3 mm\n",
+       "   4 │ Olecko  2020-11-20     2.0 mm\n",
+       "   5 │ Olecko  2020-11-21     0.6 mm\n",
+       "   6 │ Olecko  2020-11-22     1.0 mm\n",
+       "   7 │ Ełk     2020-11-16     3.9 mm\n",
+       "   8 │ Ełk     2020-11-19     1.2 mm\n",
+       "   9 │ Ełk     2020-11-20     2.0 mm\n",
+       "  10 │ Ełk     2020-11-22     2.0 mm"
+      ]
+     },
+     "execution_count": 3,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "transform!(rainfall_long, :rainfall => x -> x .* u\"mm\", renamecols=false)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "With `renamecols=false` we left the name of the transformed column unchanged when we did an in-place update of the data frame using the `transform!` function."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "It would be nice to see the data in a wide format, so that each city is represented by a single column. We can achieve this using the `unstack` function:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<table class=\"data-frame\"><thead><tr><th></th><th>date</th><th>Olecko</th><th>Ełk</th></tr><tr><th></th><th>Date</th><th>Quantit…?</th><th>Quantit…?</th></tr></thead><tbody><p>6 rows × 3 columns</p><tr><th>1</th><td>2020-11-16</td><td>2.9 mm</td><td>3.9 mm</td></tr><tr><th>2</th><td>2020-11-17</td><td>4.1 mm</td><td><em>missing</em></td></tr><tr><th>3</th><td>2020-11-19</td><td>4.3 mm</td><td>1.2 mm</td></tr><tr><th>4</th><td>2020-11-20</td><td>2.0 mm</td><td>2.0 mm</td></tr><tr><th>5</th><td>2020-11-21</td><td>0.6 mm</td><td><em>missing</em></td></tr><tr><th>6</th><td>2020-11-22</td><td>1.0 mm</td><td>2.0 mm</td></tr></tbody></table>"
+      ],
+      "text/latex": [
+       "\\begin{tabular}{r|ccc}\n",
+       "\t& date & Olecko & Ełk\\\\\n",
+       "\t\\hline\n",
+       "\t& Date & Quantit…? & Quantit…?\\\\\n",
+       "\t\\hline\n",
+       "\t1 & 2020-11-16 & 2.9 mm & 3.9 mm \\\\\n",
+       "\t2 & 2020-11-17 & 4.1 mm & \\emph{missing} \\\\\n",
+       "\t3 & 2020-11-19 & 4.3 mm & 1.2 mm \\\\\n",
+       "\t4 & 2020-11-20 & 2.0 mm & 2.0 mm \\\\\n",
+       "\t5 & 2020-11-21 & 0.6 mm & \\emph{missing} \\\\\n",
+       "\t6 & 2020-11-22 & 1.0 mm & 2.0 mm \\\\\n",
+       "\\end{tabular}\n"
+      ],
+      "text/plain": [
+       "\u001b[1m6×3 DataFrame\u001b[0m\n",
+       "\u001b[1m Row \u001b[0m│\u001b[1m date       \u001b[0m\u001b[1m Olecko     \u001b[0m\u001b[1m Ełk          \u001b[0m\n",
+       "\u001b[1m     \u001b[0m│\u001b[90m Date       \u001b[0m\u001b[90m Quantity…? \u001b[0m\u001b[90m Quantity…?   \u001b[0m\n",
+       "─────┼──────────────────────────────────────\n",
+       "   1 │ 2020-11-16      2.9 mm        3.9 mm\n",
+       "   2 │ 2020-11-17      4.1 mm \u001b[90m missing      \u001b[0m\n",
+       "   3 │ 2020-11-19      4.3 mm        1.2 mm\n",
+       "   4 │ 2020-11-20      2.0 mm        2.0 mm\n",
+       "   5 │ 2020-11-21      0.6 mm \u001b[90m missing      \u001b[0m\n",
+       "   6 │ 2020-11-22      1.0 mm        2.0 mm"
+      ]
+     },
+     "execution_count": 4,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "rainfall_wide = unstack(rainfall_long, :date, :city, :rainfall)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We can see that the \"gaps\" in the rainfall information for `\"Ełk\"` column got automatically filled by `missing`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "There is also a `stack` function that does the reverse: transforms a data frame from wide to long format."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Also note that one of the cities is `\"Ełk\"`, which has a non standard character `ł` in its name. It is not a problem with `DataFrames.jl`. Let us e.g. extract this column as an exercise:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "6-element Vector{Union{Missing, Quantity{Float64, 𝐋, Unitful.FreeUnits{(mm,), 𝐋, nothing}}}}:\n",
+       " 3.9 mm\n",
+       "       missing\n",
+       " 1.2 mm\n",
+       " 2.0 mm\n",
+       "       missing\n",
+       " 2.0 mm"
+      ]
+     },
+     "execution_count": 5,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "rainfall_wide.Ełk"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "6-element Vector{Union{Missing, Quantity{Float64, 𝐋, Unitful.FreeUnits{(mm,), 𝐋, nothing}}}}:\n",
+       " 3.9 mm\n",
+       "       missing\n",
+       " 1.2 mm\n",
+       " 2.0 mm\n",
+       "       missing\n",
+       " 2.0 mm"
+      ]
+     },
+     "execution_count": 6,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "rainfall_wide.\"Ełk\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "When we read the data, we note that still there are gaps in the passed information --- one of the days is missing as there is no forecasted rainfall for it.\n",
+    "\n",
+    "It would be nice to have information for all days in the considered period. Here is the way to do it:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<table class=\"data-frame\"><thead><tr><th></th><th>date</th></tr><tr><th></th><th>Date</th></tr></thead><tbody><p>7 rows × 1 columns</p><tr><th>1</th><td>2020-11-16</td></tr><tr><th>2</th><td>2020-11-17</td></tr><tr><th>3</th><td>2020-11-18</td></tr><tr><th>4</th><td>2020-11-19</td></tr><tr><th>5</th><td>2020-11-20</td></tr><tr><th>6</th><td>2020-11-21</td></tr><tr><th>7</th><td>2020-11-22</td></tr></tbody></table>"
+      ],
+      "text/latex": [
+       "\\begin{tabular}{r|c}\n",
+       "\t& date\\\\\n",
+       "\t\\hline\n",
+       "\t& Date\\\\\n",
+       "\t\\hline\n",
+       "\t1 & 2020-11-16 \\\\\n",
+       "\t2 & 2020-11-17 \\\\\n",
+       "\t3 & 2020-11-18 \\\\\n",
+       "\t4 & 2020-11-19 \\\\\n",
+       "\t5 & 2020-11-20 \\\\\n",
+       "\t6 & 2020-11-21 \\\\\n",
+       "\t7 & 2020-11-22 \\\\\n",
+       "\\end{tabular}\n"
+      ],
+      "text/plain": [
+       "\u001b[1m7×1 DataFrame\u001b[0m\n",
+       "\u001b[1m Row \u001b[0m│\u001b[1m date       \u001b[0m\n",
+       "\u001b[1m     \u001b[0m│\u001b[90m Date       \u001b[0m\n",
+       "─────┼────────────\n",
+       "   1 │ 2020-11-16\n",
+       "   2 │ 2020-11-17\n",
+       "   3 │ 2020-11-18\n",
+       "   4 │ 2020-11-19\n",
+       "   5 │ 2020-11-20\n",
+       "   6 │ 2020-11-21\n",
+       "   7 │ 2020-11-22"
+      ]
+     },
+     "execution_count": 7,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "all_days = DataFrame(date=Date.(2020,11, 16:22))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<table class=\"data-frame\"><thead><tr><th></th><th>date</th><th>Olecko</th><th>Ełk</th></tr><tr><th></th><th>Date</th><th>Quantit…</th><th>Quantit…</th></tr></thead><tbody><p>7 rows × 3 columns</p><tr><th>1</th><td>2020-11-16</td><td>2.9 mm</td><td>3.9 mm</td></tr><tr><th>2</th><td>2020-11-17</td><td>4.1 mm</td><td>0.0 mm</td></tr><tr><th>3</th><td>2020-11-19</td><td>4.3 mm</td><td>1.2 mm</td></tr><tr><th>4</th><td>2020-11-20</td><td>2.0 mm</td><td>2.0 mm</td></tr><tr><th>5</th><td>2020-11-21</td><td>0.6 mm</td><td>0.0 mm</td></tr><tr><th>6</th><td>2020-11-22</td><td>1.0 mm</td><td>2.0 mm</td></tr><tr><th>7</th><td>2020-11-18</td><td>0.0 mm</td><td>0.0 mm</td></tr></tbody></table>"
+      ],
+      "text/latex": [
+       "\\begin{tabular}{r|ccc}\n",
+       "\t& date & Olecko & Ełk\\\\\n",
+       "\t\\hline\n",
+       "\t& Date & Quantit… & Quantit…\\\\\n",
+       "\t\\hline\n",
+       "\t1 & 2020-11-16 & 2.9 mm & 3.9 mm \\\\\n",
+       "\t2 & 2020-11-17 & 4.1 mm & 0.0 mm \\\\\n",
+       "\t3 & 2020-11-19 & 4.3 mm & 1.2 mm \\\\\n",
+       "\t4 & 2020-11-20 & 2.0 mm & 2.0 mm \\\\\n",
+       "\t5 & 2020-11-21 & 0.6 mm & 0.0 mm \\\\\n",
+       "\t6 & 2020-11-22 & 1.0 mm & 2.0 mm \\\\\n",
+       "\t7 & 2020-11-18 & 0.0 mm & 0.0 mm \\\\\n",
+       "\\end{tabular}\n"
+      ],
+      "text/plain": [
+       "\u001b[1m7×3 DataFrame\u001b[0m\n",
+       "\u001b[1m Row \u001b[0m│\u001b[1m date       \u001b[0m\u001b[1m Olecko    \u001b[0m\u001b[1m Ełk       \u001b[0m\n",
+       "\u001b[1m     \u001b[0m│\u001b[90m Date       \u001b[0m\u001b[90m Quantity… \u001b[0m\u001b[90m Quantity… \u001b[0m\n",
+       "─────┼──────────────────────────────────\n",
+       "   1 │ 2020-11-16     2.9 mm     3.9 mm\n",
+       "   2 │ 2020-11-17     4.1 mm     0.0 mm\n",
+       "   3 │ 2020-11-19     4.3 mm     1.2 mm\n",
+       "   4 │ 2020-11-20     2.0 mm     2.0 mm\n",
+       "   5 │ 2020-11-21     0.6 mm     0.0 mm\n",
+       "   6 │ 2020-11-22     1.0 mm     2.0 mm\n",
+       "   7 │ 2020-11-18     0.0 mm     0.0 mm"
+      ]
+     },
+     "execution_count": 8,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "@pipe leftjoin(all_days, rainfall_wide, on=:date) |>\n",
+    "      coalesce.(_, 0.0u\"mm\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Note that we additionally used a broadcasted `coalesce` operation on the whole data frame returned from `leftjoin` to replace all `missing` values by `0.0u\"mm\"` in it, as in this case `missing` meant that there is no rain forecasted for that day.\n",
+    "\n",
+    "It was safe to do here, as we knew that `:date` column does not contain missings. In particular note that `leftjoin` would error by default if we tried to perfrom join on a column that contains `missing` values (use `matchmissing` keyword argument in joins to change this behavior)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Conclusions"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Before we finish let us summarize the major functions that `DataFrames.jl` provides:\n",
+    "1. data frame is a matrix-like data structure. You can index it just like a matrix. The differences are\n",
+    "   - you can use strings or `Symbol`s to select columns\n",
+    "   - if you select rows with `!` it selects you whole column of a data frame and passes it to you without copying\n",
+    "2. You can quickly summarize the contents of a data frame using the `describe` function\n",
+    "3. You can add rows to a data frame in-place using `push!` (similarly `append!` allows you to add multiple rows at the same time) (also `repeat`/`repeat!`, `hcat` and `vcat` are provided)\n",
+    "4. You can work on a grouped data frame that is created using the `groupby` function. It is a view and works as-if you have created a lookup index to a data frame.\n",
+    "5. There are `select`/`select!`/`transform`/`transform!`/`combine` functions that allow you to quickly transform/aggregate columns of a data frame or grouped data frame; there is also `mapcols`/`mapcols!` functions for quick aggregation of columns of a data frame\n",
+    "6. You can filter rows of a data frame using `filter` and `filter!` functions (also `subset` and `subset!` starting from version 1.0)\n",
+    "7. Use `sort` and `sort!` functions to sort data frames\n",
+    "8. You can join multiple data frames using `innerjoin`, `outerjoin`, `leftjoin`, `rightjoin`, `semijoin`, `antijoin`, and `crossjoin` functions (they work as you would expect them if you know SQL)\n",
+    "9. If you want to iterate rows or columns of a data frame use `eachrow` and `eachcol` functions (we have not discussed them, but they work exactly like in Julia Base)\n",
+    "10. You can change names of columns in a data frame using `rename` and `rename!` functions; to get names of columns of a data frame use `names` (strings) or `propertynames` (`Symbol`s)\n",
+    "11. To get number of rows and columns of a data frame use `nrow` and `ncol` functions\n",
+    "12. To flatten nested columns of a data frame use `flatten`\n",
+    "13. You can easily allow/disallow missing values in columns of a data frame using `allowmising`/`allowmissing!`/`disallowmising`/`disallowmissing!` functions\n",
+    "14. You can drop rows with missing data with `dropmissing`/`dropmissing!` functions\n",
+    "15. You can switch between [long and wide](https://en.wikipedia.org/wiki/Wide_and_narrow_data) representation of a data frame using `stack` and `unstack`"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Additionally we have covered `freqtable` from FreqTables.jl, `@pipe` from Pipe.jl, and `lm` from GLM.jl packages that are often useful when wrangling data.\n",
+    "\n",
+    "You can use many formats to store and read data frames, we have discussed CSV.jl and Arrow.jl packages that provide such functionality.\n",
+    "\n",
+    "Finally we have shown how to integrate DataFrames.jl with plotting using PyPlot.jl and Unitful.jl."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Of course this course was just an introduction.\n",
+    "\n",
+    "You can find reviews of functionality of DataFrames.jl in:\n",
+    "* an official manual at https://juliadata.github.io/DataFrames.jl/stable/\n",
+    "* a tutorial going through all functionalities of DataFrames.jl at https://github.com/bkamins/Julia-DataFrames-Tutorial\n",
+    "* documentation strings of the respective funcions"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Julia 1.6.1",
+   "language": "julia",
+   "name": "julia-1.6"
+  },
+  "language_info": {
+   "file_extension": ".jl",
+   "mimetype": "application/julia",
+   "name": "julia",
+   "version": "1.6.1"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}