update layout of all exercises

This commit is contained in:
Bogumił Kamiński 2022-10-14 13:43:12 +02:00
parent 38398729ce
commit 31d8428f6a
11 changed files with 1042 additions and 925 deletions

View File

@ -11,64 +11,8 @@
Check what methods the `repeat` function has.
Are they all covered in the help for this function?
### Exercise 2
Write a function `fun2` that takes any vector and returns the difference between
the largest and the smallest element in this vector.
### Exercise 3
Generate a vector of one million random numbers from `[0, 1]` interval.
Check which is the faster way to get the maximum and minimum element in it: one
option is using the `maximum` and `minimum` functions, and the other is using
the `extrema` function.
### Exercise 4
Assume you have accidentally typed `+x = 1` when wanting to assign `1` to
variable `x`. What effects can this operation have?
### Exercise 5
What is the result of calling the `subtypes` on `Union{Bool, Missing}` and why?
### Exercise 6
Define two identical anonymous functions `x -> x + 1` in global scope. Do they
have the same type?
### Exercise 7
Define the `wrap` function taking one argument `i` and returning the anonymous
function `x -> x + i`. Is the type of this anonymous function the same across
calls to the `wrap` function?
### Exercise 8
You want to write a function that accepts any `Integer` except `Bool` and returns
the passed value. If `Bool` is passed an error should be thrown.
### Exercise 9
The `@time` macro measures the time taken to run an expression and prints it,
while returning the value of the expression.
The `@elapsed` macro works differently: it does not print anything, but returns
the time taken to evaluate an expression. Use the `@elapsed` macro to see how
long it takes to shuffle a vector of one million floats. Use the `shuffle` function
from the `Random` module.
### Exercise 10
Using the `@btime` macro benchmark the time of calculating the sum of one million
random floats.
# Solutions
<details>
<summary>Show!</summary>
### Exercise 1
<summary>Solution</summary>
Write:
```
@ -93,8 +37,16 @@ and `repeat(c::Char, r::Integer)` is its faster version
that accepts values that have `Char` type only (and it is invoked by Julia
if value of type `Char` is passed as an argument to `repeat`).
</details>
### Exercise 2
Write a function `fun2` that takes any vector and returns the difference between
the largest and the smallest element in this vector.
<details>
<summary>Solution</summary>
You can define it as follows:
```
fun2(x::AbstractVector) = maximum(x) - minimum(x)
@ -109,8 +61,18 @@ end
Note that these two functions will work with vectors of any elements that
are ordered and support subtraction (they do not have to be numbers).
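For instance, `Char` values are ordered and subtracting two of them yields an `Int`, so the same definition works for a vector of characters (a small illustration, not part of the original exercise):

```julia
fun2(x::AbstractVector) = maximum(x) - minimum(x)

# subtracting Chars yields the distance between their code points:
fun2(['a', 'd', 'b'])  # 'd' - 'a', i.e. 3
```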
</details>
### Exercise 3
Generate a vector of one million random numbers from `[0, 1]` interval.
Check which is the faster way to get the maximum and minimum element in it: one
option is using the `maximum` and `minimum` functions, and the other is using
the `extrema` function.
<details>
<summary>Solution</summary>
Here is a way to compare the performance of both options:
```
julia> using BenchmarkTools
@ -130,8 +92,16 @@ As you can see in this situation, although `extrema` does the operation
in a single pass over `x` it is slower than computing `minimum` and `maximum`
in two passes.
</details>
### Exercise 4
Assume you have accidentally typed `+x = 1` when wanting to assign `1` to
variable `x`. What effects can this operation have?
<details>
<summary>Solution</summary>
In a fresh Julia session this defines a new function for the `+` operator in `Main`:
```
@ -167,8 +137,15 @@ julia> +x=1
ERROR: error in method definition: function Base.+ must be explicitly imported to be extended
```
</details>
### Exercise 5
What is the result of calling the `subtypes` on `Union{Bool, Missing}` and why?
<details>
<summary>Solution</summary>
You get an empty vector:
```
julia> subtypes(Union{Bool, Missing})
@ -181,8 +158,16 @@ declared types that have names (type of such types is `DataType` in Julia).
*Extra*: for this reason `subtypes` has limited use. To check if one type
is a subtype of another type use the `<:` operator.
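For example, a quick sketch of `<:` in action:

```julia
# `<:` checks the subtype relation directly, also for unions:
Bool <: Integer                  # true
Missing <: Union{Bool, Missing}  # true
Union{Bool, Missing} <: Integer  # false, as Missing is not an Integer
```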
</details>
### Exercise 6
Define two identical anonymous functions `x -> x + 1` in global scope. Do they
have the same type?
<details>
<summary>Solution</summary>
No, each of them has a different type:
```
julia> f1 = x -> x + 1
@ -215,8 +200,17 @@ julia> @time sum(x -> x^2, 1:10)
385
```
</details>
### Exercise 7
Define the `wrap` function taking one argument `i` and returning the anonymous
function `x -> x + i`. Is the type of this anonymous function the same across
calls to the `wrap` function?
<details>
<summary>Solution</summary>
Yes, the type is the same:
```
@ -252,8 +246,16 @@ julia> @time sumi(3)
3025
```
</details>
### Exercise 8
You want to write a function that accepts any `Integer` except `Bool` and returns
the passed value. If `Bool` is passed an error should be thrown.
<details>
<summary>Solution</summary>
We check subtypes of `Integer`:
```
@ -292,8 +294,20 @@ julia> fun2(true)
ERROR: ArgumentError: Bool is not supported
```
</details>
### Exercise 9
The `@time` macro measures the time taken to run an expression and prints it,
while returning the value of the expression.
The `@elapsed` macro works differently: it does not print anything, but returns
the time taken to evaluate an expression. Use the `@elapsed` macro to see how
long it takes to shuffle a vector of one million floats. Use the `shuffle` function
from the `Random` module.
<details>
<summary>Solution</summary>
Here is the code that performs the task:
```
julia> using Random # needed to get access to shuffle
@ -312,8 +326,16 @@ julia> @elapsed shuffle(x)
Note that the first time we run `shuffle` it takes longer due to compilation.
</details>
### Exercise 10
Using the `@btime` macro benchmark the time of calculating the sum of one million
random floats.
<details>
<summary>Solution</summary>
The code you can use is:
```

View File

@ -12,11 +12,81 @@ Create a matrix of shape 2x3 containing numbers from 1 to 6 (fill the matrix
columnwise with consecutive numbers). Next calculate sum, mean and standard
deviation of each row and each column of this matrix.
<details>
<summary>Solution</summary>
Write:
```
julia> using Statistics
julia> mat = [1 3 5
2 4 6]
2×3 Matrix{Int64}:
1 3 5
2 4 6
julia> sum(mat, dims=1)
1×3 Matrix{Int64}:
3 7 11
julia> sum(mat, dims=2)
2×1 Matrix{Int64}:
9
12
julia> mean(mat, dims=1)
1×3 Matrix{Float64}:
1.5 3.5 5.5
julia> mean(mat, dims=2)
2×1 Matrix{Float64}:
3.0
4.0
julia> std(mat, dims=1)
1×3 Matrix{Float64}:
0.707107 0.707107 0.707107
julia> std(mat, dims=2)
2×1 Matrix{Float64}:
2.0
2.0
```
Observe that the returned statistics are also stored in matrices.
If we compute them for columns (`dims=1`) then the produced matrix has one row.
If we compute them for rows (`dims=2`) then the produced matrix has one column.
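As a side note (not required by the exercise), if you prefer plain vectors over one-row or one-column matrices, you can flatten the result with `vec`:

```julia
mat = [1 3 5
       2 4 6]

vec(sum(mat, dims=1))  # column sums as a plain vector: [3, 7, 11]
vec(sum(mat, dims=2))  # row sums as a plain vector: [9, 12]
```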
</details>
### Exercise 2
For each column of the matrix created in exercise 1 compute its range
(i.e. the difference between maximum and minimum element stored in it).
<details>
<summary>Solution</summary>
Here are some ways you can do it:
```
julia> [maximum(x) - minimum(x) for x in eachcol(mat)]
3-element Vector{Int64}:
1
1
1
julia> map(x -> maximum(x) - minimum(x), eachcol(mat))
3-element Vector{Int64}:
1
1
1
```
Observe that if we used `eachcol` the produced result is a vector (not a matrix
like in exercise 1).
</details>
### Exercise 3
This is data for car speed (mph) and distance taken to stop (ft)
@ -79,127 +149,8 @@ speed dist
Load this data into Julia (this is part of the exercise) and fit a linear
regression where speed is a feature and distance is target variable.
### Exercise 4
Plot the data loaded in exercise 3. Additionally plot the fitted regression
(you need to check Plots.jl documentation to find a way to do this).
### Exercise 5
A simple code for calculation of Fibonacci numbers for positive
arguments is as follows:
```
fib(n) = n < 3 ? 1 : fib(n-1) + fib(n-2)
```
Using the BenchmarkTools.jl package measure runtime of this function for
`n` ranging from `1` to `20`.
### Exercise 6
Improve the speed of code from exercise 5 by using a dictionary where you
store a mapping of `n` to `fib(n)`. Measure the performance of this function
for the same range of values as in exercise 5.
### Exercise 7
Create a vector containing named tuples representing elements of a 4x4 grid.
So the first element of this vector should be `(x=1, y=1)` and last should be
`(x=4, y=4)`. Store the vector in variable `v`.
### Exercise 8
The `filter` function allows you to select some values of an input collection.
Check its documentation first. Next, use it to keep only those elements of the
vector `v` from exercise 7 whose coordinates sum to an even number.
### Exercise 9
Check the documentation of the `filter!` function. Perform the same operation
as asked in exercise 8 but using `filter!`. What is the difference?
### Exercise 10
Write a function that takes a number `n`. Next it generates two independent
random vectors of length `n` and returns their correlation coefficient.
Run this function `10000` times for `n` equal to `10`, `100`, `1000`,
and `10000`.
Create a plot with four histograms of distribution of computed Pearson
correlation coefficient. Check in the Plots.jl package which function can be
used to plot histograms.
# Solutions
<details>
<summary>Show!</summary>
### Exercise 1
Write:
```
julia> using Statistics
julia> mat = [1 3 5
2 4 6]
2×3 Matrix{Int64}:
1 3 5
2 4 6
julia> sum(mat, dims=1)
1×3 Matrix{Int64}:
3 7 11
julia> sum(mat, dims=2)
2×1 Matrix{Int64}:
9
12
julia> mean(mat, dims=1)
1×3 Matrix{Float64}:
1.5 3.5 5.5
julia> mean(mat, dims=2)
2×1 Matrix{Float64}:
3.0
4.0
julia> std(mat, dims=1)
1×3 Matrix{Float64}:
0.707107 0.707107 0.707107
julia> std(mat, dims=2)
2×1 Matrix{Float64}:
2.0
2.0
```
Observe that the returned statistics are also stored in matrices.
If we compute them for columns (`dims=1`) then the produced matrix has one row.
If we compute them for rows (`dims=2`) then the produced matrix has one column.
### Exercise 2
Here are some ways you can do it:
```
julia> [maximum(x) - minimum(x) for x in eachcol(mat)]
3-element Vector{Int64}:
1
1
1
julia> map(x -> maximum(x) - minimum(x), eachcol(mat))
3-element Vector{Int64}:
1
1
1
```
Observe that if we used `eachcol` the produced result is a vector (not a matrix
like in exercise 1).
### Exercise 3
<summary>Solution</summary>
First create a matrix with source data by copy pasting it from the exercise
like this:
@ -285,8 +236,16 @@ julia> [ones(50) data[:, 1]] \ data[:, 2]
3.9324087591240877
```
</details>
### Exercise 4
Plot the data loaded in exercise 3. Additionally plot the fitted regression
(you need to check Plots.jl documentation to find a way to do this).
<details>
<summary>Solution</summary>
Run the following:
```
using Plots
@ -296,8 +255,23 @@ scatter(data[:, 1], data[:, 2];
The `smooth=true` keyword argument adds the linear regression line to the plot.
</details>
### Exercise 5
A simple code for calculation of Fibonacci numbers for positive
arguments is as follows:
```
fib(n) = n < 3 ? 1 : fib(n-1) + fib(n-2)
```
Using the BenchmarkTools.jl package measure runtime of this function for
`n` ranging from `1` to `20`.
<details>
<summary>Solution</summary>
Use the following code:
```
julia> using BenchmarkTools
@ -331,8 +305,17 @@ julia> for i in 1:40
Notice that the execution time for number `n` is roughly the sum of execution
times for numbers `n-1` and `n-2`.
</details>
### Exercise 6
Improve the speed of code from exercise 5 by using a dictionary where you
store a mapping of `n` to `fib(n)`. Measure the performance of this function
for the same range of values as in exercise 5.
<details>
<summary>Solution</summary>
Use the following code:
```
@ -422,8 +405,17 @@ julia> @time fib2(200)
As you can see the code makes fewer allocations and is faster now.
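The caching idea can be sketched as follows (`fib2` and the cache layout here are one possible shape; the book's exact implementation may differ):

```julia
# cache mapping n to fib(n); get! computes and stores each value only once
const FIB_CACHE = Dict{Int, Int}()

function fib2(n)
    n < 3 && return 1
    return get!(FIB_CACHE, n) do
        fib2(n - 1) + fib2(n - 2)
    end
end

fib2(40)  # 102334155, computed without exponential recursion
```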
</details>
### Exercise 7
Create a vector containing named tuples representing elements of a 4x4 grid.
So the first element of this vector should be `(x=1, y=1)` and last should be
`(x=4, y=4)`. Store the vector in variable `v`.
<details>
<summary>Solution</summary>
Since we are asked to create a vector we can write:
```
@ -470,8 +462,17 @@ julia> [(; x, y) for x in 1:4, y in 1:4]
(x = 4, y = 1) (x = 4, y = 2) (x = 4, y = 3) (x = 4, y = 4)
```
</details>
### Exercise 8
The `filter` function allows you to select some values of an input collection.
Check its documentation first. Next, use it to keep only those elements of the
vector `v` from exercise 7 whose coordinates sum to an even number.
<details>
<summary>Solution</summary>
To get help on the `filter` function write `?filter`. Next run:
```
@ -487,8 +488,16 @@ julia> filter(e -> iseven(e.x + e.y), v)
(x = 4, y = 4)
```
</details>
### Exercise 9
Check the documentation of the `filter!` function. Perform the same operation
as asked in exercise 8 but using `filter!`. What is the difference?
<details>
<summary>Solution</summary>
To get help on the `filter!` function write `?filter!`. Next run:
```
@ -518,8 +527,21 @@ julia> v
Notice that `filter` allocated a new vector, while `filter!` updated the `v`
vector in place.
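The difference can be sketched on a plain integer vector (made-up data for illustration):

```julia
v = collect(1:6)
w = filter(iseven, v)   # allocates a new vector; v is left untouched
# w is [2, 4, 6] while v still has all 6 elements

filter!(iseven, v)      # mutates v in place (and returns it)
# now v itself is [2, 4, 6]
```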
</details>
### Exercise 10
Write a function that takes a number `n`. Next it generates two independent
random vectors of length `n` and returns their correlation coefficient.
Run this function `10000` times for `n` equal to `10`, `100`, `1000`,
and `10000`.
Create a plot with four histograms of distribution of computed Pearson
correlation coefficient. Check in the Plots.jl package which function can be
used to plot histograms.
<details>
<summary>Solution</summary>
You can use for example the following code:
```

View File

@ -10,93 +10,8 @@
Create a matrix containing the truth table for the `&&` and `||` operations.
### Exercise 2
The `issubset` function checks if one collection is a subset of another
collection.
Now take the range `4:6` and check if it is a subset of the ranges `4-k:4+k` for
`k` varying from `1` to `3`. Store the result in a vector.
### Exercise 3
Write a function that accepts two vectors and returns `true` if they have equal
length and otherwise returns `false`.
### Exercise 4
Consider the vectors `x = [1, 2, 1, 2, 1, 2]`,
`y = ["a", "a", "b", "b", "b", "a"]`, and `z = [1, 2, 1, 2, 1, 3]`.
Calculate their Adjusted Mutual Information using scikit-learn.
### Exercise 5
Using the Adjusted Mutual Information function from exercise 4, generate
a pair of random vectors of length 100 containing integers from the
range `1:5`. Repeat this experiment 1000 times and plot a histogram of the AMI.
Check in the documentation of the `rand` function how you can draw a sample
from a collection of values.
### Exercise 6
Adjust the code from exercise 5 but replace the first 50 elements of each vector
with zero. Repeat the experiment.
### Exercise 7
Write a function that takes a vector of integer values and returns a dictionary
telling how many times each integer is present in the passed vector.
Test this function on vectors `v1 = [1, 2, 3, 2, 3, 3]`, `v2 = [true, false]`,
and `v3 = 3:5`.
### Exercise 8
Write code that creates a `Bool` diagonal matrix of size 5x5.
### Exercise 9
Write code comparing the performance of calculating the sum of logarithms of
the elements of the vector `1:100` using broadcasting with the `sum` function vs.
the `sum` function taking a function as its first argument.
### Exercise 10
Create a dictionary in which for each number from `1` to `10` you will store
a vector of its positive divisors. You can check the remainder of division
of two values using the `rem` function.
Additionally (not covered in the book), you can drop elements
from a comprehension if you add an `if` clause after the `for` clause, for
example to keep only odd numbers from range `1:10` do:
```
julia> [i for i in 1:10 if isodd(i)]
5-element Vector{Int64}:
1
3
5
7
9
```
You can populate a dictionary by passing a vector of pairs to it (not covered in
the book), for example:
```
julia> Dict(["a" => 1, "b" => 2])
Dict{String, Int64} with 2 entries:
"b" => 2
"a" => 1
```
# Solutions
<details>
<summary>Show!</summary>
### Exercise 1
<summary>Solution</summary>
You can do it as follows:
```
@ -113,8 +28,19 @@ julia> [true, false] .|| [true false]
Note that the first array is a vector, while the second array is a 1-row matrix.
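Broadcasting a length-2 vector against a 1-row matrix expands along both dimensions, which is exactly what builds the 2x2 table (a minimal sketch; broadcasted `&&` requires Julia 1.7 or newer):

```julia
tab = [true, false] .&& [true false]
# tab[i, j] pairs element i of the vector with element j of the 1-row matrix,
# producing a 2×2 Matrix{Bool}
```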
</details>
### Exercise 2
The `issubset` function checks if one collection is a subset of another
collection.
Now take the range `4:6` and check if it is a subset of the ranges `4-k:4+k` for
`k` varying from `1` to `3`. Store the result in a vector.
<details>
<summary>Solution</summary>
You can do it like this using broadcasting:
```
julia> issubset.(Ref(4:6), [4-k:4+k for k in 1:3])
@ -125,16 +51,33 @@ julia> issubset.(Ref(4:6), [4-k:4+k for k in 1:3])
```
Note that you need to use `Ref` to protect `4:6` from being broadcasted over.
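A smaller example of the same pattern (illustrative values): wrapping an argument in `Ref` makes broadcasting treat it as a single object instead of iterating over its elements.

```julia
# check one fixed range against several candidate ranges at once:
issubset.(Ref(1:2), [1:1, 1:2, 1:3])  # [false, true, true]
```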
</details>
### Exercise 3
Write a function that accepts two vectors and returns `true` if they have equal
length and otherwise returns `false`.
<details>
<summary>Solution</summary>
This function can be written as follows:
```
equallength(x::AbstractVector, y::AbstractVector) = length(x) == length(y)
```
</details>
### Exercise 4
Consider the vectors `x = [1, 2, 1, 2, 1, 2]`,
`y = ["a", "a", "b", "b", "b", "a"]`, and `z = [1, 2, 1, 2, 1, 3]`.
Calculate their Adjusted Mutual Information using scikit-learn.
<details>
<summary>Solution</summary>
You can do this exercise as follows:
```
julia> using PyCall
@ -151,8 +94,19 @@ julia> metrics.adjusted_mutual_info_score(y, z)
-0.21267989848846763
```
</details>
### Exercise 5
Using the Adjusted Mutual Information function from exercise 4, generate
a pair of random vectors of length 100 containing integers from the
range `1:5`. Repeat this experiment 1000 times and plot a histogram of the AMI.
Check in the documentation of the `rand` function how you can draw a sample
from a collection of values.
<details>
<summary>Solution</summary>
You can create such a plot using the following commands:
```
@ -163,8 +117,16 @@ histogram([metrics.adjusted_mutual_info_score(rand(1:5, 100), rand(1:5, 100))
You can check that AMI oscillates around 0.
</details>
### Exercise 6
Adjust the code from exercise 5 but replace the first 50 elements of each vector
with zero. Repeat the experiment.
<details>
<summary>Solution</summary>
This time it is convenient to write a helper function. Note that we use
broadcasting to update values in the vectors.
@ -182,8 +144,19 @@ histogram([exampleAMI() for i in 1:1000], label="AMI")
Note that this time AMI is a bit below 0.5, which shows a better match between
vectors.
</details>
### Exercise 7
Write a function that takes a vector of integer values and returns a dictionary
telling how many times each integer is present in the passed vector.
Test this function on vectors `v1 = [1, 2, 3, 2, 3, 3]`, `v2 = [true, false]`,
and `v3 = 3:5`.
<details>
<summary>Solution</summary>
```
julia> function counter(v::AbstractVector{<:Integer})
d = Dict{eltype(v), Int}()
@ -219,8 +192,15 @@ Dict{Int64, Int64} with 3 entries:
Note that we used the `eltype` function to set a proper key type for
dictionary `d`.
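The effect of `eltype` on the key type can be checked directly (a quick illustration):

```julia
# the dictionary key type follows the element type of the input vector:
v2 = [true, false]
d2 = Dict{eltype(v2), Int}()  # Dict{Bool, Int64}

v3 = 3:5
d3 = Dict{eltype(v3), Int}()  # Dict{Int64, Int64} on 64-bit systems
```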
</details>
### Exercise 8
Write code that creates a `Bool` diagonal matrix of size 5x5.
<details>
<summary>Solution</summary>
This is a way to do it:
```
julia> 1:5 .== (1:5)'
@ -246,8 +226,17 @@ julia> I(5)
⋅ ⋅ ⋅ ⋅ 1
```
</details>
### Exercise 9
Write code comparing the performance of calculating the sum of logarithms of
the elements of the vector `1:100` using broadcasting with the `sum` function vs.
the `sum` function taking a function as its first argument.
<details>
<summary>Solution</summary>
Here is how you can do it:
```
@ -265,8 +254,41 @@ julia> @btime sum(log, 1:100)
As you can see using the `sum` function with `log` as its first argument
is a bit faster as it is not allocating.
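Both approaches give the same value up to floating-point rounding; only the allocation behavior differs:

```julia
a = sum(log.(1:100))  # materializes a temporary 100-element vector
b = sum(log, 1:100)   # applies log element by element, no temporary array
a ≈ b                 # true
```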
</details>
### Exercise 10
Create a dictionary in which for each number from `1` to `10` you will store
a vector of its positive divisors. You can check the remainder of division
of two values using the `rem` function.
Additionally (not covered in the book), you can drop elements
from a comprehension if you add an `if` clause after the `for` clause, for
example to keep only odd numbers from range `1:10` do:
```
julia> [i for i in 1:10 if isodd(i)]
5-element Vector{Int64}:
1
3
5
7
9
```
You can populate a dictionary by passing a vector of pairs to it (not covered in
the book), for example:
```
julia> Dict(["a" => 1, "b" => 2])
Dict{String, Int64} with 2 entries:
"b" => 2
"a" => 1
```
<details>
<summary>Solution</summary>
Here is how you can do it:
```

View File

@ -11,16 +11,47 @@
Interpolate the expression `1 + 2` into a string `"I have apples worth 3USD"`
(replace `3` by a proper interpolation expression) and replace `USD` by `$`.
<details>
<summary>Solution</summary>
```
julia> "I have apples worth $(1+2)\$"
"I have apples worth 3\$"
```
</details>
### Exercise 2
Download the file `https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data`
as `iris.csv` to your local folder.
<details>
<summary>Solution</summary>
```
import Downloads
Downloads.download("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data",
"iris.csv")
```
</details>
### Exercise 3
Write the string `"https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"`
in two lines so that it takes less horizontal space.
<details>
<summary>Solution</summary>
```
"https://archive.ics.uci.edu/ml/\
machine-learning-databases/iris/iris.data"
```
</details>
### Exercise 4
Load data stored in `iris.csv` file into a `data` vector where each element
@ -28,73 +59,9 @@ should be a named tuple of the form `(sl=1.0, sw=2.0, pl=3.0, pw=4.0, c="x")` if
the source line had data `1.0,2.0,3.0,4.0,x` (note that the first four elements
are parsed as floats).
### Exercise 5
The `data` structure is a vector of named tuples, change it to a named tuple
of vectors (with the same field names) and call it `data2`.
### Exercise 6
Calculate the frequency of each Iris type (`c` field in `data2`).
### Exercise 7
Create a vector `c2` that is derived from `c` in `data2` but holds inline strings,
vector `c3` that is a `PooledVector`, and vector `c4` that holds `Symbol`s.
Compare sizes of the three objects.
### Exercise 8
You know that `refs` field of `PooledArray` stores an integer index of a given
value in it. Using this information make a scatter plot of `pl` vs `pw` vectors
in `data2`, but for each Iris type give a different point color (check the
`color` keyword argument meaning in the Plots.jl manual; you can use the
`plot_color` function).
### Exercise 9
Type the following string `"a²=b² ⟺ a=b a=-b"` in your terminal and bind it to
`str` variable (do not copy paste the string, but type it).
### Exercise 10
In the `str` string from exercise 9 find all matches of a pattern where `a`
is followed by `b` but there can be some characters between them.
# Solutions
<details>
<summary>Solution</summary>
<summary>Show!</summary>
### Exercise 1
Solution:
```
julia> "I have apples worth $(1+2)\$"
"I have apples worth 3\$"
```
### Exercise 2
Solution:
```
import Downloads
Downloads.download("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data",
"iris.csv")
```
### Exercise 3
Solution:
```
"https://archive.ics.uci.edu/ml/\
machine-learning-databases/iris/iris.data"
```
### Exercise 4
Solution:
```
julia> function line_parser(line)
elements = split(line, ",")
@ -125,8 +92,16 @@ Note that we used the `1:end-1` selector to drop the last element from the read lines
since it is empty. This is the reason why adding the
`@assert length(elements) == 5` check in the `line_parser` function is useful.
</details>
### Exercise 5
The `data` structure is a vector of named tuples, change it to a named tuple
of vectors (with the same field names) and call it `data2`.
<details>
<summary>Solution</summary>
Later in the book you will learn more advanced ways to do it. Here let us
use the most basic approach:
@ -138,9 +113,15 @@ data2 = (sl=[d.sl for d in data],
c=[d.c for d in data])
```
</details>
### Exercise 6
Solution:
Calculate the frequency of each Iris type (`c` field in `data2`).
<details>
<summary>Solution</summary>
```
julia> using FreqTables
@ -153,9 +134,17 @@ Dim1 │
"Iris-virginica" │ 50
```
</details>
### Exercise 7
Solution:
Create a vector `c2` that is derived from `c` in `data2` but holds inline strings,
vector `c3` that is a `PooledVector`, and vector `c4` that holds `Symbol`s.
Compare sizes of the three objects.
<details>
<summary>Solution</summary>
```
julia> using InlineStrings
@ -213,16 +202,34 @@ julia> Base.summarysize(c4)
1240
```
</details>
### Exercise 8
Solution:
You know that `refs` field of `PooledArray` stores an integer index of a given
value in it. Using this information make a scatter plot of `pl` vs `pw` vectors
in `data2`, but for each Iris type give a different point color (check the
`color` keyword argument meaning in the Plots.jl manual; you can use the
`plot_color` function).
<details>
<summary>Solution</summary>
```
using Plots
scatter(data2.pl, data2.pw, color=plot_color(c3.refs), legend=false)
```
</details>
### Exercise 9
Type the following string `"a²=b² ⟺ a=b a=-b"` in your terminal and bind it to
`str` variable (do not copy paste the string, but type it).
<details>
<summary>Solution</summary>
The hard part is typing `²`, `⟺` and ``. You can check how to do it using help:
```
help?> ²
@ -237,8 +244,16 @@ help?>
Save the string in the `str` variable as we will use it in the next exercise.
</details>
### Exercise 10
In the `str` string from exercise 9 find all matches of a pattern where `a`
is followed by `b` but there can be some characters between them.
<details>
<summary>Solution</summary>
The exercise does not specify how the matching should be done. If we
want it to be eager (match as much as possible), we write:
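For example, on a simplified ASCII stand-in for `str` (the idea carries over to the original string), a greedy pattern grabs the longest span from the first `a` to the last `b`, while adding `?` makes the quantifier lazy:

```julia
s = "a=b a=-b"  # simplified stand-in for the string from exercise 9
match(r"a.*b", s).match   # greedy match: "a=b a=-b"
match(r"a.*?b", s).match  # lazy match: "a=b"
```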

View File

@ -19,75 +19,10 @@ If you want to understand all the parameters please check their meaning
For us it is enough that this request generates 10 random integers in the range
from 1 to 6. Run this query in Julia and parse the result.
### Exercise 2
Write a function that tries to parse a string as an integer.
If it succeeds it should return the integer; otherwise it should return `0`
and print an error message.
### Exercise 3
Create a matrix containing the truth table for the `&&` operation including `missing`.
If some operation errors, store `"error"` in the table. As an extra feature (this
is harder, so you can skip it) in each cell store both inputs and the output to make
reading the table easier.
### Exercise 4
Take a vector `v = [1.5, 2.5, missing, 4.5, 5.5, missing]` and replace all
missing values in it by the mean of the non-missing values.
### Exercise 5
Take a vector `s = ["1.5", "2.5", missing, "4.5", "5.5", missing]` and parse
strings stored in it as `Float64`, while keeping `missing` values unchanged.
### Exercise 6
Print to the terminal all days in January 2023 that are Mondays.
### Exercise 7
Compute the dates that are one month later than January 15, 2023, February 15,
2023, March 15, 2023, and April 15, 2023. How many days pass during each such
month? Print the results to the screen.
### Exercise 8
Parse the following string as JSON:
```
str = """
[{"x":1,"y":1},
{"x":2,"y":4},
{"x":3,"y":9},
{"x":4,"y":16},
{"x":5,"y":25}]
"""
```
into a `json` variable.
### Exercise 9
Extract from the `json` variable from exercise 8 two vectors `x` and `y`
that correspond to the fields stored in the JSON structure.
Plot `y` as a function of `x`.
### Exercise 10
Given the vector `m = [missing, 1, missing, 3, missing, missing, 6, missing]`,
use linear interpolation to fill the missing values. For the extreme values
use the nearest available observation (you will need to consult the Impute.jl
documentation to find all required functions).
# Solutions
<details>
<summary>Solution</summary>
<summary>Show!</summary>
### Exercise 1
Solution (example run):
Example run:
```
julia> using HTTP
@ -109,8 +44,17 @@ julia> parse.(Int, split(String(response.body)))
6
```
</details>
### Exercise 2
Write a function that tries to parse a string as an integer.
If it succeeds it should return the integer; otherwise it should return `0`
and print an error message.
<details>
<summary>Solution</summary>
Example function:
```
@ -160,9 +104,17 @@ end
```
But this time we do not see the cause of the error.
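An alternative that avoids `try`/`catch` altogether is `tryparse`, which returns `nothing` when parsing fails (a sketch; the function name `parse_or_zero` is made up):

```julia
function parse_or_zero(str::AbstractString)
    x = tryparse(Int, str)
    if isnothing(x)
        println("could not parse \"$str\" as an integer")
        return 0
    end
    return x
end

parse_or_zero("42")    # 42
parse_or_zero("oops")  # prints a message and returns 0
```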
</details>
### Exercise 3
Solution:
Create a matrix containing the truth table for the `&&` operation including `missing`.
If some operation errors, store `"error"` in the table. As an extra feature (this
is harder, so you can skip it) in each cell store both inputs and the output to make
reading the table easier.
<details>
<summary>Solution</summary>
```
julia> function apply_and(x, y)
@ -181,9 +133,15 @@ julia> apply_and.([true, false, missing], [true false missing])
"missing && true = error" "missing && false = error" "missing && missing = error"
```
</details>
### Exercise 4
Solution:
Take a vector `v = [1.5, 2.5, missing, 4.5, 5.5, missing]` and replace all
missing values in it by the mean of the non-missing values.
<details>
<summary>Solution</summary>
```
julia> using Statistics
@ -198,9 +156,15 @@ julia> coalesce.(v, mean(skipmissing(v)))
3.5
```
</details>
### Exercise 5
Solution:
Take a vector `s = ["1.5", "2.5", missing, "4.5", "5.5", missing]` and parse
strings stored in it as `Float64`, while keeping `missing` values unchanged.
<details>
<summary>Solution</summary>
```
julia> using Missings
@ -215,9 +179,16 @@ julia> passmissing(parse).(Float64, s)
missing
```
</details>
### Exercise 6
Example solution:
Print to the terminal all days in January 2023 that are Mondays.
<details>
<summary>Solution</summary>
Example:
```
julia> using Dates
@ -232,9 +203,18 @@ julia> for day in Date.(2023, 01, 1:31)
2023-01-30
```
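An alternative formulation (illustrative) builds a range of `Date`s and keeps only the Mondays with `filter` and `dayofweek`:

```julia
using Dates

# all days of January 2023, filtered down to the Mondays
mondays = filter(d -> dayofweek(d) == Dates.Monday,
                 Date(2023, 1, 1):Day(1):Date(2023, 1, 31))
# five dates: 2023-01-02, 2023-01-09, 2023-01-16, 2023-01-23, 2023-01-30
```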
</details>
### Exercise 7
Example solution:
Compute the dates that are one month later than January 15, 2023, February 15,
2023, March 15, 2023, and April 15, 2023. How many days pass during each such
month? Print the results to the screen.
<details>
<summary>Solution</summary>
Example:
```
julia> for day in Date.(2023, 1:4, 15)
@ -247,9 +227,24 @@ julia> for day in Date.(2023, 1:4, 15)
2023-04-15 + 1 month = 2023-05-15 (difference: 30 days)
```
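The core of the computation can be sketched for a single date (using 2023, as the code above does):

```julia
using Dates

d = Date(2023, 1, 15)
d2 = d + Month(1)            # 2023-02-15: month arithmetic keeps the day of month
ndays = Dates.value(d2 - d)  # 31, since January has 31 days
```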
</details>
### Exercise 8
Solution:
Parse the following string as JSON:
```
str = """
[{"x":1,"y":1},
{"x":2,"y":4},
{"x":3,"y":9},
{"x":4,"y":16},
{"x":5,"y":25}]
"""
```
into a `json` variable.
<details>
<summary>Solution</summary>
```
julia> using JSON3
@ -278,9 +273,16 @@ julia> json = JSON3.read(str)
}
```
</details>
### Exercise 9
Solution:
Extract from the `json` variable from exercise 8 two vectors `x` and `y`
that correspond to the fields stored in the JSON structure.
Plot `y` as a function of `x`.
<details>
<summary>Solution</summary>
```
using Plots
@ -289,9 +291,17 @@ y = [el.y for el in json]
plot(x, y, xlabel="x", ylabel="y", legend=false)
```
</details>
### Exercise 10
Solution:
Given the vector `m = [missing, 1, missing, 3, missing, missing, 6, missing]`,
use linear interpolation to fill the missing values. For the extreme values
use the nearest available observation (you will need to consult the Impute.jl
documentation to find all required functions).
<details>
<summary>Solution</summary>
```
julia> using Impute

View File

@ -11,63 +11,8 @@
Read data stored in a gzip-compressed file `example8.csv.gz` into a `DataFrame`
called `df`.
### Exercise 2
Get number of rows, columns, column names and summary statistics of the
`df` data frame from exercise 1.
### Exercise 3
Make a plot of `number` against `square` columns of `df` data frame.
### Exercise 4
Add a column to `df` data frame with name `name string` containing string
representation of numbers in column `number`, i.e.
`["one", "two", "three", "four"]`.
### Exercise 5
Check if `df` contains column `square2`.
### Exercise 6
Extract column `number` from `df` and empty it (recall `empty!` function
discussed in chapter 4).
### Exercise 7
In the `Random` module the `randexp` function is defined; it samples numbers
from the exponential distribution with scale 1.
Draw two 100,000-element samples from this distribution and store them
in vectors `x` and `y`. Plot histograms of the maximum of each pair of sampled
values and of the sum of vector `x` and half of vector `y`.
### Exercise 8
Using vectors `x` and `y` from exercise 7 create the `df` data frame storing them,
and maximum of pairs of sampled values and sum of vector `x` and half of vector `y`.
Compute all standard descriptive statistics of columns of this data frame.
### Exercise 9
Store the `df` data frame from exercise 8 in Apache Arrow file and CSV file.
Compare the size of created files using the `filesize` function.
### Exercise 10
Write the `df` data frame into SQLite database. Next find information about
tables in this database. Run a query against a table representing the `df` data
frame to calculate the mean of column `x`. Does it match the result we got in
exercise 8?
# Solutions
<details>
<summary>Show!</summary>
### Exercise 1
<details>
<summary>Solution</summary>
CSV.jl supports reading gzip-compressed files so you can just do:
@ -106,9 +51,15 @@ julia> df = CSV.read(plain, DataFrame)
4 │ 4 16
```
</details>
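For reference, the direct read can be sketched as (CSV.jl detects the gzip
compression from the file contents):
```
using CSV
using DataFrames
df = CSV.read("example8.csv.gz", DataFrame)
```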
### Exercise 2
Get number of rows, columns, column names and summary statistics of the
`df` data frame from exercise 1.
<details>
<summary>Solution</summary>
```
julia> nrow(df)
@ -131,17 +82,30 @@ julia> describe(df)
2 │ square 7.75 2 6.5 16 0 Int64
```
</details>
### Exercise 3
Make a plot of `number` against `square` columns of `df` data frame.
<details>
<summary>Solution</summary>
```
using Plots
plot(df.number, df.square, xlabel="number", ylabel="square", legend=false)
```
</details>
### Exercise 4
Add a column to `df` data frame with name `name string` containing string
representation of numbers in column `number`, i.e.
`["one", "two", "three", "four"]`.
<details>
<summary>Solution</summary>
```
julia> df."name string" = ["one", "two", "three", "four"]
@ -164,8 +128,15 @@ julia> df
Note that we needed to use a string because there is a space in the column name.
</details>
### Exercise 5
Check if `df` contains column `square2`.
<details>
<summary>Solution</summary>
You can use either `hasproperty` or `columnindex`:
```
@ -184,9 +155,15 @@ julia> df.square2
ERROR: ArgumentError: column name :square2 not found in the data frame; existing most similar names are: :square
```
</details>
### Exercise 6
Extract column `number` from `df` and empty it (recall `empty!` function
discussed in chapter 4).
<details>
<summary>Solution</summary>
```
julia> empty!(df[:, :number])
@ -198,9 +175,19 @@ as it would corrupt the `df` data frame (these operations do non-copying
extraction of a column from a data frame as opposed to `df[:, :number]`
which makes a copy).
</details>
### Exercise 7
In `Random` module the `randexp` function is defined that samples numbers
from exponential distribution with scale 1.
Draw two 100,000-element samples from this distribution and store them
in `x` and `y` vectors. Plot histograms of maximum of pairs of sampled values
and sum of vector `x` and half of vector `y`.
<details>
<summary>Solution</summary>
```
using Random
using Plots
@ -212,10 +199,19 @@ histogram!(max.(x, y), label="maximum")
I have put both histograms on the same plot to show that they overlap.
</details>
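The cropped code above can be sketched in full as:
```
using Random
using Plots
x = randexp(100_000)
y = randexp(100_000)
histogram(x + y / 2, label="x+y/2")
histogram!(max.(x, y), label="maximum")
```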
### Exercise 8
Using vectors `x` and `y` from exercise 7 create the `df` data frame storing them,
and maximum of pairs of sampled values and sum of vector `x` and half of vector `y`.
Compute all standard descriptive statistics of columns of this data frame.
<details>
<summary>Solution</summary>
You might get slightly different results because we did not set
the seed of random number generator when creating `x` and `y` vectors:
```
julia> df = DataFrame(x=x, y=y);
@ -238,8 +234,16 @@ julia> describe(df, :all)
We indeed see that `x+y/2` and `max.(x,y)` columns have very similar summary
statistics except `first` and `last` as expected.
</details>
### Exercise 9
Store the `df` data frame from exercise 8 in Apache Arrow file and CSV file.
Compare the size of created files using the `filesize` function.
<details>
<summary>Solution</summary>
```
julia> using Arrow
@ -258,8 +262,18 @@ julia> filesize("df.arrow")
In this case Apache Arrow file is smaller.
</details>
### Exercise 10
Write the `df` data frame into SQLite database. Next find information about
tables in this database. Run a query against a table representing the `df` data
frame to calculate the mean of column `x`. Does it match the result we got in
exercise 8?
<details>
<summary>Solution</summary>
```
julia> using SQLite
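# A hedged sketch (the table name "df" and file name "df.db" are our
# assumptions); `SQLite.load!`, `SQLite.tables`, and `DBInterface.execute`
# come from SQLite.jl and its DBInterface dependency:
julia> db = SQLite.DB("df.db");

julia> SQLite.load!(df, db, "df");

julia> SQLite.tables(db)

julia> DataFrame(SQLite.DBInterface.execute(db, "SELECT AVG(x) AS mean_x FROM df"))
```
</details>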
@ -22,69 +22,8 @@ Create `matein2` data frame that will have only puzzles that have `"mateIn2"`
in the `Themes` column.
Use the `contains` function (check its documentation first).
### Exercise 1
<details>
<summary>Solution</summary>
```
julia> matein2 = puzzles[contains.(puzzles.Themes, "mateIn2"), :]
@ -104,9 +43,17 @@ julia> matein2 = puzzles[contains.(puzzles.Themes, "mateIn2"), :]
1 column and 274127 rows omitted
```
</details>
### Exercise 2
What is the fraction of puzzles that are mate in 2 in relation to all puzzles
in the `puzzles` data frame?
<details>
<summary>Solution</summary>
Two ways to do it:
```
julia> using Statistics
@ -118,9 +65,15 @@ julia> mean(contains.(puzzles.Themes, "mateIn2"))
0.12852152542746353
```
</details>
### Exercise 3
Create `small` data frame that holds first 10 rows of `matein2` data frame
and columns `Rating`, `RatingDeviation`, and `NbPlays`.
<details>
<summary>Solution</summary>
```
julia> small = matein2[1:10, ["Rating", "RatingDeviation", "NbPlays"]]
@ -140,9 +93,15 @@ julia> small = matein2[1:10, ["Rating", "RatingDeviation", "NbPlays"]]
10 │ 979 144 14
```
</details>
### Exercise 4
Iterate rows of `small` data frame and print the ratio of
`RatingDeviation` and `NbPlays` for each row.
<details>
<summary>Solution</summary>
```
julia> for row in eachrow(small)
@ -160,9 +119,16 @@ julia> for row in eachrow(small)
10.285714285714286
```
</details>
### Exercise 5
Get names of columns from the `matein2` data frame that end with `n` (ignore case).
<details>
<summary>Solution</summary>
Several options:
```
julia> names(matein2, Cols(col -> uppercase(col[end]) == 'N'))
2-element Vector{String}:
@ -180,9 +146,20 @@ julia> names(matein2, r"[nN]$")
"RatingDeviation"
```
</details>
### Exercise 6
Write a function `collatz` that runs the following process. Start with a
positive number `n`. If it is even divide it by two. If it is odd multiply
it by 3 and add one. The function should return the number of steps needed to
reach 1.
Create a `d` dictionary that maps number of steps needed to a list of numbers from
the range `1:100` that required this number of steps.
<details>
<summary>Solution</summary>
```
julia> function collatz(n)
@ -232,9 +209,15 @@ Dict{Int64, Vector{Int64}} with 45 entries:
As we can see even for small `n` the number of steps required to reach `1`
can get quite large.
</details>
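For reference, the cropped definition above can be sketched in full as
(a minimal implementation consistent with the exercise statement):
```
function collatz(n)
    steps = 0
    while n != 1
        n = iseven(n) ? div(n, 2) : 3n + 1
        steps += 1
    end
    return steps
end

# map each step count to the numbers from 1:100 that require it
d = Dict{Int,Vector{Int}}()
for i in 1:100
    push!(get!(d, collatz(i), Int[]), i)
end
```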
### Exercise 7
Using the `d` dictionary make a scatter plot of number of steps required
vs average value of numbers that require this number of steps.
<details>
<summary>Solution</summary>
```
using Plots
@ -247,9 +230,17 @@ scatter(steps, mean_number, xlabel="steps", ylabel="mean of numbers", legend=fal
Note that we needed to use `collect` on `keys`, as `scatter` expects an array,
not just an iterator.
</details>
### Exercise 8
Repeat the process from exercises 6 and 7, but this time use a data frame
and try to write an appropriate expression using the `combine` and `groupby`
functions (as it was explained in the last part of chapter 9). This time
perform computations for numbers ranging from one to one million.
<details>
<summary>Solution</summary>
```
df = DataFrame(n=1:10^6);
@ -258,6 +249,8 @@ agg = combine(groupby(df, :collatz), :n => mean);
scatter(agg.collatz, agg.n_mean, xlabel="steps", ylabel="mean of numbers", legend=false)
```
</details>
### Exercise 9
Set seed of random number generator to `1234`. Draw 100 random points
@ -267,7 +260,8 @@ Add random noise to column `y` that has normal distribution with mean 0 and
standard deviation 0.25. Call this column `z`.
Make a scatter plot with `x` on x-axis and `y` and `z` on y-axis.
<details>
<summary>Solution</summary>
```
using Random
@ -278,9 +272,14 @@ df.z = df.y + randn(100) / 4
scatter(df.x, [df.y df.z], labels=["y" "z"])
```
</details>
### Exercise 10
Add a LOESS regression line of `x` explaining `z` to the figure produced in exercise 9.
<details>
<summary>Solution</summary>
```
using Loess
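# A hedged sketch assuming Loess.jl's `loess` and `predict`; the scatter
# plot from exercise 9 is assumed to be the current figure:
model = loess(df.x, df.z)
xs = sort(df.x)
plot!(xs, predict(model, xs))
```
</details>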
@ -13,89 +13,8 @@ independently and uniformly from the [0,1[ interval.
Create a data frame using data from this matrix using auto-generated
column names.
### Exercise 1
<details>
<summary>Solution</summary>
```
julia> using DataFrames
@ -120,9 +39,16 @@ julia> DataFrame(mat, :auto)
5 │ 0.714515 0.861872 0.971521 0.176768
```
</details>
### Exercise 2
Now, using matrix `mat` create a data frame with randomly generated
column names. Use the `randstring` function from the `Random` module
to generate them. Store this data frame in `df` variable.
<details>
<summary>Solution</summary>
```
julia> using Random
@ -139,10 +65,16 @@ julia> df = DataFrame(mat, [randstring() for _ in 1:4])
5 │ 0.714515 0.861872 0.971521 0.176768
```
</details>
### Exercise 3
Create a new data frame, taking `df` as a source that will have the same
columns but its column names will be `y1`, `y2`, `y3`, `y4`.
<details>
<summary>Solution</summary>
```
julia> DataFrame(["y$i" => df[!, i] for i in 1:4])
5×4 DataFrame
@ -170,9 +102,15 @@ julia> rename(df, string.("y", 1:4))
5 │ 0.714515 0.861872 0.971521 0.176768
```
</details>
### Exercise 4
Create a dictionary holding `column_name => column_vector` pairs
using data stored in data frame `df`. Save this dictionary in variable `d`.
<details>
<summary>Solution</summary>
```
julia> d = Dict([n => df[:, n] for n in names(df)])
@ -194,9 +132,15 @@ Dict{Symbol, AbstractVector} with 4 entries:
Symbol("5Caz55k0") => [0.0353994, 0.0691152, 0.980079, 0.0697535, 0.971521]
```
</details>
### Exercise 5
Create a data frame back from dictionary `d` from exercise 4. Compare it
with `df`.
<details>
<summary>Solution</summary>
```
julia> DataFrame(d)
@ -215,9 +159,15 @@ Note that columns of a data frame are now sorted by their names.
This is done for `Dict` objects because such dictionaries do not have
a defined order of keys.
</details>
### Exercise 6
For data frame `df` compute the dot product between all pairs of its columns.
Use the `dot` function from the `LinearAlgebra` module.
<details>
<summary>Solution</summary>
```
julia> using LinearAlgebra
@ -232,9 +182,36 @@ julia> pairwise(dot, eachcol(df))
1.50558 1.18411 0.909744 1.47431
```
</details>
### Exercise 7
Given two data frames:
```
julia> df1 = DataFrame(a=1:2, b=11:12)
2×2 DataFrame
Row │ a b
│ Int64 Int64
─────┼──────────────
1 │ 1 11
2 │ 2 12
julia> df2 = DataFrame(a=1:2, c=101:102)
2×2 DataFrame
Row │ a c
│ Int64 Int64
─────┼──────────────
1 │ 1 101
2 │ 2 102
```
vertically concatenate them so that only columns that are present in both
data frames are kept. Check the documentation of `vcat` to see how to
do it.
<details>
<summary>Solution</summary>
```
julia> vcat(df1, df2, cols=:intersect)
@ -255,9 +232,16 @@ julia> vcat(df1, df2)
ERROR: ArgumentError: column(s) c are missing from argument(s) 1, and column(s) b are missing from argument(s) 2
```
</details>
### Exercise 8
Now append to `df1` table `df2`, but add only the columns from `df2` that
are present in `df1`. Check the documentation of `append!` to see how to
do it.
<details>
<summary>Solution</summary>
```
julia> append!(df1, df2, cols=:subset)
@ -271,9 +255,20 @@ julia> append!(df1, df2, cols=:subset)
4 │ 2 missing
```
</details>
### Exercise 9
Create a `circle` data frame, using the `push!` function that will store
1000 samples of the following process:
* draw `x` and `y` uniformly and independently from the [-1,1[ interval;
* compute a binary variable `inside` that is `true` if `x^2+y^2 < 1`
and is `false` otherwise.
Compute summary statistics of this data frame.
<details>
<summary>Solution</summary>
```
circle=DataFrame()
@ -287,9 +282,16 @@ describe(circle)
We note that the mean of variable `inside` is approximately π/4
(the unit circle covers π/4 of the area of the [-1,1[ × [-1,1[ square).
</details>
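A minimal sketch of the sampling loop described above (names as in the
exercise):
```
using DataFrames
circle = DataFrame()
for _ in 1:1000
    x, y = 2 .* rand(2) .- 1               # uniform on [-1, 1[
    push!(circle, (x=x, y=y, inside=x^2 + y^2 < 1))
end
describe(circle)
```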
### Exercise 10
Create a scatterplot of `circle` data frame where its `x` and `y` axis
will be the plotted points and `inside` variable will determine the color
of the plotted point.
<details>
<summary>Solution</summary>
```
using Plots
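# A hedged sketch: `group` colors the points by the `inside` indicator
scatter(circle.x, circle.y;
        group=circle.inside, markersize=2,
        aspect_ratio=:equal, xlabel="x", ylabel="y")
```
</details>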
@ -13,83 +13,8 @@ sampled from uniform distribution on [0, 1[ interval.
Serialize it to disk, and next deserialize. Check if the deserialized
object is the same as the source data frame.
### Exercise 1
<details>
<summary>Solution</summary>
```
julia> using DataFrames
@ -104,9 +29,16 @@ julia> deserialize("df.bin") == df
true
```
</details>
### Exercise 2
Add a column `n` to the `df` data frame that in each row will hold the
number of observations in column `x` that have distance less than `0.1` to
a value stored in a given row of `x`.
<details>
<summary>Solution</summary>
A simple approach is:
```
@ -151,9 +83,14 @@ df.n = f2(df.x)
In this solution using a function barrier is even more relevant,
as we explicitly use loops inside.
</details>
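For reference, one possible efficient sketch (not necessarily the approach
cropped above) avoids the quadratic scan by sorting first; points at exactly
0.1 distance are negligible for continuous data:
```
s = sort(df.x)
df.n = [searchsortedlast(s, xi + 0.1) - searchsortedfirst(s, xi - 0.1) + 1
        for xi in df.x]
```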
### Exercise 3
Investigate visually how `n` depends on `x` in the data frame `df`.
<details>
<summary>Solution</summary>
```
using Plots
@ -162,9 +99,31 @@ scatter(df.x, df.n, xlabel="x", ylabel="neighbors", legend=false)
As expected, at the border of the domain the number of neighbors drops.
</details>
### Exercise 4
Someone has prepared the following test data for you:
```
teststr = """
"x","sinx"
0.139279,0.138829
0.456779,0.441059
0.344034,0.337287
0.140253,0.139794
0.848344,0.750186
0.977512,0.829109
0.032737,0.032731
0.702750,0.646318
0.422339,0.409895
0.393878,0.383772
"""
```
Load this data into `testdf` data frame.
<details>
<summary>Solution</summary>
```
julia> using CSV
@ -188,8 +147,18 @@ julia> testdf = CSV.read(IOBuffer(teststr), DataFrame)
10 │ 0.393878 0.383772
```
</details>
### Exercise 5
Check the accuracy of the computed sine values of `x` in `testdf`.
Print all rows for which the absolute difference is greater than `5e-7`.
In this case display `x`, `sinx`, the exact value of `sin(x)` and the absolute
difference.
<details>
<summary>Solution</summary>
Since data frame is small we can use `eachrow`:
```
@ -202,9 +171,18 @@ julia> for row in eachrow(testdf)
(x = 0.70275, computed = 0.6463185646550751, data = 0.646318, dev = 5.646550751414736e-7)
```
</details>
### Exercise 6
Group data in data frame `df` into buckets of 0.1 width and store the result in
`gdf` data frame (sort the groups). Use the `cut` function from
CategoricalArrays.jl to do it (check its documentation to learn how to do it).
Check the number of values in each group.
<details>
<summary>Solution</summary>
```
julia> using CategoricalArrays
@ -244,9 +222,15 @@ julia> combine(gdf, nrow) # alternative way to do it
You might get slightly different numbers, but all should be around 10,000.
</details>
### Exercise 7
Display the grouping keys in `gdf` grouped data frame. Show them as named tuples.
Check what would be the group order if you asked not to sort them.
<details>
<summary>Solution</summary>
```
julia> NamedTuple.(keys(gdf))
@ -282,9 +266,14 @@ the resulting group order could depend on the type of grouping column, so if
you want to rely on the order of groups, always pass the `sort` keyword argument
explicitly.
</details>
### Exercise 8
Compute average `n` for each group in `gdf`.
<details>
<summary>Solution</summary>
```
julia> using Statistics
@ -319,9 +308,16 @@ julia> combine(gdf, :n => mean) # alternative way to do it
10 │ [0.9, 1.0) 14944.5
```
</details>
### Exercise 9
Fit a linear model explaining `n` by `x` separately for each group in `gdf`.
Use the `\` operator to fit it (recall it from chapter 4).
For each group produce the result as named tuple having fields `α₀` and `αₓ`.
<details>
<summary>Solution</summary>
```
julia> function fitmodel(x, n)
@ -364,9 +360,18 @@ julia> combine(gdf, [:x, :n] => fitmodel => AsTable) # alternative syntax that y
We note that indeed in the first and last group the regression has a significant
slope.
</details>
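A minimal sketch of such a `fitmodel` function, consistent with the
`AsTable` syntax shown above:
```
function fitmodel(x, n)
    X = [ones(length(x)) x]            # design matrix with intercept
    α₀, αₓ = X \ n                     # least-squares fit via `\`
    return (α₀=α₀, αₓ=αₓ)
end
```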
### Exercise 10
Repeat exercise 9 but using the GLM.jl package. This time
extract the p-value for the slope of estimated coefficient for `x` variable.
Use the `coeftable` function from GLM.jl to get this information.
Check the documentation of this function to learn how to do it (it will be
easiest for you to first convert its result to a `DataFrame`).
<details>
<summary>Solution</summary>
```
julia> using GLM
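# A hedged sketch (the name `xpvalue` is ours): fit `lm` per group and take
# the p-value for `x` from `coeftable` converted to a `DataFrame` first
julia> function xpvalue(sdf)
           ct = DataFrame(coeftable(lm(@formula(n ~ x), sdf)))
           return ct[2, "Pr(>|t|)"]
       end;

julia> combine(gdf, xpvalue)
```
</details>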
@ -14,86 +14,8 @@ is `sha = "2ce930d70a931de660fdaf271d70192793b1b240272645bf0275779f6704df6b"`.
Download this file and check if it indeed has this checksum.
You might need to read documentation of `string` and `join` functions.
### Exercise 1
<details>
<summary>Solution</summary>
```
using Downloads
@ -106,9 +28,20 @@ sha == shastr
The last line should produce `true`.
</details>
### Exercise 2
Download the file http://snap.stanford.edu/data/deezer_ego_nets.zip
that contains the ego-nets of Eastern European users collected from the music
streaming service Deezer in February 2020. Nodes are users and edges are mutual
follower relationships.
From the file extract deezer_edges.json and deezer_target.csv files and
save them to disk.
<details>
<summary>Solution</summary>
```
Downloads.download("http://snap.stanford.edu/data/deezer_ego_nets.zip", "ego.zip")
@ -125,9 +58,16 @@ end
close(archive)
```
</details>
### Exercise 3
Load deezer_edges.json and deezer_target.csv files to Julia.
The JSON file should be loaded as JSON3.jl object `edges_json`.
The CSV file should be loaded into a data frame `target_df`.
<details>
<summary>Solution</summary>
```
using CSV
@ -137,17 +77,32 @@ edges_json = JSON3.read(read("deezer_edges.json"))
target_df = CSV.read("deezer_target.csv", DataFrame)
```
</details>
### Exercise 4
Check that keys in the `edges_json` are in the same order as `id` column
in `target_df`.
<details>
<summary>Solution</summary>
This is short, but you need a good understanding of Julia types
and standard functions to write it properly:
```
Symbol.(target_df.id) == keys(edges_json)
```
</details>
### Exercise 5
From every value stored in `edges_json` create a graph representing
ego-net of the given node. Store these graphs in a vector that will make the
`egonet` column of the `target_df` data frame.
<details>
<summary>Solution</summary>
```
using Graphs
@ -163,9 +118,19 @@ end
target_df.egonet = edgelist2graph(values(edges_json))
```
</details>
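A hedged sketch of `edgelist2graph` (assuming each value in `edges_json`
is a list of 0-based `[a, b]` edge pairs):
```
using Graphs
function edgelist2graph(edgelists)
    return map(edgelists) do edges
        n = maximum(maximum.(edges)) + 1   # nodes are numbered from 0
        g = SimpleGraph(n)
        for (a, b) in edges
            add_edge!(g, a + 1, b + 1)     # shift to 1-based vertices
        end
        g
    end
end
```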
### Exercise 6
Ego-net in our data set is a subgraph of a full Deezer graph where for some
node all its neighbors are included, but also it contains all edges between the
neighbors.
Therefore we expect that diameter of every ego-net is at most 2 (as every
two nodes are either connected directly or by a common friend).
Check if this is indeed the case. Use the `diameter` function.
<details>
<summary>Solution</summary>
```
julia> extrema(diameter.(target_df.egonet))
@ -174,9 +139,21 @@ julia> extrema(diameter.(target_df.egonet))
Indeed we see that the diameter of each ego-net is 2.
</details>
### Exercise 7
For each ego-net find a central node that is connected to every other node
in this network. Use the `degree` and `findall` functions to achieve this.
Add `center` column with numbers of nodes that are connected to all other
nodes in the ego-net to `target_df` data frame.
Next add a column `center_len` that gives the number of such nodes.
Check how many times different numbers of center nodes are found.
<details>
<summary>Solution</summary>
```
target_df.center = map(target_df.egonet) do g
@ -192,9 +169,18 @@ the condition we want to check.
We notice that in some cases it is impossible to identify the center of the
ego-net uniquely.
</details>
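One way to sketch this computation (a center node has degree `nv(g) - 1`):
```
using Graphs
target_df.center = [findall(==(nv(g) - 1), degree(g)) for g in target_df.egonet]
target_df.center_len = length.(target_df.center)
combine(groupby(target_df, :center_len), nrow)
```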
### Exercise 8
Add the following ego-net features to the `target_df` data frame:
* `size`: number of nodes in ego-net
* `mean_degree`: average node degree in ego-net
Check mean values of these two columns by `target` column.
<details>
<summary>Solution</summary>
```
using Statistics
@ -206,9 +192,15 @@ combine(groupby(target_df, :target, sort=true), [:size, :mean_degree] .=> mean)
It seems that for target equal to `0` size and average degree in the network are
a bit larger.
</details>
### Exercise 9
Continuing to work with `target_df` data frame create a logistic regression
explaining `target` by `size` and `mean_degree`.
<details>
<summary>Solution</summary>
```
using GLM
@ -217,9 +209,21 @@ glm(@formula(target~size+mean_degree), target_df, Binomial(), LogitLink())
We see that only `size` is statistically significant.
</details>
### Exercise 10
Continuing to work with `target_df` create a scatterplot where `size` will be on
one axis and `mean_degree` rounded to nearest integer on the other axis.
Plot the mean of `target` for each point being a combination of `size` and
rounded `mean_degree`.
Additionally fit a LOESS model explaining `target` by `size`. Make a prediction
for values in range from 5% to 95% quantile (to concentrate on typical values
of size).
<details>
<summary>Solution</summary>
```
using Plots
@ -242,6 +246,6 @@ plot(size_predict, target_predict;
xlabel="size", ylabel="predicted target", legend=false)
```
Between quantiles 5% and 95% of `size` we see a downward shaped relationship.
</details>
@ -13,12 +13,47 @@ https://archive.ics.uci.edu/ml/machine-learning-databases/00615/MushroomDataset.
archive and extract primary_data.csv and secondary_data.csv files from it.
Save the files to disk.
<details>
<summary>Solution</summary>
```
using Downloads
import ZipFile
Downloads.download("https://archive.ics.uci.edu/ml/machine-learning-databases/00615/MushroomDataset.zip", "MushroomDataset.zip")
archive = ZipFile.Reader("MushroomDataset.zip")
idx = only(findall(x -> contains(x.name, "primary_data.csv"), archive.files))
open("primary_data.csv", "w") do io
write(io, read(archive.files[idx]))
end
idx = only(findall(x -> contains(x.name, "secondary_data.csv"), archive.files))
open("secondary_data.csv", "w") do io
write(io, read(archive.files[idx]))
end
close(archive)
```
</details>
### Exercise 2
Load primary_data.csv into the `primary` data frame.
Load secondary_data.csv into the `secondary` data frame.
Describe the contents of both data frames.
<details>
<summary>Solution</summary>
```
using CSV
using DataFrames
primary = CSV.read("primary_data.csv", DataFrame; delim=';')
secondary = CSV.read("secondary_data.csv", DataFrame; delim=';')
describe(primary)
describe(secondary)
```
</details>
### Exercise 3
Start with `primary` data. Note that columns starting from column 4 have
@ -32,6 +67,25 @@ three columns just after `class` column in the `parsed_primary` data frame.
Check the `renamecols` keyword argument of `select` to
avoid renaming the produced columns.
<details>
<summary>Solution</summary>
```
parse_nominal(s::AbstractString) = split(strip(s, ['[', ']']), ", ")
parse_nominal(::Missing) = missing
parse_numeric(s::AbstractString) = parse.(Float64, split(strip(s, ['[', ']']), ", "))
parse_numeric(::Missing) = missing
idcols = ["family", "name", "class"]
numericcols = ["cap-diameter", "stem-height", "stem-width"]
parsed_primary = select(primary,
idcols,
numericcols .=> ByRow(parse_numeric),
Not([idcols; numericcols]) .=> ByRow(parse_nominal);
renamecols=false)
```
</details>
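A quick illustration of what the two helper parsers defined in the solution above return for raw values from the file:

```julia
parse_nominal("[x, f]")         # vector of substrings: ["x", "f"]
parse_numeric("[10.95, 25.0]")  # [10.95, 25.0]
parse_nominal(missing)          # missing is propagated
```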
### Exercise 4
In `parsed_primary` data frame find all pairs of mushrooms (rows) that might be
Use the following rules:
For each found pair print to the screen the row number, family, name, and class.
<details>
<summary>Solution</summary>
```
function overlap_numeric(v1, v2)
end
```
Note that in this exercise using `eachrow` is not a problem
(although it is not type stable) because the data is small.
</details>
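The body of `overlap_numeric` is truncated above. One plausible way to write it, treating each numeric trait as a closed `[min, max]` interval (a sketch, not necessarily the original code):

```julia
# Two closed intervals overlap iff each one starts no later than the other
# ends; a scalar value works too, as numbers are length-1 iterables in Julia.
overlap_numeric(v1, v2) =
    max(minimum(v1), minimum(v2)) <= min(maximum(v1), maximum(v2))

overlap_numeric([10.0, 20.0], [15.0, 30.0])  # true
overlap_numeric([10.0, 20.0], [25.0, 30.0])  # false
```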
### Exercise 5
Still using `parsed_primary`, find the average probability that `class` is
`p` by `family`. Additionally, add the number of observations in each group.
Sort these results by the probability. Try using DataFramesMeta.jl to do this
exercise (this requirement is optional).
Store the result in the `agg_primary` data frame.
<details>
<summary>Solution</summary>
```
using Statistics
agg_primary = @chain parsed_primary begin
end
```
</details>
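Since DataFramesMeta.jl is optional in this exercise, the same aggregation can be sketched with plain DataFrames.jl (assuming class `"p"` denotes poisonous):

```julia
using DataFrames, Statistics

agg_primary = sort(
    combine(groupby(parsed_primary, :family),
            :class => (c -> mean(c .== "p")) => :pr_p,  # share of class "p"
            nrow),                                      # group size
    :pr_p)
```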
### Exercise 6
Now collapse the `agg_primary` data frame so that for each unique `pr_p`
it gives the total number of rows that had this probability and a tuple
of mushroom family names.
Optionally: try to display the produced table so that the tuple containing the
list of families for each group is not cropped (this will require a wide
terminal).
<details>
<summary>Solution</summary>
```
show(combine(groupby(agg_primary, :pr_p), :nrow => sum => :nrow, :family => Tuple => :families); truncate=140)
```
</details>
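An alternative way to avoid cropping (a sketch): join the family names into a single string, which the default display handles better than a tuple:

```julia
combine(groupby(agg_primary, :pr_p),
        :nrow => sum => :nrow,
        :family => (f -> join(f, ", ")) => :families)
```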
### Exercise 7
From our preliminary analysis of the `primary` data we see that a `missing`
value there is non-informative, so in the `secondary` data we should be
cautious when building a model if we allowed for missing data (in practice,
if we were investigating some real mushroom we would most likely know its
characteristics).
Therefore, as a first step, drop in place all columns in the `secondary` data
frame that have missing values.
<details>
<summary>Solution</summary>
```
select!(secondary, [!any(ismissing, col) for col in eachcol(secondary)])
```
Note that we select based on the actual contents of the columns and not on
their element type (a column could allow missing values but not contain any).
</details>
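A small illustration of the remark above (hypothetical data): a column may allow `Missing` in its element type while containing no missing values, so selecting by contents and selecting by type differ:

```julia
using DataFrames

df = DataFrame(a=Union{Missing,Int}[1, 2, 3], b=[1, missing, 3])
eltype(df.a)          # Union{Missing, Int64} -- the type allows missing
any(ismissing, df.a)  # false -> :a would be kept by the contents check
any(ismissing, df.b)  # true  -> :b would be dropped
```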
### Exercise 8
Create a logistic regression model predicting `class` from all remaining
features in the data frame. You might need to check `Term` usage in the
StatsModels.jl documentation.
You will notice that for the `stem-color` and `habitat` columns you get strange
estimation results (large absolute values of the estimated parameters and even
larger standard errors). Explain why this happens by analyzing frequency tables
of these variables against the `class` column.
<details>
<summary>Solution</summary>
```
using GLM
freqtable(secondary, "stem-color", "class")
freqtable(secondary, "habitat", "class")
```
We can see that for certain levels of `stem-color` and `habitat` variables
there is a perfect separation of classes.
</details>
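Perfect separation can be reproduced on a toy example (hypothetical data, not from the exercise): when some level of a predictor co-occurs with only one class, the likelihood keeps improving as that level's coefficient grows, so the optimizer returns a huge estimate with an even larger standard error:

```julia
using DataFrames, GLM

toy = DataFrame(x=["a", "a", "a", "b", "b"], y=[0, 1, 1, 0, 0])
# level "b" occurs only with y == 0, so its coefficient diverges
glm(@formula(y ~ x), toy, Binomial(), LogitLink())
```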
### Exercise 9
Add a `class_p` column to `secondary` as its second column, containing the
probability, predicted by the model created in exercise 8, that a given
observation has class `p`.
Print descriptive statistics of the `class_p` column by `class`.
<details>
<summary>Solution</summary>
```
insertcols!(secondary, 2, :class_p => predict(model))
end
```
We can see that the model has some discriminatory power, but there
is still a significant overlap between classes.
</details>
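The grouped-summary code is partly elided above; the same statistics can be sketched with a single `combine` call:

```julia
using DataFrames, Statistics

combine(groupby(secondary, :class),
        :class_p .=> [minimum, mean, median, maximum, std])
```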
### Exercise 10
Plot the FPR-TPR ROC curve for our model and compute the associated AUC value.
<details>
<summary>Solution</summary>
```
using Plots
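# The rest of the solution is truncated here. What follows is a sketch of
# one way to finish it (not necessarily the original code): sweep a
# threshold over the predicted probabilities from exercise 9, plot FPR
# against TPR, and compute AUC with the trapezoidal rule.
y = secondary.class .== "p"
scores = secondary.class_p
thr = [Inf; sort(unique(scores); rev=true)]
tpr = [sum((scores .>= t) .& y) / sum(y) for t in thr]
fpr = [sum((scores .>= t) .& .!y) / sum(.!y) for t in thr]
plot(fpr, tpr; xlabel="FPR", ylabel="TPR", legend=false)
auc = sum(diff(fpr) .* (tpr[1:end-1] .+ tpr[2:end]) ./ 2)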