update layout of all exercises
@@ -13,89 +13,8 @@ independently and uniformly from the [0,1[ interval.
Create a data frame using data from this matrix using auto-generated
column names.

### Exercise 2

Now, using matrix `mat` create a data frame with randomly generated
column names. Use the `randstring` function from the `Random` module
to generate them. Store this data frame in `df` variable.

### Exercise 3

Create a new data frame, taking `df` as a source that will have the same
columns but its column names will be `y1`, `y2`, `y3`, `y4`.

### Exercise 4

Create a dictionary holding `column_name => column_vector` pairs
using data stored in data frame `df`. Save this dictionary in variable `d`.

### Exercise 5

Create a data frame back from dictionary `d` from exercise 4. Compare it
with `df`.

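The round trip through a `Dict` can be sketched as follows. This is only an illustration under assumptions: `df_demo`, `d_demo`, `df_back` and the column names `zeta` and `alpha` are made up for the example and are not the `df` and `d` from the exercises.

```julia
using DataFrames

# Hypothetical data frame whose columns are deliberately not in alphabetical order.
df_demo = DataFrame(zeta=[1, 2], alpha=[3, 4])

# Dictionary mapping column names (strings) to copies of the column vectors.
d_demo = Dict(n => df_demo[:, n] for n in names(df_demo))

# Rebuild a data frame from the dictionary.
df_back = DataFrame(d_demo)

# Columns built from a `Dict` come back sorted by name, not in the original order.
@assert names(df_back) == ["alpha", "zeta"]

# The data is the same once the original column order is restored.
@assert df_back[:, names(df_demo)] == df_demo
```

Comparing `df_back` with `df_demo` directly would report a difference because the column order is part of the comparison.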
### Exercise 6

For data frame `df` compute the dot product between all pairs of its columns.
Use the `dot` function from the `LinearAlgebra` module.

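One possible approach that needs nothing beyond `dot` is a two-dimensional comprehension over `eachcol`. The sketch below is an illustration under assumptions: `df_demo` and its column names are invented for the example, and this is not necessarily the approach used in the solutions.

```julia
using DataFrames
using LinearAlgebra

# Hypothetical small data frame, used only for illustration.
df_demo = DataFrame(a=[1.0, 2.0], b=[3.0, 4.0], c=[5.0, 6.0])

# Entry [i, j] is the dot product of column i with column j.
dots = [dot(x, y) for x in eachcol(df_demo), y in eachcol(df_demo)]

# Each pair of columns appears twice, so the result is a symmetric matrix.
@assert issymmetric(dots)
@assert dots[1, 2] == 1.0 * 3.0 + 2.0 * 4.0
```

The diagonal of `dots` holds the squared norms of the columns, which is a quick sanity check on the result.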
### Exercise 7

Given two data frames:

```
julia> df1 = DataFrame(a=1:2, b=11:12)
2×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1     11
   2 │     2     12

julia> df2 = DataFrame(a=1:2, c=101:102)
2×2 DataFrame
 Row │ a      c
     │ Int64  Int64
─────┼──────────────
   1 │     1    101
   2 │     2    102
```

vertically concatenate them so that only columns that are present in both
data frames are kept. Check the documentation of `vcat` to see how to
do it.

### Exercise 8

Now append to `df1` table `df2`, but add only the columns from `df2` that
are present in `df1`. Check the documentation of `append!` to see how to
do it.

### Exercise 9

Create a `circle` data frame, using the `push!` function that will store
1000 samples of the following process:
* draw `x` and `y` uniformly and independently from the [-1,1[ interval;
* compute a binary variable `inside` that is `true` if `x^2+y^2 < 1`
and is `false` otherwise.

Compute summary statistics of this data frame.

### Exercise 10

Create a scatterplot of `circle` data frame where its `x` and `y` axis
will be the plotted points and `inside` variable will determine the color
of the plotted point.

# Solutions

<details>

<summary>Show!</summary>

### Exercise 1

Solution:
<summary>Solution</summary>

```
julia> using DataFrames
@@ -120,9 +39,16 @@ julia> DataFrame(mat, :auto)
5 │ 0.714515 0.861872 0.971521 0.176768
```

</details>

### Exercise 2

Solution:
Now, using matrix `mat` create a data frame with randomly generated
column names. Use the `randstring` function from the `Random` module
to generate them. Store this data frame in `df` variable.

<details>
<summary>Solution</summary>

```
julia> using Random
@@ -139,10 +65,16 @@ julia> df = DataFrame(mat, [randstring() for _ in 1:4])
5 │ 0.714515 0.861872 0.971521 0.176768
```

</details>

### Exercise 3

Solution:
Create a new data frame, taking `df` as a source that will have the same
columns but its column names will be `y1`, `y2`, `y3`, `y4`.

<details>
<summary>Solution</summary>

```
julia> DataFrame(["y$i" => df[!, i] for i in 1:4])
5×4 DataFrame
@@ -170,9 +102,15 @@ julia> rename(df, string.("y", 1:4))
5 │ 0.714515 0.861872 0.971521 0.176768
```

</details>

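The `rename` variant can be checked on a small stand-in. This is a minimal sketch under assumptions: `df_demo` and its column names are hypothetical and only mimic a four-column `df`.

```julia
using DataFrames

# Hypothetical stand-in for `df`; the column names are invented.
df_demo = DataFrame(p=1:2, q=3:4, r=5:6, s=7:8)

# Broadcasting `string` builds the vector ["y1", "y2", "y3", "y4"].
new_names = string.("y", 1:4)
renamed = rename(df_demo, new_names)

# `rename` (without `!`) returns a new data frame and leaves the source untouched.
@assert names(renamed) == ["y1", "y2", "y3", "y4"]
@assert names(df_demo) == ["p", "q", "r", "s"]
```

The in-place counterpart `rename!` would change `df_demo` itself, which is not what the exercise asks for.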
### Exercise 4

Solution:
Create a dictionary holding `column_name => column_vector` pairs
using data stored in data frame `df`. Save this dictionary in variable `d`.

<details>
<summary>Solution</summary>

```
julia> d = Dict([n => df[:, n] for n in names(df)])
@@ -194,9 +132,15 @@ Dict{Symbol, AbstractVector} with 4 entries:
Symbol("5Caz55k0") => [0.0353994, 0.0691152, 0.980079, 0.0697535, 0.971521]
```

</details>

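Indexing with `df[:, n]` copies each column, so the dictionary does not share memory with the data frame. The following sketch illustrates this on a made-up example; `df_demo` and `d_demo` are hypothetical names, not the `df` and `d` from the exercise.

```julia
using DataFrames

# Hypothetical two-column data frame, used only for illustration.
df_demo = DataFrame(p=[1, 2], q=[3, 4])

# Keys are column names as strings; `df_demo[:, n]` copies each column.
d_demo = Dict(n => df_demo[:, n] for n in names(df_demo))
@assert d_demo["q"] == [3, 4]

# Because the columns were copied, mutating the dictionary leaves `df_demo` unchanged.
d_demo["p"][1] = 100
@assert df_demo.p == [1, 2]
```

Using `df_demo[!, n]` instead would store the data frame's own vectors in the dictionary, so the mutation above would then be visible in `df_demo`.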
### Exercise 5

Solution:
Create a data frame back from dictionary `d` from exercise 4. Compare it
with `df`.

<details>
<summary>Solution</summary>

```
julia> DataFrame(d)
@@ -215,9 +159,15 @@ Note that columns of a data frame are now sorted by their names.
This is done for `Dict` objects because such dictionaries do not have
a defined order of keys.

</details>

### Exercise 6

Solution:
For data frame `df` compute the dot product between all pairs of its columns.
Use the `dot` function from the `LinearAlgebra` module.

<details>
<summary>Solution</summary>

```
julia> using LinearAlgebra
@@ -232,9 +182,36 @@ julia> pairwise(dot, eachcol(df))
1.50558 1.18411 0.909744 1.47431
```

</details>

### Exercise 7

Solution:
Given two data frames:

```
julia> df1 = DataFrame(a=1:2, b=11:12)
2×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1     11
   2 │     2     12

julia> df2 = DataFrame(a=1:2, c=101:102)
2×2 DataFrame
 Row │ a      c
     │ Int64  Int64
─────┼──────────────
   1 │     1    101
   2 │     2    102
```

vertically concatenate them so that only columns that are present in both
data frames are kept. Check the documentation of `vcat` to see how to
do it.

<details>
<summary>Solution</summary>

```
julia> vcat(df1, df2, cols=:intersect)
@@ -255,9 +232,16 @@ julia> vcat(df1, df2)
ERROR: ArgumentError: column(s) c are missing from argument(s) 1, and column(s) b are missing from argument(s) 2
```

</details>

### Exercise 8

Solution:
Now append to `df1` table `df2`, but add only the columns from `df2` that
are present in `df1`. Check the documentation of `append!` to see how to
do it.

<details>
<summary>Solution</summary>

```
julia> append!(df1, df2, cols=:subset)
@@ -271,9 +255,20 @@ julia> append!(df1, df2, cols=:subset)
4 │ 2 missing
```

</details>

### Exercise 9

Solution
Create a `circle` data frame, using the `push!` function that will store
1000 samples of the following process:
* draw `x` and `y` uniformly and independently from the [-1,1[ interval;
* compute a binary variable `inside` that is `true` if `x^2+y^2 < 1`
and is `false` otherwise.

Compute summary statistics of this data frame.

<details>
<summary>Solution</summary>

```
circle=DataFrame()
@@ -287,9 +282,16 @@ describe(circle)

We note that the mean of the variable `inside` is approximately π/4 (about 0.785), the ratio of the unit circle's area to the area of the enclosing square.
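A minimal sketch of the sampling process is given here as an illustration under assumptions: the names `rng`, `circle_demo` and `estimate`, the seed, and the pre-typed columns are choices made for this example and may differ from the elided solution code.

```julia
using DataFrames
using Random
using Statistics

# Assumed seed, used only to make the run repeatable.
rng = MersenneTwister(1234)

# Start from an empty data frame with typed columns (an assumption of this sketch).
circle_demo = DataFrame(x=Float64[], y=Float64[], inside=Bool[])

for _ in 1:1000
    # `rand(rng)` is uniform on [0, 1), so `2 * rand(rng) - 1` is uniform on [-1, 1).
    x = 2 * rand(rng) - 1
    y = 2 * rand(rng) - 1
    # Add one row; `inside` records whether the point falls in the unit circle.
    push!(circle_demo, (x=x, y=y, inside=x^2 + y^2 < 1))
end

# The fraction of points inside the unit circle estimates π/4,
# so four times that fraction is a rough estimate of π.
estimate = 4 * mean(circle_demo.inside)
```

With 1000 samples the estimate is only accurate to roughly one decimal place, which is why the mean of `inside` is described as approximate.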

</details>

### Exercise 10

Solution:
Create a scatterplot of `circle` data frame where its `x` and `y` axis
will be the plotted points and `inside` variable will determine the color
of the plotted point.

<details>
<summary>Solution</summary>

```
using Plots