2022-10-14 12:27:04 +02:00
# Julia for Data Analysis

## Bogumił Kamiński, Daniel Kaszyński

# Chapter 7

# Problems

### Exercise 1

Random.org provides a service that returns random numbers. One way to use it
is to send HTTP GET requests. Here is an example request:

> https://www.random.org/integers/?num=10&min=1&max=6&col=1&base=10&format=plain&rnd=new

If you want to understand all the parameters, please check their meaning
[here](https://www.random.org/clients/http/).

For us it is enough to know that this request generates 10 random integers in
the range from 1 to 6. Run this query in Julia and parse the result.

<details>
<summary>Solution</summary>

Example run:

```
julia> using HTTP

julia> response = HTTP.get("https://www.random.org/integers/?\
       num=10&min=1&max=6&col=1&base=10&format=plain&rnd=new");

julia> parse.(Int, split(String(response.body)))
10-element Vector{Int64}:
 6
 2
 6
 3
 4
 2
 5
 2
 3
 6
```
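The parsing step can also be tried without network access. The following is a minimal sketch that assumes the response body is plain text with one number per line (which is what `format=plain` with `col=1` suggests); the `body` string is a hypothetical stand-in for `String(response.body)`, not real output of the service.

```julia
# Hypothetical response body: assumed to hold one integer per line.
body = "6\n2\n6\n3\n4\n2\n5\n2\n3\n6\n"

# split without a delimiter splits on any whitespace and drops empty pieces,
# so the trailing newline does not produce an empty token.
tokens = split(body)

# Broadcast parse over the tokens to get a vector of integers.
numbers = parse.(Int, tokens)

# Sanity checks: ten values, all within the requested range from 1 to 6.
@assert length(numbers) == 10
@assert all(in(1:6), numbers)
```

Checking the count and the range after parsing is a cheap way to notice when the service returns something other than the expected numbers.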

</details>

### Exercise 2

Write a function that tries to parse a string as an integer.
If it succeeds it should return the integer; otherwise it should print
an error message and return `0`.

<details>
<summary>Solution</summary>

Example function:

```
function str2int(s::AbstractString)
    try
        return parse(Int, s)
    catch e
        println(e)
    end
    return 0
end
```

Let us check it:

```
julia> str2int("10")
10

julia> str2int(" -1 ")
-1

julia> str2int("12345678901234567890")
OverflowError("overflow parsing \"12345678901234567890\"")
0

julia> str2int("1.3")
ArgumentError("invalid base 10 digit '.' in \"1.3\"")
0

julia> str2int("a")
ArgumentError("invalid base 10 digit 'a' in \"a\"")
0
```

An alternative solution would use `tryparse` (not covered in the book):

```
function str2int(s::AbstractString)
    v = tryparse(Int, s)
    if isnothing(v)
        println("error while parsing")
        return 0
    end
    return v
end
```
But this time we do not see the cause of the error.
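
A possible middle ground, shown here only as a sketch, is to keep `try`/`catch` but handle just the two exception types that appeared in the runs above and let anything else propagate. The name `str2int_narrow` is hypothetical and is used only to avoid overwriting `str2int`.

```julia
# Hypothetical variant of str2int: handle only ArgumentError and OverflowError,
# the two exception types parse(Int, s) threw in the examples above.
function str2int_narrow(s::AbstractString)
    try
        return parse(Int, s)
    catch e
        if e isa ArgumentError || e isa OverflowError
            # sprint(showerror, e) renders the error message as a string.
            println("cannot parse ", repr(s), ": ", sprint(showerror, e))
            return 0
        end
        rethrow()  # an unexpected error should not be silently turned into 0
    end
end
```

With this version the cause of a parsing failure is still printed, while an unrelated problem would surface as an exception instead of being reported as `0`.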

</details>

### Exercise 3

Create a matrix containing the truth table for the `&&` operation, including
`missing`. If an operation throws an error, store `"error"` in the table. As an
extra feature (this is harder, so you can skip it), store both inputs and the
output in each cell to make the table easier to read.

<details>
<summary>Solution</summary>

```
julia> function apply_and(x, y)
           try
               return "$x && $y = $(x && y)"
           catch e
               return "$x && $y = error"
           end
       end
apply_and (generic function with 1 method)

julia> apply_and.([true, false, missing], [true false missing])
3×3 Matrix{String}:
 "true && true = true"      "true && false = false"     "true && missing = missing"
 "false && true = false"    "false && false = false"    "false && missing = false"
 "missing && true = error"  "missing && false = error"  "missing && missing = error"
```
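
The errors appear because `&&` requires its left operand to be a `Bool`, and `missing` is not one. For comparison, here is a short sketch of the same table built with the non-short-circuiting `&` operator, which follows three-valued logic and does not throw for `missing`. The variable names `inputs` and `table` are chosen only for this illustration.

```julia
inputs = [true, false, missing]

# Broadcasting & over a column vector and a 1×3 row builds the full 3×3 table.
# permutedims turns the vector into a single-row matrix.
table = inputs .& permutedims(inputs)

# false & missing is false (the result is known regardless of the unknown value),
# while true & missing stays missing.
@assert table[2, 3] === false
@assert table[1, 3] === missing
```

Note that, unlike `&&`, the `&` operator always evaluates both operands.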

</details>

### Exercise 4

Take a vector `v = [1.5, 2.5, missing, 4.5, 5.5, missing]` and replace all
missing values in it with the mean of the non-missing values.

<details>
<summary>Solution</summary>

```
julia> using Statistics

julia> coalesce.(v, mean(skipmissing(v)))
6-element Vector{Float64}:
 1.5
 2.5
 3.5
 4.5
 5.5
 3.5
```
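
The same result can be sketched with `replace` from Base instead of broadcasting `coalesce`. The names `fill_value` and `filled` below are introduced only for this example.

```julia
using Statistics

v = [1.5, 2.5, missing, 4.5, 5.5, missing]

# Mean of the observed values only: (1.5 + 2.5 + 4.5 + 5.5) / 4
fill_value = mean(skipmissing(v))

# replace returns a new vector; v itself is left untouched.
filled = replace(v, missing => fill_value)
```

Both approaches leave `v` unchanged, so the original data with its `missing` entries stays available for later steps.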

</details>

### Exercise 5

Take a vector `s = ["1.5", "2.5", missing, "4.5", "5.5", missing]` and parse
the strings stored in it as `Float64`, while keeping `missing` values unchanged.

<details>
<summary>Solution</summary>

```
julia> using Missings

julia> passmissing(parse).(Float64, s)
6-element Vector{Union{Missing, Float64}}:
 1.5
 2.5
  missing
 4.5
 5.5
  missing
```
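
If you prefer not to load Missings.jl, the same idea can be sketched with a comprehension that uses only Base functions. The variable name `parsed` is chosen for this example.

```julia
s = ["1.5", "2.5", missing, "4.5", "5.5", missing]

# Parse each string; a missing entry is passed through untouched.
parsed = [ismissing(x) ? missing : parse(Float64, x) for x in s]

# The two missing entries are still there after parsing.
@assert count(ismissing, parsed) == 2
```

This spells out by hand what `passmissing(parse)` does for each element: return `missing` when the input is `missing`, and call `parse` otherwise.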

</details>

### Exercise 6

Print to the terminal all days in January 2023 that are Mondays.

<details>
<summary>Solution</summary>

Example:

```
julia> using Dates

julia> for day in Date.(2023, 01, 1:31)
           dayofweek(day) == 1 && println(day)
       end
2023-01-02
2023-01-09
2023-01-16
2023-01-23
2023-01-30
```
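
A variant that avoids hard-coding the number of days in the month is sketched below. It builds a date range up to `lastdayofmonth` and filters it; the names `first_day`, `january`, and `mondays` are chosen only for this example.

```julia
using Dates

first_day = Date(2023, 1, 1)

# A Date range with a one-day step covers every day of the month.
january = first_day:Day(1):lastdayofmonth(first_day)

# Dates.Monday is the named constant for 1, the value dayofweek returns for Mondays.
mondays = filter(d -> dayofweek(d) == Dates.Monday, january)

foreach(println, mondays)
```

Because the end of the range is computed, the same code should work for any month by changing only `first_day`.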

</details>

### Exercise 7

Compute the dates that are one month later than January 15, 2020, February 15,
2020, March 15, 2020, and April 15, 2020. How many days pass during this one
month? Print the results to the screen.

<details>
<summary>Solution</summary>

Example:

```
julia> for day in Date.(2020, 1:4, 15)
           day_next = day + Month(1)
           println("$day + 1 month = $day_next (difference: $(day_next - day))")
       end
2020-01-15 + 1 month = 2020-02-15 (difference: 31 days)
2020-02-15 + 1 month = 2020-03-15 (difference: 29 days)
2020-03-15 + 1 month = 2020-04-15 (difference: 31 days)
2020-04-15 + 1 month = 2020-05-15 (difference: 30 days)
```
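
If the day counts are needed as plain integers rather than printed text, they can be collected with broadcasting. This is a small sketch for the 2020 dates named in the exercise; the names `starts`, `ends`, and `day_counts` are introduced only here.

```julia
using Dates

starts = Date.(2020, 1:4, 15)
ends = starts .+ Month(1)

# Subtracting two Dates gives a Day period; Dates.value unwraps it to an Int.
day_counts = Dates.value.(ends .- starts)

# 2020 is a leap year, so the month starting on February 15 spans 29 days.
@assert day_counts[2] == 29
```

The varying counts show that `Month(1)` is a calendar period, not a fixed number of days.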

</details>

### Exercise 8

Parse the following string as JSON:
```
str = """
[{"x":1,"y":1},
 {"x":2,"y":4},
 {"x":3,"y":9},
 {"x":4,"y":16},
 {"x":5,"y":25}]
"""
```
into a `json` variable.

<details>
<summary>Solution</summary>

```
julia> using JSON3

julia> json = JSON3.read(str)
5-element JSON3.Array{JSON3.Object, Base.CodeUnits{UInt8, String}, Vector{UInt64}}:
 {
   "x": 1,
   "y": 1
}
 {
   "x": 2,
   "y": 4
}
 {
   "x": 3,
   "y": 9
}
 {
   "x": 4,
   "y": 16
}
 {
   "x": 5,
   "y": 25
}
```

</details>

### Exercise 9

From the `json` variable created in exercise 8, extract two vectors `x` and `y`
that correspond to the fields stored in the JSON structure.
Plot `y` as a function of `x`.

<details>
<summary>Solution</summary>

```
using Plots
x = [el.x for el in json]
y = [el.y for el in json]
plot(x, y, xlabel="x", ylabel="y", legend=false)
```
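
The extraction step can also be written with broadcasting instead of comprehensions. The sketch below uses a vector of named tuples as a stand-in for the parsed JSON, so that it runs without JSON3.jl; `records` is a hypothetical substitute for `json`, and the pattern should carry over to any collection whose elements support `el.x` and `el.y` access.

```julia
# Stand-in for the parsed JSON: a vector of named tuples with the same fields.
# (Assumption for illustration; the real `json` object comes from JSON3.read.)
records = [(x = i, y = i^2) for i in 1:5]

# Broadcasting getproperty pulls one field out of every record.
x = getproperty.(records, :x)
y = getproperty.(records, :y)

# Each y value is the square of the matching x value, as in the exercise data.
@assert y == x .^ 2
```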

</details>

### Exercise 10

You are given a vector `m = [missing, 1, missing, 3, missing, missing, 6, missing]`.
Use linear interpolation to fill the missing values. For the extreme values,
use the nearest available observation (you will need to consult the Impute.jl
documentation to find all required functions).

<details>
<summary>Solution</summary>

```
julia> using Impute

julia> Impute.nocb!(Impute.locf!(Impute.interp(m)))
8-element Vector{Union{Missing, Int64}}:
 1
 1
 2
 3
 4
 5
 6
 6
```

Note that we use the `locf!` and `nocb!` functions (with `!`) to perform
the operations in place (a new vector was already allocated by `Impute.interp`).
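
To see what these three steps compute, here is a hand-written sketch of the same idea using only Base. It is not how Impute.jl is implemented; `fill_linear` is a hypothetical helper that assumes a standard 1-based `Vector` with at least one observed value and returns `Float64` results.

```julia
# Hypothetical helper: linear interpolation between observed neighbours,
# nearest observation at both ends. Assumes a 1-based Vector.
function fill_linear(m::Vector)
    known = findall(!ismissing, m)        # positions of observed values
    isempty(known) && throw(ArgumentError("need at least one observed value"))
    out = Vector{Float64}(undef, length(m))
    for i in eachindex(m)
        lo = findlast(<=(i), known)       # last observation at or before i
        hi = findfirst(>=(i), known)      # first observation at or after i
        if lo === nothing                 # before the first observation
            out[i] = m[known[hi]]
        elseif hi === nothing             # after the last observation
            out[i] = m[known[lo]]
        elseif known[lo] == known[hi]     # position i itself is observed
            out[i] = m[i]
        else                              # strictly between two observations
            a, b = known[lo], known[hi]
            out[i] = m[a] + (m[b] - m[a]) * (i - a) / (b - a)
        end
    end
    return out
end

m = [missing, 1, missing, 3, missing, missing, 6, missing]
filled = fill_linear(m)
```

For example, position 5 lies one third of the way between the observations 3 (position 4) and 6 (position 7), so it is filled with 4.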

</details>