JuliaForDataAnalysis/exercises/exercises07.md
2022-12-05 18:27:43 +01:00

Julia for Data Analysis

Bogumił Kamiński, Daniel Kaszyński

Chapter 7

Problems

Exercise 1

Random.org provides a service that returns random numbers. One way to use it is by sending HTTP GET requests. Here is an example request:

https://www.random.org/integers/?num=10&min=1&max=6&col=1&base=10&format=plain&rnd=new

If you want to understand all the parameters, please check their meaning in the Random.org documentation.

For our purposes it is enough to know that this request generates 10 random integers in the range from 1 to 6. Run this query in Julia and parse the result.

Solution

Example run:

julia> using HTTP

julia> response = HTTP.get("https://www.random.org/integers/?\
                            num=10&min=1&max=6&col=1&base=10&format=plain&rnd=new");

julia> parse.(Int, split(String(response.body)))
10-element Vector{Int64}:
 6
 2
 6
 3
 4
 2
 5
 2
 3
 6
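The query string of the request can also be assembled from individual parameters, and the parsing step can be tried without network access. The following is a minimal sketch: the names `params`, `url`, and `sample_body` are illustrative, and `sample_body` is a made-up stand-in for `String(response.body)`, not real output from Random.org.

```julia
# Parameters of the example request, kept as key => value pairs.
params = ["num" => 10, "min" => 1, "max" => 6, "col" => 1,
          "base" => 10, "format" => "plain", "rnd" => "new"]

# Join the pairs into a query string and append it to the endpoint.
url = "https://www.random.org/integers/?" *
      join(("$k=$v" for (k, v) in params), "&")

# Made-up stand-in for String(response.body): one integer per line.
sample_body = "6\n2\n6\n3\n"

# split without a delimiter splits on whitespace and drops empty pieces,
# so the trailing newline does not produce an empty string.
rolls = parse.(Int, split(sample_body))
```

Keeping the parameters in a collection makes it easier to change, for example, `num` or `max` without editing a long string literal.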

Exercise 2

Write a function that tries to parse a string as an integer. If it succeeds, it should return the integer; otherwise, it should print an error message and return 0.

Solution

Example function:

function str2int(s::AbstractString)
    try
        return parse(Int, s)
    catch e
        println(e)
    end
    return 0
end

Let us check it:

julia> str2int("10")
10

julia> str2int("  -1  ")
-1

julia> str2int("12345678901234567890")
OverflowError("overflow parsing \"12345678901234567890\"")
0

julia> str2int("1.3")
ArgumentError("invalid base 10 digit '.' in \"1.3\"")
0

julia> str2int("a")
ArgumentError("invalid base 10 digit 'a' in \"a\"")
0

An alternative solution would use tryparse (not covered in the book):

function str2int(s::AbstractString)
    v = tryparse(Int, s)
    if isnothing(v)
        println("error while parsing")
        return 0
    end
    return v
end

But this time we do not see the cause of the error.
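If no message is needed at all, the `tryparse` variant can be shortened with `something`, which returns its first argument that is not `nothing`. This is only a sketch of a variation on the exercise: the function name `str2int_quiet` and the `default` keyword are hypothetical and do not come from the book.

```julia
# Hypothetical variant: return `default` silently when parsing fails.
function str2int_quiet(s::AbstractString; default::Int=0)
    # tryparse returns nothing on failure; something picks the fallback then.
    return something(tryparse(Int, s), default)
end

str2int_quiet("10")               # parses normally
str2int_quiet("1.3")              # falls back to 0
str2int_quiet("a"; default=-1)    # falls back to the chosen default
```

Note that this variant deliberately drops the error message required by the exercise, so it fits only when silent fallback is acceptable.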

Exercise 3

Create a matrix containing the truth table for the && operation, including missing. If an operation throws an error, store "error" in the table. As an extra feature (this is harder, so you can skip it), store both the inputs and the output in each cell to make the table easier to read.

Solution
julia> function apply_and(x, y)
           try
               return "$x && $y = $(x && y)"
           catch e
               return "$x && $y = error"
           end
       end
apply_and (generic function with 1 method)

julia> apply_and.([true, false, missing], [true false missing])
3×3 Matrix{String}:
 "true && true = true"      "true && false = false"     "true && missing = missing"
 "false && true = false"    "false && false = false"    "false && missing = false"
 "missing && true = error"  "missing && false = error"  "missing && missing = error"
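For comparison, the non-short-circuiting `&` operator follows three-valued logic and accepts `missing` on either side, so no cell errors. The sketch below reuses the same broadcasting pattern; the name `apply_bitand` is made up for this illustration.

```julia
# `&` evaluates both operands and supports missing on either side,
# so no try/catch is needed here.
apply_bitand(x, y) = "$x & $y = $(x & y)"

# Column vector broadcast against a row matrix yields the 3×3 table.
table = apply_bitand.([true, false, missing], [true false missing])
```

The difference arises because `&&` requires its left operand to be a `Bool` before it can decide whether to evaluate the right one.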

Exercise 4

Take a vector v = [1.5, 2.5, missing, 4.5, 5.5, missing] and replace all missing values in it by the mean of the non-missing values.

Solution
julia> using Statistics

julia> v = [1.5, 2.5, missing, 4.5, 5.5, missing];

julia> coalesce.(v, mean(skipmissing(v)))
6-element Vector{Float64}:
 1.5
 2.5
 3.5
 4.5
 5.5
 3.5
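An equivalent approach, shown here only as a sketch, uses `replace` with a `missing => value` pair instead of broadcasting `coalesce`. The variable names `m` and `filled` are illustrative.

```julia
using Statistics

v = [1.5, 2.5, missing, 4.5, 5.5, missing]

# Mean of the observed values only.
m = mean(skipmissing(v))

# replace returns a new vector; v itself is left unchanged.
filled = replace(v, missing => m)
```

Both approaches allocate a new vector; use `replace!` only if the element type of the original vector should stay `Union{Missing, Float64}`.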

Exercise 5

Take a vector s = ["1.5", "2.5", missing, "4.5", "5.5", missing] and parse strings stored in it as Float64, while keeping missing values unchanged.

Solution
julia> using Missings

julia> s = ["1.5", "2.5", missing, "4.5", "5.5", missing];

julia> passmissing(parse).(Float64, s)
6-element Vector{Union{Missing, Float64}}:
 1.5
 2.5
  missing
 4.5
 5.5
  missing
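What `passmissing` does here can be spelled out by hand without the Missings.jl package. The following sketch uses a comprehension with an explicit `ismissing` check; the name `parsed` is illustrative.

```julia
s = ["1.5", "2.5", missing, "4.5", "5.5", missing]

# Pass missing through unchanged and parse everything else.
parsed = [ismissing(x) ? missing : parse(Float64, x) for x in s]
```

The `passmissing` wrapper is more convenient when the same function is applied in several places, since the check does not have to be repeated.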

Exercise 6

Print to the terminal all days in January 2023 that are Mondays.

Solution

Example:

julia> using Dates

julia> for day in Date.(2023, 01, 1:31)
           dayofweek(day) == 1 && println(day)
       end
2023-01-02
2023-01-09
2023-01-16
2023-01-23
2023-01-30
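The same result can be obtained by filtering a date range, which avoids hard-coding the number of days in the month. This is a sketch with illustrative names (`first_day`, `days`, `mondays`); `lastdayofmonth` and the constant `Dates.Monday` come from the Dates standard library.

```julia
using Dates

first_day = Date(2023, 1, 1)

# Range of all days in the month, whatever its length.
days = first_day:Day(1):lastdayofmonth(first_day)

# Keep only the Mondays (Dates.Monday equals 1).
mondays = filter(d -> dayofweek(d) == Dates.Monday, days)

foreach(println, mondays)
```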

Exercise 7

Compute the dates that are one month later than January 15, 2020, February 15, 2020, March 15, 2020, and April 15, 2020. How many days pass during each of these one-month periods? Print the results to the screen.

Solution

Example:

julia> for day in Date.(2020, 1:4, 15)
           day_next = day + Month(1)
           println("$day + 1 month = $day_next (difference: $(day_next - day))")
       end
2020-01-15 + 1 month = 2020-02-15 (difference: 31 days)
2020-02-15 + 1 month = 2020-03-15 (difference: 29 days)
2020-03-15 + 1 month = 2020-04-15 (difference: 31 days)
2020-04-15 + 1 month = 2020-05-15 (difference: 30 days)
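Adding `Month(1)` is calendar arithmetic, not a fixed number of days. A related edge case, sketched below with illustrative variable names, is that a day that does not exist in the target month is clamped to that month's last day.

```julia
using Dates

d = Date(2020, 1, 31)

# February 31 does not exist, so the result is clamped to the last
# day of February (the 29th, because 2020 is a leap year).
next = d + Month(1)

# Subtracting dates gives a Day period; Dates.value extracts the integer.
elapsed = Dates.value(next - d)

# In a non-leap year the clamp lands on February 28 instead.
next_2023 = Date(2023, 1, 31) + Month(1)
```

Because of this clamping, adding one month and then subtracting one month does not always return the original date.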

Exercise 8

Parse the following string as JSON:

str = """
[{"x":1,"y":1},
 {"x":2,"y":4},
 {"x":3,"y":9},
 {"x":4,"y":16},
 {"x":5,"y":25}]
"""

into a json variable.

Solution
julia> using JSON3

julia> json = JSON3.read(str)
5-element JSON3.Array{JSON3.Object, Base.CodeUnits{UInt8, String}, Vector{UInt64}}:
 {
   "x": 1,
   "y": 1
}
 {
   "x": 2,
   "y": 4
}
 {
   "x": 3,
   "y": 9
}
 {
   "x": 4,
   "y": 16
}
 {
   "x": 5,
   "y": 25
}

Exercise 9

From the json variable created in Exercise 8, extract two vectors x and y that correspond to the fields stored in the JSON structure. Plot y as a function of x.

Solution
using Plots
x = [el.x for el in json]
y = [el.y for el in json]
plot(x, y, xlabel="x", ylabel="y", legend=false)
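The extraction step can also be written by broadcasting `getproperty`. To keep the sketch runnable without any packages, the records below are plain NamedTuples standing in for the parsed JSON objects; this stand-in and the name `records` are assumptions, and the same calls should work similarly on the `json` variable since its elements support `el.x` access.

```julia
# Stand-in for the parsed JSON array: one NamedTuple per record.
records = [(x = i, y = i^2) for i in 1:5]

# Broadcasting getproperty pulls one field out of every record.
x = getproperty.(records, :x)
y = getproperty.(records, :y)

# Sanity check on the data before plotting: y is the square of x.
all_squares = y == x .^ 2
```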

Exercise 10

Given a vector m = [missing, 1, missing, 3, missing, missing, 6, missing], use linear interpolation to fill the missing values. For the extreme values, use the nearest available observation (you will need to consult the Impute.jl documentation to find all the required functions).

Solution
julia> using Impute

julia> m = [missing, 1, missing, 3, missing, missing, 6, missing];

julia> Impute.nocb!(Impute.locf!(Impute.interp(m)))
8-element Vector{Union{Missing, Int64}}:
 1
 1
 2
 3
 4
 5
 6
 6

Note that we use the locf! and nocb! functions (with !) to perform these operations in place (a new vector was already allocated by Impute.interp).
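To make the three steps concrete, here is a hand-written sketch of the same idea that does not use Impute.jl. The function name `fill_linear` is hypothetical, the code assumes a 1-based vector with at least one observed value, and unlike the solution above it always returns `Float64` values.

```julia
# Hypothetical helper: linear interpolation between observed neighbours,
# with leading and trailing gaps filled by the nearest observation.
function fill_linear(v::AbstractVector{<:Union{Missing, Real}})
    obs = findall(!ismissing, v)    # indices of the observed values
    isempty(obs) && throw(ArgumentError("no observed values"))
    out = Vector{Float64}(undef, length(v))

    # Extremes: copy the first/last observation outwards.
    out[1:first(obs)] .= v[first(obs)]
    out[last(obs):end] .= v[last(obs)]

    # Interior: walk over consecutive pairs of observed indices.
    for (lo, hi) in zip(obs[1:end-1], obs[2:end])
        for i in lo:hi
            out[i] = v[lo] + (v[hi] - v[lo]) * (i - lo) / (hi - lo)
        end
    end
    return out
end

m = [missing, 1, missing, 3, missing, missing, 6, missing]
filled = fill_linear(m)
```

For real work the Impute.jl functions are preferable, since they handle more input types and edge cases than this sketch.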