JuliaForDataAnalysis/exercises/exercises05.md
2022-10-14 13:43:12 +02:00

6.2 KiB
Raw Blame History

Julia for Data Analysis

Bogumił Kamiński, Daniel Kaszyński

Chapter 5

Problems

Exercise 1

Create a matrix containing truth table for && and || operations.

Solution

You can do it as follows:

julia> [true, false] .&& [true false]
2×2 BitMatrix:
 1  0
 0  0

julia> [true, false] .|| [true false]
2×2 BitMatrix:
 1  1
 1  0

Note that the first array is a vector, while the second array is a 1-row matrix.

Exercise 2

The issubset function checks if one collection is a subset of other collection.

Now take a range 4:6 and check if it is a subset of ranges 4+k:4-k for k varying from 1 to 3. Store the result in a vector.

Solution

You can do it like this using broadcasting:

julia> issubset.(Ref(4:6), [4-k:4+k for k in 1:3])
3-element BitVector:
 0
 1
 1

Note that you need to use Ref to protect 4:6 from being broadcasted over.

Exercise 3

Write a function that accepts two vectors and returns true if they have equal length and otherwise returns false.

Solution

This function can be written as follows:

function equallength(x::AbstractVector, y::AbstractVector) = length(x) == length(y)

Exercise 4

Consider the vectors x = [1, 2, 1, 2, 1, 2], y = ["a", "a", "b", "b", "b", "a"], and z = [1, 2, 1, 2, 1, 3]. Calculate their Adjusted Mutual Information using scikit-learn.

Solution

You can do this exercise as follows:

julia> using PyCall

julia> metrics = pyimport("sklearn.metrics");

julia> metrics.adjusted_mutual_info_score(x, y)
-0.11111111111111087

julia> metrics.adjusted_mutual_info_score(x, z)
0.7276079390930807

julia> metrics.adjusted_mutual_info_score(y, z)
-0.21267989848846763

Exercise 5

Using Adjusted Mutual Information function from exercise 4 generate a pair of random vectors of length 100 containing integer numbers from the range 1:5. Repeat this exercise 1000 times and plot a histogram of AMI. Check in the documentation of the rand function how you can draw a sample from a collection of values.

Solution

You can create such a plot using the following commands:

using Plots
histogram([metrics.adjusted_mutual_info_score(rand(1:5, 100), rand(1:5, 100))
           for i in 1:1000], label="AMI")

You can check that AMI oscillates around 0.

Exercise 6

Adjust the code from exercise 5 but replace first 50 elements of each vector with zero. Repeat the experiment.

Solution

This time it is convenient to write a helper function. Note that we use broadcasting to update values in the vectors.

function exampleAMI()
    x = rand(1:5, 100)
    y = rand(1:5, 100)
    x[1:50] .= 0
    y[1:50] .= 0
    return metrics.adjusted_mutual_info_score(x, y)
end
histogram([exampleAMI() for i in 1:1000], label="AMI")

Note that this time AMI is a bit below 0.5, which shows a better match between vectors.

Exercise 7

Write a function that takes a vector of integer values and returns a dictionary giving information how many times each integer was present in the passed vector.

Test this function on vectors v1 = [1, 2, 3, 2, 3, 3], v2 = [true, false], and v3 = 3:5.

Solution
julia> function counter(v::AbstractVector{<:Integer})
           d = Dict{eltype(v), Int}()
           for x in v
               if haskey(d, x)
                   d[x] += 1
               else
                   d[x] = 1
               end
           end
           return d
       end
counter (generic function with 1 method)

julia> counter(v1)
Dict{Int64, Int64} with 3 entries:
  2 => 2
  3 => 3
  1 => 1

julia> counter(v2)
Dict{Bool, Int64} with 2 entries:
  0 => 1
  1 => 1

julia> counter(v3)
Dict{Int64, Int64} with 3 entries:
  5 => 1
  4 => 1
  3 => 1

Note that we used the eltype function to set a proper key type for dictionary d.

Exercise 8

Write code that creates a Bool diagonal matrix of size 5x5.

Solution

This is a way to do it:

julia> 1:5 .== (1:5)'
5×5 BitMatrix:
 1  0  0  0  0
 0  1  0  0  0
 0  0  1  0  0
 0  0  0  1  0
 0  0  0  0  1

Using the LinearAlgebra module you could also write:

julia> using LinearAlgebra

julia> I(5)
5×5 Diagonal{Bool, Vector{Bool}}:
 1  ⋅  ⋅  ⋅  ⋅
 ⋅  1  ⋅  ⋅  ⋅
 ⋅  ⋅  1  ⋅  ⋅
 ⋅  ⋅  ⋅  1  ⋅
 ⋅  ⋅  ⋅  ⋅  1

Exercise 9

Write a code comparing performance of calculation of sum of logarithms of elements of a vector 1:100 using broadcasting and the sum function vs only the sum function taking a function as a first argument.

Solution

Here is how you can do it:

julia> using BenchmarkTools

julia> @btime sum(log.(1:100))
  1.620 μs (1 allocation: 896 bytes)
363.7393755555635

julia> @btime sum(log, 1:100)
  1.570 μs (0 allocations: 0 bytes)
363.7393755555636

As you can see using the sum function with log as its first argument is a bit faster as it is not allocating.

Exercise 10

Create a dictionary in which for each number from 1 to 10 you will store a vector of its positive divisors. You can check the reminder of division of two values using the rem function.

Additionally (not covered in the book), you can drop elements from a comprehension if you add an if clause after the for clause, for example to keep only odd numbers from range 1:10 do:

julia> [i for i in 1:10 if isodd(i)]
5-element Vector{Int64}:
 1
 3
 5
 7
 9

You can populate a dictionary by passing a vector of pairs to it (not covered in the book), for example:

julia> Dict(["a" => 1, "b" => 2])
Dict{String, Int64} with 2 entries:
  "b" => 2
  "a" => 1
Solution

Here is how you can do it:

julia> Dict([i => [j for j in 1:i if rem(i, j) == 0] for i in 1:10])
Dict{Int64, Vector{Int64}} with 10 entries:
  5  => [1, 5]
  4  => [1, 2, 4]
  6  => [1, 2, 3, 6]
  7  => [1, 7]
  2  => [1, 2]
  10 => [1, 2, 5, 10]
  9  => [1, 3, 9]
  8  => [1, 2, 4, 8]
  3  => [1, 3]
  1  => [1]