Julia for Data Analysis
Bogumił Kamiński, Daniel Kaszyński
Chapter 5
Problems
Exercise 1
Create a matrix containing the truth table for each of the && and || operations.
Solution
You can do it as follows:
julia> [true, false] .&& [true false]
2×2 BitMatrix:
1 0
0 0
julia> [true, false] .|| [true false]
2×2 BitMatrix:
1 1
1 0
Note that the first array is a vector, while the second array is a 1-row matrix.
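To see why this pairing of shapes yields a 2×2 table, the following minimal sketch checks the sizes involved. The names col, row, and and_table are introduced here for illustration only, and the sketch assumes Julia 1.7 or later, where the dotted .&& form is available.

```julia
col = [true, false]      # a 2-element vector, treated as a column
row = [true false]       # a 1×2 matrix, that is, a single row

size(col)                # (2,)
size(row)                # (1, 2)

# Broadcasting expands both operands to a common 2×2 shape,
# so every combination of the two inputs is evaluated once.
and_table = col .&& row
size(and_table)          # (2, 2)
```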
Exercise 2
The issubset function checks whether one collection is a subset of another collection.
Now take the range 4:6 and check whether it is a subset of the ranges 4-k:4+k for k varying from 1 to 3. Store the result in a vector.
Solution
You can do it like this using broadcasting:
julia> issubset.(Ref(4:6), [4-k:4+k for k in 1:3])
3-element BitVector:
0
1
1
Note that you need to use Ref to protect 4:6 from being broadcasted over.
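As a cross-check, the same result can be obtained with a comprehension that calls issubset once per range. This is only a sketch; the names ranges, with_ref, and with_comprehension are illustrative and do not come from the book.

```julia
ranges = [4-k:4+k for k in 1:3]          # 3:5, 2:6, 1:7

# Ref(4:6) makes broadcasting treat the whole range as a single value.
with_ref = issubset.(Ref(4:6), ranges)

# The same check written as a comprehension, without broadcasting.
with_comprehension = [issubset(4:6, r) for r in ranges]

with_ref == with_comprehension           # true
```

Without Ref, broadcasting would instead pair each element of 4:6 with one of the three ranges, which is not the question being asked.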
Exercise 3
Write a function that accepts two vectors and returns true if they have equal length and otherwise returns false.
Solution
This function can be written as follows:
equallength(x::AbstractVector, y::AbstractVector) = length(x) == length(y)
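A short usage sketch follows. The definition is repeated in its short (assignment) form so that the snippet runs on its own; the test inputs are illustrative choices.

```julia
# Short-form definition: no `function` keyword is used with `=`.
equallength(x::AbstractVector, y::AbstractVector) = length(x) == length(y)

equallength([1, 2, 3], ["a", "b", "c"])   # true: element types may differ
equallength(1:3, [1.0, 2.0])              # false: lengths are 3 and 2
```

Because both arguments are restricted to AbstractVector, ranges are accepted, while passing, for example, a tuple raises a MethodError.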
Exercise 4
Consider the vectors x = [1, 2, 1, 2, 1, 2], y = ["a", "a", "b", "b", "b", "a"], and z = [1, 2, 1, 2, 1, 3]. Calculate their Adjusted Mutual Information using scikit-learn.
Solution
You can do this exercise as follows:
julia> using PyCall
julia> metrics = pyimport("sklearn.metrics");
julia> metrics.adjusted_mutual_info_score(x, y)
-0.11111111111111087
julia> metrics.adjusted_mutual_info_score(x, z)
0.7276079390930807
julia> metrics.adjusted_mutual_info_score(y, z)
-0.21267989848846763
Exercise 5
Using the Adjusted Mutual Information function from exercise 4, compute AMI for a pair of random vectors of length 100 containing integer numbers from the range 1:5. Repeat this exercise 1000 times and plot a histogram of AMI. Check in the documentation of the rand function how you can draw a sample from a collection of values.
Solution
You can create such a plot using the following commands:
using Plots
histogram([metrics.adjusted_mutual_info_score(rand(1:5, 100), rand(1:5, 100))
for i in 1:1000], label="AMI")
You can check that AMI oscillates around 0.
Exercise 6
Adjust the code from exercise 5, but replace the first 50 elements of each vector with zero. Repeat the experiment.
Solution
This time it is convenient to write a helper function. Note that we use broadcasting to update values in the vectors.
function exampleAMI()
    x = rand(1:5, 100)
    y = rand(1:5, 100)
    x[1:50] .= 0
    y[1:50] .= 0
    return metrics.adjusted_mutual_info_score(x, y)
end
histogram([exampleAMI() for i in 1:1000], label="AMI")
Note that this time AMI is a bit below 0.5, which reflects a better match between the vectors, since their first 50 elements are now identical.
Exercise 7
Write a function that takes a vector of integer values and returns a dictionary recording how many times each integer occurs in the passed vector.
Test this function on the vectors v1 = [1, 2, 3, 2, 3, 3], v2 = [true, false], and v3 = 3:5.
Solution
julia> function counter(v::AbstractVector{<:Integer})
           d = Dict{eltype(v), Int}()
           for x in v
               if haskey(d, x)
                   d[x] += 1
               else
                   d[x] = 1
               end
           end
           return d
       end
counter (generic function with 1 method)
julia> counter(v1)
Dict{Int64, Int64} with 3 entries:
2 => 2
3 => 3
1 => 1
julia> counter(v2)
Dict{Bool, Int64} with 2 entries:
0 => 1
1 => 1
julia> counter(v3)
Dict{Int64, Int64} with 3 entries:
5 => 1
4 => 1
3 => 1
Note that we used the eltype function to set a proper key type for dictionary d.
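The haskey branch can also be folded into a single line with the get function, which returns a default value when the key is missing. The sketch below is one possible variant, not the book's solution; the name counter_get is hypothetical.

```julia
function counter_get(v::AbstractVector{<:Integer})
    d = Dict{eltype(v), Int}()
    for x in v
        # get returns the stored count, or 0 when x is not yet a key
        d[x] = get(d, x, 0) + 1
    end
    return d
end

counter_get([1, 2, 3, 2, 3, 3]) == Dict(1 => 1, 2 => 2, 3 => 3)   # true
keytype(counter_get([true, false]))                               # Bool
```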
Exercise 8
Write code that creates a Bool diagonal matrix of size 5x5.
Solution
This is a way to do it:
julia> 1:5 .== (1:5)'
5×5 BitMatrix:
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
Using the LinearAlgebra module you could also write:
julia> using LinearAlgebra
julia> I(5)
5×5 Diagonal{Bool, Vector{Bool}}:
1 ⋅ ⋅ ⋅ ⋅
⋅ 1 ⋅ ⋅ ⋅
⋅ ⋅ 1 ⋅ ⋅
⋅ ⋅ ⋅ 1 ⋅
⋅ ⋅ ⋅ ⋅ 1
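The two approaches return different container types: a BitMatrix in the first case and a structured Diagonal matrix in the second. The sketch below, with illustrative variable names, shows that they hold the same entries and how the Diagonal result could be converted to a dense matrix if one is needed.

```julia
using LinearAlgebra

dense = Matrix(I(5))            # dense Matrix{Bool} built from the Diagonal
broadcasted = 1:5 .== (1:5)'    # BitMatrix from comparing row and column indices

dense == broadcasted            # true: same entries, different storage types
```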
Exercise 9
Write code comparing the performance of calculating the sum of logarithms of the elements of the vector 1:100 using broadcasting and the sum function versus using only the sum function with a function passed as its first argument.
Solution
Here is how you can do it:
julia> using BenchmarkTools
julia> @btime sum(log.(1:100))
1.620 μs (1 allocation: 896 bytes)
363.7393755555635
julia> @btime sum(log, 1:100)
1.570 μs (0 allocations: 0 bytes)
363.7393755555636
As you can see, using the sum function with log as its first argument is a bit faster, as it does not allocate.
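The two printed results differ in the last digit, which is ordinary floating-point behavior rather than an error. A minimal sketch of how the two values could be compared is shown below; the variable names are illustrative.

```julia
with_broadcast = sum(log.(1:100))   # materializes a 100-element vector first
with_function = sum(log, 1:100)     # applies log while summing, no temporary vector

# Compare floating-point results approximately rather than with ==.
with_broadcast ≈ with_function      # true
```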
Exercise 10
Create a dictionary in which for each number from 1 to 10 you will store a vector of its positive divisors. You can check the remainder of division of two values using the rem function.
Additionally (not covered in the book), you can drop elements from a comprehension if you add an if clause after the for clause. For example, to keep only odd numbers from the range 1:10 do:
julia> [i for i in 1:10 if isodd(i)]
5-element Vector{Int64}:
1
3
5
7
9
You can populate a dictionary by passing a vector of pairs to it (not covered in the book), for example:
julia> Dict(["a" => 1, "b" => 2])
Dict{String, Int64} with 2 entries:
"b" => 2
"a" => 1
Solution
Here is how you can do it:
julia> Dict([i => [j for j in 1:i if rem(i, j) == 0] for i in 1:10])
Dict{Int64, Vector{Int64}} with 10 entries:
5 => [1, 5]
4 => [1, 2, 4]
6 => [1, 2, 3, 6]
7 => [1, 7]
2 => [1, 2]
10 => [1, 2, 5, 10]
9 => [1, 3, 9]
8 => [1, 2, 4, 8]
3 => [1, 3]
1 => [1]
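As a variant, Dict also accepts a generator of pairs, so the outer square brackets can be dropped and no intermediate vector of pairs is created. This is a sketch of an alternative, not the book's solution; the name divisors is illustrative.

```julia
# Outer expression is a generator of pairs; inner one is a filtered comprehension.
divisors = Dict(i => [j for j in 1:i if rem(i, j) == 0] for i in 1:10)

divisors[10]        # [1, 2, 5, 10]
length(divisors)    # 10
```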