8.0 KiB
Julia for Data Analysis
Bogumił Kamiński, Daniel Kaszyński
Chapter 3
Problems
Exercise 1
Check what methods does the repeat function have. Are
they all covered in help for this function?
Solution
Write:
julia> methods(repeat)
# 6 methods for generic function "repeat":
[1] repeat(A::AbstractArray; inner, outer) in Base at abstractarraymath.jl:392
[2] repeat(A::AbstractArray, counts...) in Base at abstractarraymath.jl:355
[3] repeat(c::Char, r::Integer) in Base at strings/string.jl:336
[4] repeat(c::AbstractChar, r::Integer) in Base at strings/string.jl:335
[5] repeat(s::Union{SubString{String}, String}, r::Integer) in Base at strings/substring.jl:248
[6] repeat(s::AbstractString, r::Integer) in Base at strings/basic.jl:715
Now write ?repeat and you will see that there are four
entries in help. The reason is that for Char and
AbstractChar as well as for AbstractString and
Union{SubString{String}, String} there is one help
entry.
Why do these cases have two methods defined? The reason is
performance. For example
repeat(c::AbstractChar, r::Integer) is a generic function
that accept any character values and
repeat(c::Char, r::Integer) is its faster version that
accepts values that have Char type only (and it is invoked
by Julia if value of type Char is passed as an argument to
repeat).
Exercise 2
Write a function fun2 that takes any vector and returns
the difference between the largest and the smallest element in this
vector.
Solution
You can define is as follows:
fun2(x::AbstractVector) = maximum(x) - minimum(x)
or as follows:
function fun2(x::AbstractVector)
lo, hi = extrema(x)
return hi - lo
end
Note that these two functions will work with vectors of any elements that are ordered and support subtraction (they do not have to be numbers).
Exercise 3
Generate a vector of one million random numbers from
[0, 1] interval. Check what is a faster way to get a
maximum and minimum element in it. One option is by using the
maximum and minimum functions and the other is
by using the extrema function.
Solution
Here is a way to compare the performance of both options:
julia> using BenchmarkTools
julia> x = rand(10^6);
julia> @btime minimum($x), maximum($x)
860.700 μs (0 allocations: 0 bytes)
(1.489173560242918e-6, 0.9999984347293639)
julia> @btime extrema($x)
2.185 ms (0 allocations: 0 bytes)
(1.489173560242918e-6, 0.9999984347293639)
As you can see in this situation, although extrema does
the operation in a single pass over x it is slower than
computing minimum and maximum in two
passes.
Exercise 4
Assume you have accidentally typed +x = 1 when wanting
to assign 1 to variable x. What effects can
this operation have?
Solution
If it is a fresh Julia session you define a new function in
Main for + operator:
julia> +x=1
+ (generic function with 1 method)
julia> methods(+)
# 1 method for generic function "+":
[1] +(x) in Main at REPL[1]:1
julia> +(10)
1
This will also break any further uses of + in your
programs:
julia> 1 + 2
ERROR: MethodError: no method matching +(::Int64, ::Int64)
You may have intended to import Base.:+
Closest candidates are:
+(::Any) at REPL[1]:1
If you earlier used addition in this Julia session then the operation will error. Start a fresh Julia session:
julia> 1 + 2
3
julia> +x=1
ERROR: error in method definition: function Base.+ must be explicitly imported to be extended
Exercise 5
What is the result of calling the subtypes on
Union{Bool, Missing} and why?
Solution
You get an empty vector:
julia> subtypes(Union{Float64, Missing})
Type[]
The reason is that the subtypes function returns
subtypes of explicitly declared types that have names (type of such
types is DataType in Julia).
Extra for this reason subtypes has a limited
use. To check if one type is a subtype of some other type use the
<: operator.
Exercise 6
Define two identical anonymous functions x -> x + 1
in global scope? Do they have the same type?
Solution
No, each of them has a different type:
julia> f1 = x -> x + 1
#1 (generic function with 1 method)
julia> f2 = x -> x + 1
#3 (generic function with 1 method)
julia> typeof(f1)
var"#1#2"
julia> typeof(f2)
var"#3#4"
This is the reason why function call like
sum(x -> x^2, 1:10) in global scope triggers compilation
each time:
julia> @time sum(x -> x^2, 1:10)
0.070714 seconds (167.41 k allocations: 8.815 MiB, 14.29% gc time, 93.91% compilation time)
385
julia> @time sum(x -> x^2, 1:10)
0.020971 seconds (47.82 k allocations: 2.529 MiB, 99.75% compilation time)
385
julia> @time sum(x -> x^2, 1:10)
0.021184 seconds (47.81 k allocations: 2.529 MiB, 99.77% compilation time)
385
Exercise 7
Define the wrap function taking one argument
i and returning the anonymous function
x -> x + i. Is the type of such anonymous function the
same across calls to wrap function?
Solution
Yes, the type is the same:
julia> wrap(i) = x -> x + i
wrap (generic function with 1 method)
julia> typeof(wrap(1))
var"#11#12"
julia> typeof(wrap(2))
var"#11#12"
Julia defines a new type for such an anonymous function only once The
consequence of this is that e.g. expressions inside a function like
sum(x -> x ^ i, 1:10) where i is an
argument to a function do not trigger compilation (as opposed to similar
expressions in global scope, see exercise 6).
julia> sumi(i) = sum(x -> x^i, 1:10)
sumi (generic function with 1 method)
julia> @time sumi(1)
0.000004 seconds
55
julia> @time sumi(2)
0.000001 seconds
385
julia> @time sumi(3)
0.000003 seconds
3025
Exercise 8
You want to write a function that accepts any Integer
except Bool and returns the passed value. If
Bool is passed an error should be thrown.
Solution
We check subtypes of Integer:
julia> subtypes(Integer)
3-element Vector{Any}:
Bool
Signed
Unsigned
The first way to write such a function is then:
fun1(i::Union{Signed, Unsigned}) = i
and now we have:
julia> fun1(1)
1
julia> fun1(true)
ERROR: MethodError: no method matching fun1(::Bool)
The second way is:
fun2(i::Integer) = i
fun2(::Bool) = throw(ArgumentError("Bool is not supported"))
and now you have:
julia> fun2(1)
1
julia> fun2(true)
ERROR: ArgumentError: Bool is not supported
Exercise 9
The @time macro measures time taken by an expression run
and prints it, but returns the value of the expression. The
@elapsed macro works differently - it does not print
anything, but returns time taken to evaluate an expression. Test the
@elapsed macro by to see how long it takes to shuffle a
vector of one million floats. Use the shuffle function from
Random module.
Solution
Here is the code that performs the task:
julia> using Random # needed to get access to shuffle
julia> x = rand(10^6); # generate random floats
julia> @elapsed shuffle(x)
0.0518085
julia> @elapsed shuffle(x)
0.01257
julia> @elapsed shuffle(x)
0.012483
Note that the first time we run shuffle it takes longer
due to compilation.
Exercise 10
Using the @btime macro benchmark the time of calculating
the sum of one million random floats.
Solution
The code you can use is:
julia> using BenchmarkTools
julia> @btime sum($(rand(10^6)))
155.300 μs (0 allocations: 0 bytes)
500330.6375697419
Note that the following:
julia> @btime sum(rand(10^6))
1.644 ms (2 allocations: 7.63 MiB)
500266.9457722128
would be an incorrect timing as you would also measure the time of generating of the vector.
Alternatively you can e.g. write:
julia> x = rand(10^6);
julia> @btime sum($x)
154.700 μs (0 allocations: 0 bytes)
500151.95875364926