Julia for Data Analysis
Bogumił Kamiński, Daniel Kaszyński
Chapter 3
Problems
Exercise 1
Check what methods the repeat function has. Are they all covered in the help for this function?
Solution
Write:
julia> methods(repeat)
# 6 methods for generic function "repeat":
[1] repeat(A::AbstractArray; inner, outer) in Base at abstractarraymath.jl:392
[2] repeat(A::AbstractArray, counts...) in Base at abstractarraymath.jl:355
[3] repeat(c::Char, r::Integer) in Base at strings/string.jl:336
[4] repeat(c::AbstractChar, r::Integer) in Base at strings/string.jl:335
[5] repeat(s::Union{SubString{String}, String}, r::Integer) in Base at strings/substring.jl:248
[6] repeat(s::AbstractString, r::Integer) in Base at strings/basic.jl:715
Now write ?repeat and you will see that there are four entries in the help. The reason is that Char and AbstractChar share one help entry, as do AbstractString and Union{SubString{String}, String}.
Why do these cases have two methods defined? The reason is performance. For example, repeat(c::AbstractChar, r::Integer) is a generic method that accepts any character value, and repeat(c::Char, r::Integer) is its faster version that accepts only values of type Char (Julia invokes it when a value of type Char is passed as an argument to repeat).
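The dispatch rule described above can be seen on a small scale with a toy function. This is only a minimal sketch: the name kind and its two methods are hypothetical and are not part of Base.

```julia
# Hypothetical toy function (not from Base) that mimics the repeat pattern:
# a generic fallback plus a specialized method for the concrete Char type.
kind(c::AbstractChar) = "generic AbstractChar method"
kind(c::Char) = "specialized Char method"

# Julia picks the most specific applicable method.
result = kind('a')             # "specialized Char method"
n_methods = length(methods(kind))   # 2

# which reports the method that would be called for the given argument types.
m = which(repeat, (Char, Int))
```

Inspecting m in the REPL should show the Char method rather than the AbstractChar one.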
Exercise 2
Write a function fun2 that takes any vector and returns the difference between the largest and the smallest element in this vector.
Solution
You can define it as follows:
fun2(x::AbstractVector) = maximum(x) - minimum(x)
or as follows:
function fun2(x::AbstractVector)
lo, hi = extrema(x)
return hi - lo
end
Note that these two functions work with vectors whose elements are ordered and support subtraction (they do not have to be numbers).
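As an illustration of the last remark, here is a short sketch that applies fun2 to elements that are not numbers. The sample vectors are made up for this example.

```julia
using Dates

fun2(x::AbstractVector) = maximum(x) - minimum(x)

r_int = fun2([3, 1, 7])                                # 6
r_char = fun2(['a', 'd', 'b'])                         # 3, since Char - Char is an Int
r_date = fun2([Date(2024, 1, 1), Date(2024, 1, 11)])   # 10 days
```

Characters and dates are both ordered and support subtraction, so the same one-line definition covers them.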
Exercise 3
Generate a vector of one million random numbers from the [0, 1] interval. Check which is the faster way to get the maximum and minimum element in it. One option is to use the maximum and minimum functions and the other is to use the extrema function.
Solution
Here is a way to compare the performance of both options:
julia> using BenchmarkTools
julia> x = rand(10^6);
julia> @btime minimum($x), maximum($x)
860.700 μs (0 allocations: 0 bytes)
(1.489173560242918e-6, 0.9999984347293639)
julia> @btime extrema($x)
2.185 ms (0 allocations: 0 bytes)
(1.489173560242918e-6, 0.9999984347293639)
As you can see, in this situation, although extrema does the operation in a single pass over x, it is slower than computing minimum and maximum in two passes.
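To make the idea of a single pass concrete, here is a hand-written sketch that tracks both values in one loop. The function minmax_single_pass is a hypothetical helper written for this illustration; it is not how Base implements extrema, and its timing may differ from both options above.

```julia
# Hypothetical helper: find the smallest and largest element in one loop.
function minmax_single_pass(x::AbstractVector)
    isempty(x) && throw(ArgumentError("vector must be non-empty"))
    lo = hi = first(x)
    for v in x
        lo = min(lo, v)   # update the running minimum
        hi = max(hi, v)   # update the running maximum
    end
    return lo, hi
end

x = rand(10^6)
single = minmax_single_pass(x)
two_pass = (minimum(x), maximum(x))
```

Both approaches should return the same pair; only the number of traversals of x differs.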
Exercise 4
Assume you have accidentally typed +x = 1 when wanting to assign 1 to variable x. What effects can this operation have?
Solution
If it is a fresh Julia session, you define a new function in Main for the + operator:
julia> +x=1
+ (generic function with 1 method)
julia> methods(+)
# 1 method for generic function "+":
[1] +(x) in Main at REPL[1]:1
julia> +(10)
1
This will also break any further uses of + in your programs:
julia> 1 + 2
ERROR: MethodError: no method matching +(::Int64, ::Int64)
You may have intended to import Base.:+
Closest candidates are:
+(::Any) at REPL[1]:1
If you have already used addition in this Julia session, the definition will throw an error instead. To see this, start a fresh Julia session:
julia> 1 + 2
3
julia> +x=1
ERROR: error in method definition: function Base.+ must be explicitly imported to be extended
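The error message hints at what to do when you really want to add a method to +. The sketch below does so by qualifying the operator with Base; the Meters type is hypothetical and exists only for this illustration.

```julia
# Hypothetical type used only for illustration.
struct Meters
    value::Float64
end

# Qualifying the operator with Base adds a method to the existing + function
# instead of creating a new function named + in Main.
Base.:+(a::Meters, b::Meters) = Meters(a.value + b.value)

total = Meters(1.5) + Meters(2.0)   # Meters(3.5)
plain = 1 + 2                       # 3, built-in addition still works
```

Because the new method is attached to Base.+, the existing methods for numbers remain available.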
Exercise 5
What is the result of calling the subtypes function on Union{Bool, Missing} and why?
Solution
You get an empty vector:
julia> subtypes(Union{Bool, Missing})
Type[]
The reason is that the subtypes function returns subtypes of explicitly declared types that have names (the type of such types is DataType in Julia).
Extra: for this reason subtypes has limited use. To check whether one type is a subtype of another type, use the <: operator.
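A few checks with the <: operator show how it handles the union type from this exercise. The variable names are chosen only for this sketch.

```julia
bool_in_union = Bool <: Union{Bool, Missing}       # true
missing_in_union = Missing <: Union{Bool, Missing} # true
int_in_union = Int <: Union{Bool, Missing}         # false
union_in_any = Union{Bool, Missing} <: Any         # true
bool_is_integer = Bool <: Integer                  # true
```

Unlike subtypes, the <: operator works for any pair of types, including unions.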
Exercise 6
Define two identical anonymous functions x -> x + 1 in global scope. Do they have the same type?
Solution
No, each of them has a different type:
julia> f1 = x -> x + 1
#1 (generic function with 1 method)
julia> f2 = x -> x + 1
#3 (generic function with 1 method)
julia> typeof(f1)
var"#1#2"
julia> typeof(f2)
var"#3#4"
This is the reason why a function call like sum(x -> x^2, 1:10) in global scope triggers compilation each time:
julia> @time sum(x -> x^2, 1:10)
0.070714 seconds (167.41 k allocations: 8.815 MiB, 14.29% gc time, 93.91% compilation time)
385
julia> @time sum(x -> x^2, 1:10)
0.020971 seconds (47.82 k allocations: 2.529 MiB, 99.75% compilation time)
385
julia> @time sum(x -> x^2, 1:10)
0.021184 seconds (47.81 k allocations: 2.529 MiB, 99.77% compilation time)
385
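One common way to avoid this repeated compilation is to give the function a name, so that its type is created only once. The following is a minimal sketch; the name square is a hypothetical helper introduced here.

```julia
# Two textually identical anonymous functions still get distinct types.
f1 = x -> x + 1
f2 = x -> x + 1
same_type = typeof(f1) == typeof(f2)   # false

# Hypothetical named helper: its type is defined once and then reused.
square(x) = x^2
s1 = sum(square, 1:10)   # 385
s2 = sum(square, 1:10)   # 385, the same function type as in the first call
```

With the named function, the second call to sum should reuse the code compiled for the first one.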
Exercise 7
Define the wrap function taking one argument i and returning the anonymous function x -> x + i. Is the type of such an anonymous function the same across calls to the wrap function?
Solution
Yes, the type is the same:
julia> wrap(i) = x -> x + i
wrap (generic function with 1 method)
julia> typeof(wrap(1))
var"#11#12"
julia> typeof(wrap(2))
var"#11#12"
Julia defines a new type for such an anonymous function only once. The consequence of this is that expressions inside a function like sum(x -> x ^ i, 1:10), where i is an argument to the function, do not trigger compilation each time they are run (as opposed to similar expressions in global scope, see exercise 6):
julia> sumi(i) = sum(x -> x^i, 1:10)
sumi (generic function with 1 method)
julia> @time sumi(1)
0.000004 seconds
55
julia> @time sumi(2)
0.000001 seconds
385
julia> @time sumi(3)
0.000003 seconds
3025
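The following sketch looks a little closer at the functions returned by wrap. That the captured i is stored as a field of the closure is an implementation detail of current Julia versions, shown here only for illustration.

```julia
wrap(i) = x -> x + i

f = wrap(1)
g = wrap(2)

same_type = typeof(f) == typeof(g)   # true, both calls pass an Int
r1 = f(10)                           # 11
r2 = g(10)                           # 12

# The captured variable lives in the closure object, not in its type.
fields = fieldnames(typeof(f))       # (:i,)
captured = f.i                       # 1
```

The two closures differ only in the stored value of i, which is why they can share one type.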
Exercise 8
You want to write a function that accepts any Integer except Bool and returns the passed value. If Bool is passed, an error should be thrown.
Solution
We check the subtypes of Integer:
julia> subtypes(Integer)
3-element Vector{Any}:
Bool
Signed
Unsigned
The first way to write such a function is then:
fun1(i::Union{Signed, Unsigned}) = i
and now we have:
julia> fun1(1)
1
julia> fun1(true)
ERROR: MethodError: no method matching fun1(::Bool)
The second way is:
fun2(i::Integer) = i
fun2(::Bool) = throw(ArgumentError("Bool is not supported"))
and now you have:
julia> fun2(1)
1
julia> fun2(true)
ERROR: ArgumentError: Bool is not supported
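The difference between the two approaches can also be checked programmatically. This is a small sketch that uses hasmethod and a try block on the definitions from this solution.

```julia
fun1(i::Union{Signed, Unsigned}) = i
fun2(i::Integer) = i
fun2(::Bool) = throw(ArgumentError("Bool is not supported"))

# fun1 has no method for Bool at all, while fun2 has one that throws.
fun1_has_bool = hasmethod(fun1, Tuple{Bool})   # false
fun2_has_bool = hasmethod(fun2, Tuple{Bool})   # true

# Capture the exception instead of letting it stop the program.
err = try
    fun2(true)
catch e
    e
end
is_argument_error = err isa ArgumentError      # true
```

One design consideration: fun1 also rejects any user-defined Integer subtype that is neither Signed nor Unsigned, while fun2 accepts it.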
Exercise 9
The @time macro measures the time taken to run an expression and prints it, but returns the value of the expression. The @elapsed macro works differently: it does not print anything, but returns the time taken to evaluate an expression. Test the @elapsed macro to see how long it takes to shuffle a vector of one million floats. Use the shuffle function from the Random module.
Solution
Here is the code that performs the task:
julia> using Random # needed to get access to shuffle
julia> x = rand(10^6); # generate random floats
julia> @elapsed shuffle(x)
0.0518085
julia> @elapsed shuffle(x)
0.01257
julia> @elapsed shuffle(x)
0.012483
Note that the first time we run shuffle it takes longer due to compilation.
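Because @elapsed returns a number, the timings can be stored and processed like any other values. The sketch below collects three measurements; the variable names are chosen for this example, and the times will differ between machines and runs.

```julia
using Random

x = rand(10^6)

# Collect three timings (in seconds) in a vector.
times = [@elapsed(shuffle(x)) for _ in 1:3]

# The smallest timing is least affected by compilation and other noise.
best = minimum(times)
```

The first element of times typically includes compilation, as noted above.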
Exercise 10
Using the @btime macro, benchmark the time of calculating the sum of one million random floats.
Solution
The code you can use is:
julia> using BenchmarkTools
julia> @btime sum($(rand(10^6)))
155.300 μs (0 allocations: 0 bytes)
500330.6375697419
Note that the following:
julia> @btime sum(rand(10^6))
1.644 ms (2 allocations: 7.63 MiB)
500266.9457722128
would be an incorrect timing, as you would also measure the time of generating the vector.
Alternatively, you can write, for example:
julia> x = rand(10^6);
julia> @btime sum($x)
154.700 μs (0 allocations: 0 bytes)
500151.95875364926
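Another way to keep data generation out of the measurement is the setup keyword of BenchmarkTools. The sketch below uses @belapsed, which returns the minimum time in seconds instead of printing it; the variable names are chosen for this example and the BenchmarkTools package is assumed to be installed.

```julia
using BenchmarkTools

x = rand(10^6)

# Interpolate the global variable so that only the sum is timed.
t_interp = @belapsed sum($x)

# Generate the data in the setup phase, which is excluded from the timing.
t_setup = @belapsed sum(y) setup=(y = rand(10^6))
```

Both timings should be of a similar order, since neither includes the cost of rand.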