JuliaForDataAnalysis/exercises/exercises02.md
2022-10-14 12:54:02 +02:00

Julia for Data Analysis

Bogumił Kamiński, Daniel Kaszyński

Chapter 2

Problems

Exercise 1

Consider the following code:

x = [1, 2]
y = x
y[1] = 10

What is the value of x[1] and why?

Solution

x[1] will be 10 because y = x does not copy data; it binds the same value (the same array) to both variables x and y.
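
The binding behavior can be checked directly. The sketch below is only an illustration: it assumes the standard === (identity) operator and the copy function, and the variable z is introduced here and is not part of the exercise.

```julia
x = [1, 2]
y = x          # binds y to the same array that x is bound to
z = copy(x)    # allocates a new array with the same contents (z is illustrative)

y[1] = 10

println(x === y)  # true: x and y refer to the same object
println(x[1])     # 10: the change made through y is visible through x
println(z[1])     # 1: the independent copy is unaffected
```

If you need y to be independent of x, binding it to copy(x) instead of x is one option.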

Exercise 2

How can you type ⚡ = 1? Check whether this operation succeeds and what its result is.

Solution

In help mode (activated by ?), paste the ⚡ character to get:

help?> ⚡
"⚡" can be typed by \:zap:<tab>

After the ⚡ = 1 operation, a new variable ⚡ is defined and bound to the value 1.

Exercise 3

What will be the value of variable x after running the following code and why?

x = 0.0
for i in 1:7_000_000
    global x += 1/7
end
x /= 1_000_000

Solution

x will have the value 0.9999999999242748. This value is below 1.0 because the Float64 representation of 1/7 is slightly less than the rational number 1/7, and the error accumulates when we perform the addition many times.

Extra: You can check that the Float64 representation is indeed a bit less than the rational 1/7 by increasing the precision of computations using the big function:

julia> big(1/7) # convert Float64 to high-precision float
0.142857142857142849212692681248881854116916656494140625

julia> 1/big(7) # construct high-precision float directly
0.1428571428571428571428571428571428571428571428571428571428571428571428571428568

As you can see, there is a difference at the 17th place after the decimal point, where we have 4 vs 5.
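
To see that the shortfall comes from the Float64 representation and not from the loop itself, the same accumulation can be repeated with exact rational arithmetic. This is a minimal sketch: accumulate_step is a hypothetical helper name introduced here, and 1//7 is Julia's built-in Rational literal.

```julia
# `accumulate_step` is a hypothetical helper, not part of the exercise.
# It adds `step` to a running total `n` times, starting from zero of the same type.
function accumulate_step(step, n)
    x = zero(step)
    for _ in 1:n
        x += step
    end
    return x
end

x_float = accumulate_step(1/7, 7_000_000) / 1_000_000    # Float64 arithmetic
x_exact = accumulate_step(1//7, 7_000_000) / 1_000_000   # exact Rational arithmetic

println(x_float)  # slightly below 1.0, as in the solution above
println(x_exact)  # 1//1
```

Wrapping the loop in a function also removes the need for the global keyword.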

Exercise 4

Express the type Matrix{Bool} using Array type.

Solution

It is Array{Bool, 2}. You get this information immediately in the REPL:

julia> Matrix{Bool}
Matrix{Bool} (alias for Array{Bool, 2})
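
The alias can also be verified programmatically. The following sketch assumes only Base functions; the variable m is illustrative.

```julia
# Matrix{T} and Vector{T} are aliases for Array{T, 2} and Array{T, 1}.
println(Matrix{Bool} === Array{Bool, 2})  # true
println(Vector{Bool} === Array{Bool, 1})  # true

m = zeros(Bool, 2, 3)        # a 2x3 array of Bool (m is illustrative)
println(m isa Matrix{Bool})  # true
println(ndims(m))            # 2
```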

Exercise 5

Let x be a vector. Write code that prints an error if x is empty (has zero elements).

Solution

You can do it like this:

length(x) == 0 && println("x is empty")

Extra: typically in such a case one would use the isempty function and throw an exception instead of just printing a message (here I assume that x was passed as an argument to a function):

isempty(x) && throw(ArgumentError("x is not allowed to be empty"))
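
As a sketch of how such a check might look inside a function, consider the example below. The name first_or_throw is hypothetical and chosen only for illustration.

```julia
# `first_or_throw` is a hypothetical helper used only to illustrate the check.
function first_or_throw(x)
    isempty(x) && throw(ArgumentError("x is not allowed to be empty"))
    return first(x)
end

println(first_or_throw([3, 4]))  # 3

# Capture the exception so that the script keeps running.
caught = try
    first_or_throw(Int[])
    nothing
catch err
    err
end
println(caught isa ArgumentError)  # true
```

Throwing an exception lets the caller decide how to react, while println only informs the user and lets execution continue.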

Exercise 6

Write a function called exec that takes two values, x and y, and a function op accepting two arguments, and returns op(x, y). Make + the default value of op.

Solution

Here are two ways to define the exec function:

exec1(x, y, op=+) = op(x, y)
exec2(x, y; op=+) = op(x, y)

The first of them uses a positional argument for op, and the second uses a keyword argument. Here is the difference in how they are called:

julia> exec1(2, 3, *)
6

julia> exec2(2, 3; op=*)
6
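
A few more calls may help to see the default value and other choices of op in action. This is only a usage sketch; the anonymous function and max are arbitrary examples of two-argument functions.

```julia
exec1(x, y, op=+) = op(x, y)   # op is an optional positional argument
exec2(x, y; op=+) = op(x, y)   # op is a keyword argument

println(exec1(2, 3))                 # 5, the default op is +
println(exec2(2, 3))                 # 5, the default op is +
println(exec1(2, 3, (a, b) -> a^b))  # 8, an anonymous function passed positionally
println(exec2(2, 3; op=max))         # 3, a named function passed by keyword
```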

Exercise 7

Write a function that calculates the sum of the absolute values of the elements of a collection passed to it.

Solution

Such a function can be written as:

sumabs(x) = sum(abs, x)
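
The sketch below shows a few example calls; the inputs are arbitrary. Passing abs as the first argument to sum applies it element by element without first building an intermediate array, unlike the broadcasting form sum(abs.(x)), which gives the same result.

```julia
sumabs(x) = sum(abs, x)

println(sumabs([-1, 2, -3]))  # 6
println(sumabs((-1.5, 2.5)))  # 4.0, tuples work as well
println(sumabs(-2:2))         # 6, ranges work as well

# Same result as the broadcasting version, which allocates a temporary array.
println(sumabs([-1, 2, -3]) == sum(abs.([-1, 2, -3])))  # true
```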

Exercise 8

Write a function that swaps the first and last elements of an array in place.

Solution

This can be written for example as:

function swap!(x)
    f = x[1]
    x[1] = x[end]
    x[end] = f
    return x
end

Extra: A more advanced way to write this function would be:

function swap!(x)
    if length(x) > 1
        x[begin], x[end] = x[end], x[begin]
    end
    return x
end

Note the differences in the code:

* we use begin instead of 1 to get the first element. This is a safer practice since some collections in Julia do not use 1-based indexing (in practice you are not likely to see them, so this comment is most relevant for package developers)
* if there are 0 or 1 elements in the collection, the function does not do anything (depending on the context we might want to throw an error instead)
* in x[begin], x[end] = x[end], x[begin] we perform two assignments at the same time to avoid having to use a temporary variable f (this operation is technically called tuple destructuring; we discuss it in later chapters of the book)
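
The behavior of the second version, including the edge cases, can be checked with a short sketch. The input vectors and the variables a and b are arbitrary examples.

```julia
function swap!(x)
    if length(x) > 1
        x[begin], x[end] = x[end], x[begin]
    end
    return x
end

v = [1, 2, 3, 4]
swap!(v)               # modifies v in place
println(v)             # [4, 2, 3, 1]
println(swap!([7]))    # [7], a single element is left unchanged
println(isempty(swap!(Int[])))  # true, an empty vector does not raise an error

# The same simultaneous assignment works for plain variables.
a, b = 1, 2
a, b = b, a
println((a, b))        # (2, 1)
```

Note that the first version from the solution would throw a BoundsError on an empty vector, since x[1] does not exist.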

Exercise 9

Write a loop in global scope that calculates the sum of cubes of numbers from 1 to 10^6. Next use the sum function to perform the same computation. What is the difference in timing of these operations?

Solution

We used the @time macro in chapter 1.

Version in global scope:

julia> s = 0
0

julia> @time for i in 1:10^6
           global s += i^3
       end
  0.076299 seconds (2.00 M allocations: 30.517 MiB, 10.47% gc time)

Version with a function using the sum function:

julia> sum3(n) = sum(x -> x^3, 1:n)
sum3 (generic function with 1 method)

julia> @time sum3(10^6)
  0.000012 seconds
-8222430735553051648

Version with the sum function in global scope:

julia> @time sum(x -> x^3, 1:10^6)
  0.027436 seconds (48.61 k allocations: 2.558 MiB, 99.75% compilation time)
-8222430735553051648

julia> @time sum(x -> x^3, 1:10^6)
  0.025744 seconds (48.61 k allocations: 2.557 MiB, 99.76% compilation time)
-8222430735553051648

As you can see, using a loop in global scope is inefficient. It leads to many allocations and slow execution.

Using the sum3 function leads to the fastest execution. You might ask why using sum(x -> x^3, 1:10^6) in global scope is slower. The reason is that the anonymous function x -> x^3 is defined anew each time this expression is evaluated, which forces compilation of the sum function for it every time (note the compilation time share reported by @time). It is still faster than the loop in global scope.
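
One way to probe this explanation is to give the cubing function a name, so that repeated calls reuse the same function. This is a sketch under that assumption: cube is a hypothetical name, the timings depend on your machine and Julia version, and no specific numbers are claimed here.

```julia
cube(x) = x^3   # a named function (hypothetical name), defined once

sum(cube, 1:10^6)                        # first call triggers compilation
t_named = @elapsed sum(cube, 1:10^6)     # later calls reuse the compiled code

# Each evaluation of this line creates a new anonymous function.
t_anon = @elapsed sum(x -> x^3, 1:10^6)

println("named function:     ", t_named, " seconds")
println("anonymous function: ", t_anon, " seconds")
```

If the explanation above holds on your setup, the timing for the named function should be much closer to the sum3 timing than to the anonymous-function timing.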

For reference, check the function with a loop inside it:

julia> function sum3loop(n)
           s = 0
           for i in 1:n
               s += i^3
           end
           return s
       end
sum3loop (generic function with 1 method)

julia> @time sum3loop(10^6)
  0.001378 seconds
-8222430735553051648

This is also much faster than a loop in global scope.

Exercise 10

Explain the value of the result of summation obtained in exercise 9.

Solution

In exercise 9 we note that the result is -8222430735553051648, which is a negative value, although we are adding cubes of positive values. The reason is that operations on integers overflow. If you are working with numbers larger than can be stored in the Int type, whose maximum value is:

julia> typemax(Int)
9223372036854775807

use big numbers that we discussed in Exercise 3:

julia> @time sum(x -> big(x)^3, 1:10^6)
  0.833234 seconds (11.05 M allocations: 236.113 MiB, 23.77% gc time, 2.63% compilation time)
250000500000250000000000

Now we get a correct result, at the cost of slower computation.
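
The wrap-around can also be reproduced on a small scale. The sketch below assumes a 64-bit Int (the default on 64-bit machines). Using Int128 is an alternative suggested here, not something taken from the exercise; it is enough for this particular sum, but unlike big numbers it can still overflow for larger inputs.

```julia
# Integer arithmetic wraps around silently on overflow (assuming 64-bit Int).
println(typemax(Int) + 1 == typemin(Int))  # true

# The exact sum fits in Int128, so no arbitrary-precision numbers are needed here.
exact = sum(x -> Int128(x)^3, 1:10^6)
println(exact)        # 250000500000250000000000

# Truncating the exact sum to 64 bits reproduces the value from exercise 9.
println(exact % Int)  # -8222430735553051648

# Checked arithmetic throws an error instead of wrapping around.
overflowed = try
    Base.Checked.checked_add(typemax(Int), 1)
    false
catch err
    err isa OverflowError
end
println(overflowed)   # true
```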