summerschool_simtech_2023/material/1_mon/firststeps/firststeps_handout.qmd

445 lines
11 KiB
Plaintext
Raw Normal View History

---
2023-09-13 14:40:46 +02:00
---
# First Steps
## Getting started
::: callout-tip
The [julia manual](https://docs.julialang.org/en/v1/manual/getting-started/) is excellent!
:::
2023-09-13 16:24:36 +02:00
At this point we assume that you have [Julia 1.9 installed](../../../installation/vscode.qmd), VSCode language extension ready, and installed the VSCode Julia plugin. There are some more [recommended settings in VSCode](../../../installation/vscode.qmd) which are not necessary, but helpful.
We further recommend to not use the small "play" button on the top right (which opens a new julia process everytime you change something), but rather open a new Julia repl (`ctrl`+`shift`+`p` => `>Julia: Start Repl`) which you keep open as long as possible.
::: callout-tip
2023-09-13 16:24:36 +02:00
VSCode automatically loads the `Revise.jl` package, which screens all your actively loaded packages/files and updates the methods instances whenever it detects a change. This is quite similar to `%autoreload 2` in ipython. If you use VSCode, you dont need to think about it, if you prefer a command line, you should put Revise.jl in your startup.jl file.
:::
## Syntax differences Python/R/MatLab
### In the beginning there was `nothing`
`nothing`- but also `NaN` and also `Missing`.
2023-09-13 15:03:14 +02:00
Each of those has a specific purpose, but most likely we will only need `a = nothing` and `b = NaN`.
### Control Structures
**Matlab User?** Syntax will be *very* familiar.
2023-09-13 15:03:14 +02:00
**R User?** Forget about all the `{}` brackets.
2023-09-13 15:03:14 +02:00
**Python User?** We don't need no intendation, and we also have 1-index.
``` julia
2023-09-13 15:03:14 +02:00
myarray = zeros(6) # <1>
for k = 1:length(myarray) # <2>
if iseven(k)
myarray[k] = sum(myarray[1:k]) # <3>
elseif k == 5
myarray = myarray .- 1 # <4>
else
myarray[k] = 5
end # <5>
end
```
1. initialize a vector (check with `typeof(myArray)`)
2. Control-Structure for-loop. 1-index!
3. **MatLab**: Notice the `[` brackets to index Arrays!
4. **Python/R**: `.` always means elementwise
5. **Python/R**: `end` after each control sequence
### Functions
2023-09-13 15:03:14 +02:00
```julia
2023-09-13 15:03:14 +02:00
function myfunction(a,b=123;keyword1="defaultkeyword") #<1>
if keyword1 == "defaultkeyword"
c = a+b
else
c = a*b
end
2023-09-13 15:03:14 +02:00
return c
end
methods(myfunction) # <2>
myfunction(0)
myfunction(1;keyword1 = "notdefault")
myfunction(0,5)
myfunction(0,5;keyword1 = "notdefault")
```
2023-09-13 15:03:14 +02:00
1. everything before the `;` => positional, after => `kwargs`
2023-09-13 16:22:46 +02:00
2. List all methods with that function name - returns two functions, due to the `b=123` optional positional argument
2023-09-28 10:01:28 +02:00
::: callout-tip
Terminology function vs. method: Methods are instantiations of an abstract `function`
:::
```julia
anonym = (x,y) -> x+y
anonym(3,4)
```
```julia
myshortfunction(x) = x^2
function mylongfunction(x)
return x^2
end
```
2023-09-13 16:22:46 +02:00
```julia
myfunction(args...;kwargs...) = myotherfunction(newarg,args...;kwargs...)
```
#### Excourse: splatting & slurping
2023-09-28 10:01:28 +02:00
2023-09-13 16:22:46 +02:00
Think of it as unpacking / collecting something
```julia
a = [1,2,3]
+(a)
+(a...) # <1>
```
1. equivalent to `+(1,2,3)`
#### elementwise-function / broadcasting
2023-09-28 10:01:28 +02:00
2023-09-13 16:22:46 +02:00
Julia is very neat in regards of applying functions elementwise (also called broadcasting).
```julia
a = [1,2,3,4]
b = sqrt(a) # <1>
c = sqrt.(a) # <2>
```
2023-09-13 15:03:14 +02:00
1. Error - there is no method defined for the `sqrt` of a `Vector`
2. the small `.` applies the function to all elements of the container `a` - this works as "expected"
::: callout-important
2023-09-13 15:03:14 +02:00
Broadcasting is very powerful, as Julia can get a huge performance boost in chaining many operations, without requiring saving temporary arrays. For example:
```julia
a = [1,2,3,4,5]
b = [6,7,8,9,10]
c = (a.^2 .+ sqrt.(a) .+ log.(a.*b))./5
```
2023-09-13 15:03:14 +02:00
In many languages (Matlab, Python, R) you would need to do the following:
```
1. temp1 = a.*b
2. temp2 = log.(temp1)
3. temp3 = a.^2
4. temp4 = sqrt.(a)
5. temp5 = temp3 .+ temp4
6. temp6 = temp5 + temp2
7. output = temp6./5
```
2023-09-13 15:03:14 +02:00
Thus, we need to allocate ~7x the memory of the vector (not at the same time though).
In Julia, the elementwise code above rather translates to:
```julia
c = similar(a) # <1>
for k = 1:length(a)
2023-09-13 15:03:14 +02:00
c[k] = (a[k]^2 + sqrt(a[k]) + log(a[k]*b[k]))/5
end
```
1. Function to initialize an `undef` array with the same size as `a`
The `temp` memory we need at each iteration is simply `c[k]`.
2023-09-13 15:03:14 +02:00
And a nice sideeffect: By doing this, we get rid of any specialized "serialized" function, e.g. to do sum, or + or whatever. Those are typically the inbuilt `C` functions in Python/Matlab/R, that really speed up things. In Julia **we do not need inbuilt functions for speed**.
:::
2023-09-13 16:57:26 +02:00
## Linear Algebra
2023-09-28 10:01:28 +02:00
2023-09-13 16:57:26 +02:00
```julia
import LinearAlgebra # <1>
import LinearAlgebra: qr
using LinearAlgebra # <2>
```
1. Requires to write `LinearAlgebra.QR(...)` to access a function
2. `LinearAlgebra` is a `Base` package, and always available
2023-09-28 10:01:28 +02:00
::: callout-tip
Julia typically recommends to use `using PackageNames`. Name-space polution is not a problem, as the package manager will never silently overwrite an already existing method - it will always ask the user to specify in those cases (different to R: shows a warning, or Python: just goes on with life as if nothing happened)
:::
2023-09-13 16:57:26 +02:00
```julia
A = Matrix{Float64}(undef,11,22) # <1>
B = Array{Float64,2}(undef,22,33) # <2>
qr(A*B)
```
1. equivalent to `Array`, as `Matrix` is a convenience type-alias for `Array` with 2 dimensions. Same thing for `Vector`.
2. the `2` of `{Float64,2}` is not mandatory
2023-09-28 10:01:28 +02:00
Much more on Wednesday in the lecture `LinearAlgebra`!
## Style-conventions
| | |
| -- | -- |
| variables | lowercase, lower_case|
| Types,Modules | UpperCamelCase|
| functions, macro | lowercase |
2023-09-13 16:22:46 +02:00
| inplace / side-effects | `endwith!()`^[A functionname ending with a `!` indicates that inplace operations will occur / side-effects are possible. This is convention only, but in 99% of cases adopted] |
2023-09-13 14:30:49 +02:00
# Task 1
2023-09-28 10:01:28 +02:00
Ok - lot of introduction, but I think you are ready for your first interactive task.
2023-09-28 10:01:28 +02:00
Follow [Task 1 here](tasks.qmd#1).
# Julia Basics - II
2023-09-28 10:01:28 +02:00
## Strings
```julia
character = 'a'
str = "abc"
str[3] # <1>
```
1. returns `c`
2023-09-28 10:01:28 +02:00
### characters
```julia
'a':'f' #<1>
collect('a':'f') # <2>
join('a':'f') # <3>
```
1. a `StepRange` between characters
2. a `Array{Chars}`
3. a `String`
2023-09-28 10:01:28 +02:00
### concatenation
```julia
a = "one"
b = "two"
ab = a * b # <1>
```
1. Indeed, `*` and not `+` - as plus implies from algebra that `a+b == b+a` which obviously is not true for string concatenation. But `a*b !== b*a` - at least for matrices.
2023-09-28 10:01:28 +02:00
### substrings
```julia
str = "long string"
substr = SubString(str, 1, 4)
whereis_str = findfirst("str",str)
```
2023-09-28 10:01:28 +02:00
## regexp
```julia
str = "any WORD written in CAPITAL?"
occursin(r"[A-Z]+", str) # <1>
m = match(r"[A-Z]+",str) # <2>
```
1. Returns `true`. Note the small `r` before the `r"regular expression"` - nifty!
2. Returns a `::RegexMatch` - access via `m.match` & `m.offset` (index) - or `m.captures` / `m.offsets` if you defined capture-groups
2023-09-28 10:01:28 +02:00
## Interpolation
```julia
a = 123
str = "this is a: $a; this 2*a: $(2*a)"
```
## Scopes
2023-09-28 10:01:28 +02:00
All things (excepts modules) are in local scope (in scripts)
``` julia
a = 0
for k = 1:10
a = 1
end
a #<1>
```
1. a = 0! - in a script; but a = 1 in the REPL!
Variables are in global scope in the REPL for debugging convenience
::: callout-tip
Putting this code into a function automatically resolves this issue
```julia
function myfun()
a = 0
for k = 1:10
a = 1
end
a #<1>
return a
end
myfun() # <1>
```
1. returns 1 now in both REPL and include("myscript.jl")
:::
2023-09-28 10:01:28 +02:00
### explicit global / local
``` julia
a = 0
global b
b = 0
for k = 1:10
local a
global b
a = 1
b = 1
end
a #<1>
b #<2>
```
1. a = 0
2. b = 1
2023-09-28 10:01:28 +02:00
### Modifying containers works in any case
```julia
a = zeros(10)
for k = 1:10
a[k] = k
end
a #<1>
```
1. This works "correctly" in the `REPL` as well as in a script, because we modify the content of `a`, not `a` itself
## Types
2023-09-28 10:01:28 +02:00
Types play a super important role in Julia for several main reasons:
1) The allow for specialization e.g. `+(a::Int64,b::Float64)` might have a different (faster?) implementation compared to `+(a::Float64,b::Float64)`
2) They allow for generalization using `abstract` types
3) They act as containers, structuring your programs and tools
Everything in julia has a type! Check this out:
2023-09-28 10:01:28 +02:00
```julia
typeof(1)
typeof(1.0)
typeof(sum)
typeof([1])
typeof([(1,2),"5"])
```
----
We will discuss two types of types:
1) **`composite`** types
2) `abstract` types.
::: {.callout-tip collapse="true"}
## Click me for even more types!
There is a third type, `primitive type` - but we will practically never use them
Not much to say at this level, they are types like `Float64`. You could define your own one, e.g.
```julia
primitive type Float128 <: AbstractFloat 128 end
```
And there are two more, `Singleton types` and `Parametric types` - which (at least the latter), you might use at some point. But not in this tutorial.
:::
### composite types
2023-09-28 10:01:28 +02:00
You can think of these types as containers for your variables, which allows you for specialization.
```julia
struct SimulationResults
parameters::Vector
results::Vector
end
s = SimulationResults([1,2,3],[5,6,7,8,9,10,NaN])
function print(s::SimulationResults)
println("The following simulation was run:")
println("Parameters: ",s.parameters)
println("And we got results!")
println("Results: ",s.results)
end
print(s)
function SimulationResults(parameters) # <1>
results = run_simulation(parameters)
return SimulationResults(parameters,results)
end
function run_simulation(x)
return cumsum(repeat(x,2))
end
s = SimulationResults([1,2,3])
print(s)
```
1. in case not all fields are directly defined, we can provide an outer constructor (there are also inner constructors, but we will not discuss them here)
::: callout-warning
once defined, a type-definition in the global scope of the REPL cannot be re-defined without restarting the julia REPL! This is annoying, there are some tricks arround it (e.g. defining the type in a module (see below), and then reloading the module)
:::
# Task 2
2023-09-28 10:01:28 +02:00
2023-09-13 16:22:46 +02:00
Follow [Task 2 here](tasks.qmd#2)
# Julia Basics III
## Modules
```julia
module MyStatsPackage
include("src/statistic_functions.jl")
export SimulationResults #<1>
export rse_tstat
end
using MyStatsPackage
```
1. This makes the `SimulationResults` type immediately available after running `using MyStatsPackage`. To use the other "internal" functions, one would use `MyStatsPackage.rse_sum`.
```julia
import MyStatsPackage
MyStatsPackage.rse_tstat(1:10)
import MyStatsPackage: rse_sum
rse_sum(1:10)
```
## Macros
2023-09-28 10:01:28 +02:00
Macros allow to programmers to edit the actual code **before** it is run. We will pretty much just use them, without learning how they work.
```julia
@which cumsum
@which(cumsum)
a = "123"
@show a
```
2023-09-28 10:01:28 +02:00
2023-09-28 14:51:54 +02:00
2023-09-13 16:57:26 +02:00
## Debugging
2023-09-28 10:01:28 +02:00
2023-09-28 14:51:54 +02:00
### Debug messages
```julia
@debug "my debug message"
ENV["JULIA_DEBUG"] = Main
ENV["JULIA_DEBUG"] = MyPackage
```
### Debugger proper:
[Cheatsheet for debugging](../../../cheatsheets/julia.qmd#debugging)
In most cases, `@run myFunction(1,2,3)` is sufficient.