summerschool_simtech_2023/material/1_mon/firststeps/firststeps_handout.qmd
2023-09-08 18:46:06 +00:00

---
code-annotations: select
---
# First Steps
## Getting started
::: callout-tip
The [julia manual](https://docs.julialang.org/en/v1/manual/getting-started/) is excellent!
:::
At this point we assume that you have Julia 1.9 installed, VSCode ready, and the VSCode Julia plugin installed. There are some more [recommended settings in VSCode](vscode.qmd) which are not necessary, but helpful.
We further recommend not using the small "play" button on the top right (which opens a new Julia process every time you change something), but rather opening a new Julia REPL (`ctrl`+`shift`+`p` => `>Julia: Start REPL`) which you keep open as long as possible.
::: callout-tip
VSCode automatically loads the `Revise.jl` package, which tracks all your actively loaded packages/files and updates the method definitions whenever it detects a change. This is quite similar to `%autoreload 2` in Python. If you use VSCode, you don't need to think about it; if you prefer the command line, you should put `using Revise` in your `startup.jl` file.
:::
## Syntax differences Python/R/MatLab
### In the beginning there was `nothing`
`nothing` - but also `NaN` and also `missing`.
Each of those has a specific purpose, but most likely we will only need `a = nothing` and `b = NaN`.
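
To make the difference between the three concrete, here is a minimal sketch; the variable names are only for illustration.

```julia
a = nothing      # "no value at all", e.g. what a function without a result returns
b = NaN          # a Float64 that is "not a number", e.g. the result of 0/0
c = missing      # a missing data point (the only instance of the type `Missing`)

isnothing(a)     # true
isnan(b)         # true
ismissing(c)     # true

b == b           # false: NaN is not equal to itself
c + 1            # missing: `missing` propagates through most operations
```

Each of the three has its own test function, which is usually the safest way to check for it.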
### Control Structures
**Matlab User?** Syntax will be *very* familiar.
**R User?** Forget about all the `{}` brackets
**Python User?** We don't need no indentation, and we also have 1-based indexing
``` julia
myarray = zeros(6) # <1>
for k = 1:length(myarray) # <2>
    if iseven(k)
        myarray[k] = sum(myarray[1:k]) # <3>
    elseif k == 5
        myarray = myarray .- 1 # <4>
    else
        myarray[k] = 5
    end # <5>
end
```
1. initialize a vector (check with `typeof(myarray)`)
2. Control-Structure for-loop. 1-index!
3. **MatLab**: Notice the `[` brackets to index Arrays!
4. **Python/R**: `.` always means elementwise
5. **Python/R**: `end` after each control sequence
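
As a variation on the loop above, here is a sketch that uses `eachindex` instead of `1:length(...)` and the in-place operator `.-=` instead of re-assigning the array. Both are standard Julia; swapping them in is a suggestion here, not part of the original example.

```julia
myarray = zeros(6)
for k in eachindex(myarray)       # iterates over all valid indices of the array
    if iseven(k)
        myarray[k] = sum(myarray[1:k])
    elseif k == 5
        myarray .-= 1             # subtract 1 from every element, in place
    else
        myarray[k] = 5
    end
end
myarray                           # [4.0, 4.0, 4.0, 14.0, -1.0, 24.0]
```

Because `.-=` only modifies the content of `myarray` and never re-assigns the name, this version behaves the same in the REPL and in a script (see the Scopes section).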
### Functions
```julia
function myfunction(a,b=123;keyword1="defaultkeyword") #<1>
    if keyword1 == "defaultkeyword"
        c = a+b
    else
        c = a*b
    end
    return c
end
methods(myfunction) # <2>
myfunction(0)
myfunction(1;keyword1 = "notdefault")
myfunction(0,5)
myfunction(0,5;keyword1 = "notdefault")
```
1. everything before the `;` => positional, after => `kwargs`
2. returns two methods, due to the `b=123` optional positional argument
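
To see what the four calls above evaluate to, here is a self-contained sketch. `combine` is a hypothetical stand-in with the same signature shape as `myfunction`.

```julia
# Hypothetical stand-in with the same signature shape as `myfunction` above
function combine(a, b=123; keyword1="defaultkeyword")
    if keyword1 == "defaultkeyword"
        return a + b
    else
        return a * b
    end
end

n_methods = length(methods(combine))        # 2: combine(a) and combine(a, b)
r1 = combine(0)                             # 0 + 123
r2 = combine(1; keyword1="notdefault")      # 1 * 123
r3 = combine(0, 5)                          # 0 + 5
r4 = combine(0, 5; keyword1="notdefault")   # 0 * 5
```

Keyword arguments do not create additional methods; only the optional positional argument does.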
```julia
anonym = (x,y) -> x+y
anonym(3,4)
```
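
Anonymous functions are most useful as arguments to other functions. A small sketch with `map`, `filter` and `sum` from Base; the variable names are arbitrary.

```julia
numbers = [1, 2, 3, 4, 5, 6]

squares = map(x -> x^2, numbers)            # apply the function to every element
evens   = filter(x -> iseven(x), numbers)   # keep elements where the function returns true
total   = sum(x -> x^2, numbers)            # sum of the transformed elements
```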
```julia
myshortfunction(x) = x^2
function mylongfunction(x)
    return x^2
end
```
#### elementwise-function / broadcasting
Julia is very neat when it comes to applying functions elementwise (also called broadcasting). (Matlab users know this already.)
```julia
a = [1,2,3,4]
b = sqrt(a) # <1>
c = sqrt.(a) # <2>
```
1. Error - there is no method defined for the `sqrt` of a `Vector`
2. the small `.` applies the function to all elements of the container `a` - this works as "expected"
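
The dot syntax is not limited to built-in functions. The sketch below assumes a hypothetical helper `halve` and also shows the `@.` macro, which adds the dots for you.

```julia
halve(x) = x / 2                 # hypothetical scalar function

a = [1, 2, 3, 4]
halved  = halve.(a)              # works for any user-defined function
shifted = a .+ 10                # operators are dotted as well
dotted  = @. sqrt(a) + a^2       # `@.` puts a dot on every call and operator in the expression
```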
::: callout-important
Broadcasting is very powerful, as Julia can get a huge performance boost by chaining (fusing) many operations without having to save temporary arrays. For example:
```julia
a = [1,2,3,4,5]
b = [6,7,8,9,10]
c = (a.^2 .+ sqrt.(a) .+ log.(a.*b))./5
```
In many languages (Matlab, Python, R) you would need to do the following:
```
temp1 = a.*b
temp2 = log.(temp1)
temp3 = a.^2
temp4 = sqrt.(a)
temp5 = temp3 .+ temp4
temp6 = temp5 .+ temp2
output = temp6./5
```
Thus, we need to allocate ~7x the memory of the vector (though not all at the same time).
In Julia, the elementwise code above rather translates to:
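```julia
c = similar(a, Float64) # <1>
for k = 1:length(a)
    c[k] = (a[k]^2 + sqrt(a[k]) + log(a[k]*b[k]))/5
end
```
1. Function to initialize an `undef` array with the same size as `a`; we request `Float64` elements because `a` holds integers, but the results are not integers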
The `temp` memory we need at each iteration is simply `c[k]`.
And a nice side effect: by doing this, we get rid of any specialized "vectorized" function, e.g. to do sum, or + or whatever. Those are typically the built-in `C` functions in Python/Matlab/R that really speed things up. In Julia **we do not need built-in functions for speed**.
:::
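
To check that the fused broadcast and the explicit loop really compute the same thing, here is a sketch that wraps the loop in a hypothetical function `fused_loop` and also shows `.=`, which writes the result into an already allocated array.

```julia
a = [1, 2, 3, 4, 5]
b = [6, 7, 8, 9, 10]

# Hypothetical helper: the hand-written loop from the callout above
function fused_loop(a, b)
    c = similar(a, Float64)              # uninitialized Float64 array, same size as `a`
    for k in eachindex(a, b)
        c[k] = (a[k]^2 + sqrt(a[k]) + log(a[k] * b[k])) / 5
    end
    return c
end

c_broadcast = (a.^2 .+ sqrt.(a) .+ log.(a .* b)) ./ 5
c_loop = fused_loop(a, b)

c_inplace = similar(a, Float64)
c_inplace .= (a.^2 .+ sqrt.(a) .+ log.(a .* b)) ./ 5   # `.=` reuses the memory of c_inplace
```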
## Style-conventions
| | |
| -- | -- |
| variables | lowercase, lower_case|
| Types,Modules | UpperCamelCase|
| functions, macro | lowercase |
| inplace / side-effects | `endwith!()` |
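
The `!` convention from the last row is easiest to see with `sort` and `sort!` from Base. `add_one!` is a hypothetical function that follows the same convention.

```julia
v = [3, 1, 2]
sorted_copy = sort(v)     # returns a new sorted vector, `v` is untouched
unchanged = copy(v)       # snapshot of `v` before the in-place call
sort!(v)                  # sorts `v` itself

# Hypothetical function following the convention: it modifies its argument
function add_one!(x)
    x .+= 1
    return x
end

w = [1, 2, 3]
add_one!(w)
```

The `!` is only a naming convention; Julia does not enforce it.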
# Task 1.
OK, that was a lot of introduction, but I think you are ready for your first interactive task.
## Wait - how do I even run things in Julia/VScode?
Typically, you work in a Julia script ending in `scriptname.jl`
You concurrently have a REPL open, so that you do not have to reload all packages etc. every time. Further, you typically have `Revise.jl` running in the background to automatically update your custom packages / modules (more on that later).
You can mark some code and execute it using `ctrl` + `enter` - you can also generate code-blocks using `#---` and run a whole code-block using `alt`+`enter`.
1. Open a new script `statistic_functions.jl` in VSCode in a folder of your choice.
2. Implement a function called `rse_sum`^[rse = research software engineering, we could use `sum` in a principled way, but it requires some knowledge you likely don't have right now]. This function should return `true` if provided with the following test: `rse_sum(1:36) == 666`. You should further make use of a for-loop.
3. Implement a second function called `rse_mean`, which calculates the mean of the provided vector. Make sure to use the `rse_sum` function! Test it using `rse_mean(-15:17) == 1`
4. Next implement a standard deviation function `rse_std`: $\sqrt{\frac{\sum(x-mean(x))^2}{n-1}}$, this time you should use elementwise/broadcasting operators. Test it with `rse_std(1:3) == 1`
5. Finally, we will implement `rse_tstat`, returning the t-value (with `length(x)-1` degrees of freedom) for the null hypothesis that the provided array has a mean of 0. Test it with `rse_tstat(2:3) == 5`. Add the keyword argument `σ` that allows the user to optionally provide a pre-calculated standard deviation.
Well done! You now have all functions defined with which we will continue our journey.
# Julia Basics - II
### Strings
```julia
character = 'a'
str = "abc"
str[3] # <1>
```
1. returns `'c'` (a `Char`)
##### characters
```julia
'a':'f' #<1>
collect('a':'f') # <2>
join('a':'f') # <3>
```
1. a `StepRange` between characters
2. a `Vector{Char}`
3. a `String`
##### concatenation
```julia
a = "one"
b = "two"
ab = a * b # <1>
```
1. Indeed, `*` and not `+` - as plus implies from algebra that `a+b == b+a`, which obviously is not true for string concatenation. But `a*b != b*a` in general - at least for matrices.
##### substrings
```julia
str = "long string"
substr = SubString(str, 1, 4)
whereis_str = findfirst("str",str)
```
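
For reference, a sketch of what the two calls above return, plus the case where nothing is found. The search string `"xyz"` is just an example.

```julia
str = "long string"

substr = SubString(str, 1, 4)           # a view into `str`: "long"
whereis_str = findfirst("str", str)     # the index range of the first match
found = str[whereis_str]                # indexing with that range gives "str" back
not_found = findfirst("xyz", str)       # `nothing` if there is no match
```

`findfirst` returning `nothing` is a typical use of the `nothing` value introduced at the beginning.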
##### regexp
```julia
str = "any WORD written in CAPITAL?"
occursin(r"[A-Z]+", str) # <1>
m = match(r"[A-Z]+",str) # <2>
```
1. Returns `true`. Note the small `r` before the string in `r"regular expression"` - nifty!
2. Returns a `::RegexMatch` - access via `m.match` & `m.offset` (index) - or `m.captures` / `m.offsets` if you defined capture-groups
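
A sketch of the capture-group fields mentioned above. The pattern and the input string are made up for illustration.

```julia
m = match(r"(\d+)-(\d+)", "pages 12-34")   # two capture groups

whole = m.match                     # the full match, "12-34"
first_page, last_page = m.captures  # one entry per capture group
start = m.offset                    # index where the full match starts
group_starts = m.offsets            # index where each capture group starts

no_match = match(r"\d+", "no digits here")  # `nothing` if the pattern does not match
```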
##### Interpolation
```julia
a = 123
str = "this is a: $a; this 2*a: $(2*a)"
```
## Scopes
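Almost everything that opens a block (functions, `for`/`while` loops, ...), except modules, has its own local scope. In a script, assigning to an existing global variable inside such a block therefore creates a new local variable.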
``` julia
a = 0
for k = 1:10
    a = 1
end
a #<1>
```
1. a = 0! - in a script; but a = 1 in the REPL!
Variables are in global scope in the REPL for debugging convenience
::: callout-tip
Putting this code into a function automatically resolves this issue
```julia
function myfun()
    a = 0
    for k = 1:10
        a = 1
    end
    a #<1>
    return a
end
myfun() # <1>
```
1. returns 1 now in both the REPL and `include("myscript.jl")`
:::
#### explicit global / local
``` julia
a = 0
global b
b = 0
for k = 1:10
    local a
    global b
    a = 1
    b = 1
end
a #<1>
b #<2>
```
1. a = 0
2. b = 1
#### Modifying containers works in any case
```julia
a = zeros(10)
for k = 1:10
    a[k] = k
end
a #<1>
```
1. This works "correctly" in the `REPL` as well as in a script, because we modify the content of `a`, not `a` itself
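
The same distinction between modifying the content and re-assigning the name applies to function arguments. A minimal sketch with two hypothetical functions:

```julia
# Modifies the content of the array that was passed in
function fill_in_place!(x)
    for k in eachindex(x)
        x[k] = k
    end
    return x
end

# Only re-assigns the local name `x`; the caller's array is not touched
function rebind(x)
    x = ones(length(x))
    return x
end

a = zeros(3)
rebind(a)
a_after_rebind = copy(a)    # still all zeros
fill_in_place!(a)           # now `a` itself has changed
```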
## Types
Types play a super important role in Julia for several main reasons:
1) They allow for specialization, e.g. `+(a::Int64,b::Float64)` might have a different (faster?) implementation compared to `+(a::Float64,b::Float64)`
2) They allow for generalization using `abstract` types
3) They act as containers, structuring your programs and tools
Everything in julia has a type! Check this out:
```julia
typeof(1)
typeof(1.0)
typeof(sum)
typeof([1])
typeof([(1,2),"5"])
```
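
Types are arranged in a hierarchy, which is what makes specialization and generalization possible. A sketch using `supertype` and `<:` from Base; `describe` is a hypothetical function added only to show dispatch on abstract types.

```julia
# Every concrete type sits somewhere in a hierarchy of abstract types
supertype(Float64)                               # AbstractFloat
is_chain = Int64 <: Integer <: Real <: Number    # subtype checks can be chained

# Hypothetical helper: one method per (abstract) type
describe(x::Integer) = "an integer"
describe(x::AbstractFloat) = "a floating point number"
describe(x) = "something else"                   # fallback for everything else

d1 = describe(1)
d2 = describe(1.0)
d3 = describe("5")
mixed = typeof([(1, 2), "5"])                    # no common concrete type
```

Julia always picks the most specific method that matches the type of the argument.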
----
We will discuss two types of types:
1) **`composite`** types
2) `abstract` types.
::: {.callout-tip collapse="true"}
## Click me for even more types!
There is a third type, `primitive type` - but we will practically never use it.
Not much to say at this level; these are types like `Float64`. You could define your own, e.g.
```julia
primitive type Float128 <: AbstractFloat 128 end
```
And there are two more, `Singleton types` and `Parametric types` - which (at least the latter), you might use at some point. But not in this tutorial.
:::
### composite types
You can think of these types as containers for your variables, which allow for specialization.
```julia
struct SimulationResults
    parameters::Vector
    results::Vector
end
s = SimulationResults([1,2,3],[5,6,7,8,9,10,NaN])
function Base.print(s::SimulationResults) # extend Base's `print` instead of shadowing it
    println("The following simulation was run:")
    println("Parameters: ",s.parameters)
    println("And we got results!")
    println("Results: ",s.results)
end
print(s)
function SimulationResults(parameters) # <1>
    results = run_simulation(parameters)
    return SimulationResults(parameters,results)
end
function run_simulation(x)
    return cumsum(repeat(x,2))
end
s = SimulationResults([1,2,3])
print(s)
```
1. in case not all fields are directly defined, we can provide an outer constructor (there are also inner constructors, but we will not discuss them here)
::: callout-warning
Once defined, a type definition in the global scope of the REPL cannot be redefined without restarting the Julia REPL! This is annoying; there are some tricks around it (e.g. defining the type in a module (see below), and then reloading the module).
:::
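
The `abstract` types announced above are not shown in this handout, so here is a minimal sketch of how they combine with composite types. All names (`AbstractResult`, `MeanResult`, `CountResult`, `summary_line`) are hypothetical.

```julia
abstract type AbstractResult end          # cannot be instantiated, only subtyped

struct MeanResult <: AbstractResult
    value::Float64
end

struct CountResult <: AbstractResult
    value::Int
end

# Generic method: works for every subtype of AbstractResult
summary_line(r::AbstractResult) = "result with value $(r.value)"
# Specialized method: takes precedence for CountResult
summary_line(r::CountResult) = "counted $(r.value) items"

line_mean  = summary_line(MeanResult(2.5))
line_count = summary_line(CountResult(3))
```

The generic method is the generalization, the second method the specialization.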
# Task 2
1. Implement a type `StatResult` with fields for `x`, `n`, `std` and `tvalue`
2. Implement an outer constructor that can run `StatResult(2:10)` and return the full type including the calculated t-values.
3. Implement a function `length` for `StatResult` to multiple-dispatch on
4. **Optional:** If you have time, optimize the functions, so that mean, sum, length, std etc. is not calculated multiple times - you might want to rewrite your type. Note: This is a bit tricky :)
# Julia Basics III
## Modules
```julia
module MyStatsPackage
include("src/statistic_functions.jl")
export SimulationResults #<1>
export rse_tstat
end
using MyStatsPackage
```
1. This makes the `SimulationResults` type immediately available after running `using MyStatsPackage`. To use the other "internal" functions, one would use `MyStatsPackage.rse_sum`.
```julia
import MyStatsPackage
MyStatsPackage.rse_tstat(1:10)
import MyStatsPackage: rse_sum
rse_sum(1:10)
```
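
If the module is defined directly in your script or REPL session rather than installed as a package, it has to be loaded with a leading dot. A self-contained sketch with a hypothetical module `MiniStats`:

```julia
module MiniStats            # hypothetical module, defined in the current session

export double               # available directly after `using`

double(x) = 2x
triple(x) = 3x              # not exported, needs the module prefix

end

using .MiniStats            # the leading dot means "a module defined here", not an installed package

d = double(2)
t = MiniStats.triple(2)
```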
## Macros
Macros allow programmers to edit the actual code **before** it is run. We will pretty much just use them, without learning how they work.
```julia
@which cumsum
@which(cumsum)
a = "123"
@show a
```
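
To get a feeling for "editing code before it is run", `@macroexpand` shows the code a macro generates. A small sketch using only macros from Base:

```julia
x = 3

ex = @macroexpand @assert x > 0    # the expression that `@assert` expands to
elapsed = @elapsed sum(1:1000)     # runs the code and returns the time in seconds
shown = @show x + 1                # prints the expression with its value and returns the value
```

Printing `ex` in the REPL shows the generated code, which is an ordinary Julia expression (`Expr`).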
# Cheatsheets
## meta-tools
<!-- maybe move to own file "cheatsheets?" -->
| | Julia | Python |
|------------------------|------------------------|------------------------|
| Documentation | `?obj` | `help(obj)` |
| Object content | `dump(obj)` | `print(repr(obj))` |
| Exported functions | `names(FooModule)` | `dir(foo_module)` |
| List function signatures with that name | `methods(myFun)` | |
| List functions for specific type | `methodswith(SomeType)` | `dir(SomeType)` |
| Where is ...? | `@which func` | `func.__module__` |
| What is ...? | `typeof(obj)` | `type(obj)` |
| Is it really a ...? | `isa(obj, SomeType)` | `isinstance(obj, SomeType)` |
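
A short sketch of several of these tools applied to a hypothetical type `Point2D`. In the REPL `InteractiveUtils` is loaded automatically; in a script it has to be loaded explicitly for `methodswith` and `@which`.

```julia
using InteractiveUtils      # provides methodswith and @which outside the REPL

# Hypothetical type and function to inspect
struct Point2D
    x::Float64
    y::Float64
end
norm1(p::Point2D) = abs(p.x) + abs(p.y)

p = Point2D(1.0, -2.0)

typeof(p)                   # what is it?
isa(p, Point2D)             # is it really a Point2D?
fields = fieldnames(Point2D)
n_methods = length(methods(norm1))
methodswith(Point2D)        # functions that have a method for Point2D
dump(p)                     # prints the type and all field values
```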
## debugging
| | |
|--|--|
| `@run sum(5+1)` | run debugger, stop at error/breakpoints |
| `@enter sum(5+1)` | enter debugger, don't start code yet |
| `@show variable` | prints: variable = variablecontent |
| `@debug variable` | emits a debug-level log message that is only shown when debug logging is enabled; very convenient in combination with `>ENV["JULIA_DEBUG"] = ToBeDebuggedModule` (could be `Main` as well) |