--- code-annotations: select --- # First Steps ## Getting started ::: callout-tip The [julia manual](https://docs.julialang.org/en/v1/manual/getting-started/) is excellent! ::: At this point, we assume that you have Julia 1.9 installed, VSCode ready, and installed the VSCode Julia Language Support extension. There are some more [recommended settings in VSCode](vscode.qmd) which are not necessary, but helpful. We further recommend to not use the small "play" button on the top right (which opens a new julia process everytime you change something), but rather open a new Julia repl (`ctrl`+`shift`+`p` => `>Julia: Start Repl`) which you keep open as long as possible. ::: callout-tip VSCode automatically loads the `Revise.jl` package, which screens all your actively loaded packages/files and updates the methods instances whenever it detects a change. This is quite similar to `%autoreload 2` in python. If you use VSCode, you dont need to think about it, if you prefer a command line, you should put Revise.jl in your startup.jl file. ::: ## Syntax differences Python/R/MatLab ### In the beginning there was `nothing` `nothing`- but also `NaN` and also `Missing`. Each of those has a specific purpose, but most likely we will only need `a = nothing` and `b = NaN`. ### Control Structures **Matlab User?** Syntax will be *very* familiar. **R User?** Forget about all the `{}` brackets. **Python User?** We don't need no intendation, and we also have 1-index. ``` julia myarray = zeros(6) # <1> for k = 1:length(myarray) # <2> if iseven(k) myarray[k] = sum(myarray[1:k]) # <3> elseif k == 5 myarray = myarray .- 1 # <4> else myarray[k] = 5 end # <5> end ``` 1. initialize a vector (check with `typeof(myArray)`) 2. Control-Structure for-loop. 1-index! 3. **MatLab**: Notice the `[` brackets to index Arrays! 4. **Python/R**: `.` always means elementwise 5. **Python/R**: `end` after each control sequence ### Functions ```julia function myfunction(a,b=123;keyword1="defaultkeyword") #<1> if keyword1 == "defaultkeyword" c = a+b else c = a*b end return c end methods(myfunction) # <2> myfunction(0) myfunction(1;keyword1 = "notdefault") myfunction(0,5) myfunction(0,5;keyword1 = "notdefault") ``` 1. everything before the `;` => positional, after => `kwargs` 2. returns two functions, due to the `b=123` optional positional argument ```julia anonym = (x,y) -> x+y anonym(3,4) ``` ```julia myshortfunction(x) = x^2 function mylongfunction(x) return x^2 end ``` #### Elementwise functions / broadcasting Julia is very neat in regards of applying functions elementwise (also called broadcasting). (Matlab users know this already.) ```julia a = [1,2,3,4] b = sqrt(a) # <1> c = sqrt.(a) # <2> ``` 1. Error - there is no method defined for the `sqrt` of a `Vector` 2. the small `.` applies the function to all elements of the container `a` - this works as "expected" ::: callout-important Broadcasting is very powerful, as Julia can get a huge performance boost in chaining many operations, without requiring saving temporary arrays. For example: ```julia a = [1,2,3,4,5] b = [6,7,8,9,10] c = (a.^2 .+ sqrt.(a) .+ log.(a.*b))./5 ``` In many languages (Matlab, Python, R) you would need to do the following: ``` 1. temp1 = a.*b 2. temp2 = log.(temp1) 3. temp3 = a.^2 4. temp4 = sqrt.(a) 5. temp5 = temp3 .+ temp4 6. temp6 = temp5 + temp2 7. output = temp6./5 ``` Thus, we need to allocate ~7x the memory of the vector (not at the same time though). In Julia, the elementwise code above rather translates to: ```julia c = similar(a) # <1> for k = 1:length(a) c[k] = (a[k]^2 + sqrt(a[k]) + log(a[k]*b[k]))/5 end ``` 1. Function to initialize an `undef` array with the same size as `a` The `temp` memory we need at each iteration is simply `c[k]`. And a nice sideeffect: By doing this, we get rid of any specialized "serialized" function, e.g. to do sum, or + or whatever. Those are typically the inbuilt `C` functions in Python/Matlab/R, that really speed up things. In Julia **we do not need inbuilt functions for speed**. ::: ## Style-conventions | | | | -- | -- | | variables | lowercase, lower_case| | Types,Modules | UpperCamelCase| | functions, macro | lowercase | | inplace / side-effects | `endwith!()` | # Task 1. Ok - lot of introduction, but I think you are ready for your first interactive task. ## Wait - how do I even run things in Julia/VScode? Typically, you work in a Julia script ending in `scriptname.jl` You concurrently have a REPL open, to not reload all packages etc. everytime. Further you typically have `Revise.jl` running in the background to automatically update your custom Packages / Modules (more to that later). You can mark some code and execute it using `ctrl` + `enter` - you can also generate code-blocks using `#---` and run a whole code-block using `alt`+`enter` 1. Open a new script `statistic_functions.jl` in VSCode in a folder of your choice. 2. implement a function called `rse_sum`^[rse = research software engineering, we could use `sum` in a principled way, but it requires some knowledge you likely don't have right now]. This function should return `true` if provided with the following test: `res_sum(1:36) == 666`. You should further make use of a for-loop. 3. implement a second function called `rse_mean`, which calculates the mean of the provided vector. Make sure to use the `rse_sum` function! Test it using `res_mean(-15:17) == 1` 4. Next implement a standard deviation function `rse_std`: $\sqrt{\frac{\sum(x-mean(x))}{n-1}}$, this time you should use elementwise/broadcasting operators. Test it with `rse_std(1:3) == 1` 5. Finally, we will implement `rse_tstat`, returning the t-value with `length(x)-1` DF, that the provided Array actually has a mean of 0. Test it with `rse_tstat(2:3) == 5`. Add the keyword argument `σ` that allows the user to optionally provide a pre-calculated standard deviation. Well done! You now have all functions defined with which we will continue our journey. # Julia Basics - II ### Strings ```julia character = 'a' str = "abc" str[3] # <1> ``` 1. returns `c` ##### characters ```julia 'a':'f' #<1> collect('a':'f') # <2> join('a':'f') # <3> ``` 1. a `StepRange` between characters 2. a `Array{Chars}` 3. a `String` ##### concatenation ```julia a = "one" b = "two" ab = a * b # <1> ``` 1. Indeed, `*` and not `+` - as plus implies from algebra that `a+b == b+a` which obviously is not true for string concatenation. But `a*b !== b*a` - at least for matrices. ##### substrings ```julia str = "long string" substr = SubString(str, 1, 4) whereis_str = findfirst("str",str) ``` ##### regexp ```julia str = "any WORD written in CAPITAL?" occursin(r"[A-Z]+", str) # <1> m = match(r"[A-Z]+",str) # <2> ``` 1. Returns `true`. Note the small `r` before the `r"regular expression"` - nifty! 2. Returns a `::RegexMatch` - access via `m.match` & `m.offset` (index) - or `m.captures` / `m.offsets` if you defined capture-groups ##### Interpolation ```julia a = 123 str = "this is a: $a; this 2*a: $(2*a)" ``` ## Scopes All things (excepts modules) are in local scope (in scripts) ``` julia a = 0 for k = 1:10 a = 1 end a #<1> ``` 1. a = 0! - in a script; but a = 1 in the REPL! Variables are in global scope in the REPL for debugging convenience ::: callout-tip Putting this code into a function automatically resolves this issue ```julia function myfun() a = 0 for k = 1:10 a = 1 end a #<1> return a end myfun() # <1> ``` 1. returns 1 now in both REPL and include("myscript.jl") ::: #### explicit global / local ``` julia a = 0 global b b = 0 for k = 1:10 local a global b a = 1 b = 1 end a #<1> b #<2> ``` 1. a = 0 2. b = 1 #### Modifying containers works in any case ```julia a = zeros(10) for k = 1:10 a[k] = k end a #<1> ``` 1. This works "correctly" in the `REPL` as well as in a script, because we modify the content of `a`, not `a` itself ## Types Types play a super important role in Julia for several main reasons: 1) The allow for specialization e.g. `+(a::Int64,b::Float64)` might have a different (faster?) implementation compared to `+(a::Float64,b::Float64)` 2) They allow for generalization using `abstract` types 3) They act as containers, structuring your programs and tools Everything in julia has a type! Check this out: ```julia typeof(1) typeof(1.0) typeof(sum) typeof([1]) typeof([(1,2),"5"]) ``` ---- We will discuss two types of types: 1) **`composite`** types 2) `abstract` types. ::: {.callout-tip collapse="true"} ## Click me for even more types! There is a third type, `primitive type` - but we will practically never use them Not much to say at this level, they are types like `Float64`. You could define your own one, e.g. ```julia primitive type Float128 <: AbstractFloat 128 end ``` And there are two more, `Singleton types` and `Parametric types` - which (at least the latter), you might use at some point. But not in this tutorial. ::: ### composite types You can think of these types as containers for your variables, which allows you for specialization. ```julia struct SimulationResults parameters::Vector results::Vector end s = SimulationResults([1,2,3],[5,6,7,8,9,10,NaN]) function print(s::SimulationResults) println("The following simulation was run:") println("Parameters: ",s.parameters) println("And we got results!") println("Results: ",s.results) end print(s) function SimulationResults(parameters) # <1> results = run_simulation(parameters) return SimulationResults(parameters,results) end function run_simulation(x) return cumsum(repeat(x,2)) end s = SimulationResults([1,2,3]) print(s) ``` 1. in case not all fields are directly defined, we can provide an outer constructor (there are also inner constructors, but we will not discuss them here) ::: callout-warning once defined, a type-definition in the global scope of the REPL cannot be re-defined without restarting the julia REPL! This is annoying, there are some tricks arround it (e.g. defining the type in a module (see below), and then reloading the module) ::: # Task 2 1. Implement a type `StatResult` with fields for `x`, `n`, `std` and `tvalue` 2. Implement an outer constructor that can run `StatResult(2:10)` and return the full type including the calculated t-values. 3. Implement a function `length` for `StatResult` to multiple-dispatch on 4. **Optional:** If you have time, optimize the functions, so that mean, sum, length, std etc. is not calculated multiple times - you might want to rewrite your type. Note: This is a bit tricky :) # Julia Basics III ## Modules ```julia module MyStatsPackage include("src/statistic_functions.jl") export SimulationResults #<1> export rse_tstat end using MyStatsPackage ``` 1. This makes the `SimulationResults` type immediately available after running `using MyStatsPackage`. To use the other "internal" functions, one would use `MyStatsPackage.rse_sum`. ```julia import MyStatsPackage MyStatsPackage.rse_tstat(1:10) import MyStatsPackage: rse_sum rse_sum(1:10) ``` ## Macros Macros allow to programmers to edit the actual code **before** it is run. We will pretty much just use them, without learning how they work. ```julia @which cumsum @which(cumsum) a = "123" @show a ``` # Cheatsheets ## meta-tools | | Julia | Python | |------------------------|------------------------|------------------------| | Documentation | `?obj` | `help(obj)` | | Object content | `dump(obj)` | `print(repr(obj))` | | Exported functions | `names(FooModule)` | `dir(foo_module)` | | List function signatures with that name | `methods(myFun)` | | | List functions for specific type | `methodswith(SomeType)` | `dir(SomeType)` | | Where is ...? | `@which func` | `func.__module__` | | What is ...? | `typeof(obj)` | `type(obj)` | | Is it really a ...? | `isa(obj, SomeType)` | `isinstance(obj, SomeType)` | ## debugging ||| |--|--| `@run sum(5+1)`| run debugger, stop at error/breakpoints `@enter sum(5+1)` | enter debugger, dont start code yet `@show variable` | prints: variable = variablecontent `@debug variable` | prints only to debugger, very convient in combination with `>ENV["JULIA_DEBUG"] = ToBeDebuggedModule` (could be `Main` as well)