diff --git a/_quarto.yml b/_quarto.yml index 117e3b4..4db53c5 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -47,6 +47,8 @@ website: text: "📝 3 - First Steps: Handout" - href: "material/1_mon/firststeps/tasks.qmd" text: "🛠 3 - First Steps: Exercises" + - href: "https://github.com/s-ccs/summerschool_simtech_2023/blob/main/material/1_mon/firststeps/statistic_functions.jl" + text: "✔ 3 - Solutions" - href: "material/1_mon/envs/envs_handout.qmd" text: "📝 4 - Envs & Pkgs : Handout" - href: "material/1_mon/envs/tasks.qmd" @@ -57,9 +59,9 @@ website: text: "📝 1 - Advanced Git and Contributing" - href: "material/2_tue/git/tasks.qmd" text: "🛠 1 - Git: Exercises" - - href: "material/2_tue/testing/slides.qmd" + - href: "material/2_tue/testing/slides.md" text: "📝 2 - Testing" - - href: "material/2_tue/CI/missing.qmd" + - href: "material/2_tue/ci/slides.md" text: "📝 3 - Continuous Integration" - href: material/2_tue/codereview/slides.qmd text: "📝 4 - Code Review" diff --git a/cheatsheets/git.qmd b/cheatsheets/git.qmd index e69de29..becee2f 100644 --- a/cheatsheets/git.qmd +++ b/cheatsheets/git.qmd @@ -0,0 +1 @@ +There are many good ones out there. One we can recommend is the [one from GitHub](https://education.github.com/git-cheat-sheet-education.pdf). diff --git a/cheatsheets/githubactions.qmd b/cheatsheets/githubactions.qmd index e69de29..ac8eaee 100644 --- a/cheatsheets/githubactions.qmd +++ b/cheatsheets/githubactions.qmd @@ -0,0 +1 @@ +Also [one from GitHub](https://github.github.io/actions-cheat-sheet/actions-cheat-sheet.pdf) diff --git a/index.qmd b/index.qmd index 3f276e2..2dbeb9d 100644 --- a/index.qmd +++ b/index.qmd @@ -22,6 +22,9 @@ Seminar room on the ground floor (directly at the entrance) [Link to map](https://www.simtech.uni-stuttgart.de/events/simtech-summer-school/SuSch_2/location/) + +#### ⚗ Advanced topics +We probably have some time to discuss advanced topics towards the end of the summer school. 
You are welcome to send an email to Benedikt and/or put it into [the git issue](https://github.com/s-ccs/summerschool_simtech_2023/issues/15) ---- We wish you all an interesting, safe and fun summerschool. If there are any interpersonal issues (especially regarding [code-of-conduct](https://www.uni-stuttgart.de/en/university/profile/diversity/code-of-conduct/)), please directly contact [Benedikt Ehinger](benedikt.ehinger@vis.uni-stuttgart.de)^[If there are problems with him, please contact **Marco Oesting**]. For organizational issues, please contact [Sina Schorndorfer](sina.schorndorfer@imsb.uni-stuttgart.de) diff --git a/material/1_mon/envs/tasks.qmd b/material/1_mon/envs/tasks.qmd index be6e862..0af7d10 100644 --- a/material/1_mon/envs/tasks.qmd +++ b/material/1_mon/envs/tasks.qmd @@ -3,6 +3,22 @@ 2. Add your `statistic.jl` & "include" it. 3. Export all functions 4. Create a new environment in a separate folder and add the package. -5. Does `using MyStatsPackage` work now? :tada: congratulations! -6. Go back to your package environment. Now add a dependency (e.g. ProgressMeter) and a `compat`-entry -7. Go back to your project environment, has the dependency been updated? Think: should you use `resolve` or `instantiate`? \ No newline at end of file +5. Does `using MyStatsPackage` work now? + +:::{.callout collapse=true} +## Yes! +:tada: congratulations! +::: + +:::{.callout collapse=true} +## No! +Oh no, better check you activated the right environment - ask for help! +::: +6. Go back to your package environment. Now add a dependency (e.g. ProgressMeter.jl) and a `compat`-entry +7. Go back to your project environment, has the + dependency been updated? + +:::{.callout collapse=true} +## Hint? +Should you use `resolve` or `instantiate`? 
+::: \ No newline at end of file diff --git a/material/1_mon/firststeps/tasks.qmd b/material/1_mon/firststeps/tasks.qmd index 05c4cb7..5794768 100644 --- a/material/1_mon/firststeps/tasks.qmd +++ b/material/1_mon/firststeps/tasks.qmd @@ -10,13 +10,13 @@ You can mark some code and execute it using `ctrl` + `enter` - you can also gene ## The exercise 1. Open a new script `statistic_functions.jl` in VSCode in a folder of your choice. -2. implement a function called `rse_sum`^[rse = research software engineering, we could use `sum` in a principled way, but it requires some knowledge you likely don't have right now]. This function should return `true` if provided with the following test: `res_sum(1:36) == 666`. You should further make use of a for-loop. +2. implement a function called `rse_sum`^[rse = research software engineering, we could use `sum` in a principled way, but it requires some knowledge you likely don't have right now]. This function should return `true` if provided with the following test: `rse_sum(1:36) == 666`. You should further make use of a for-loop. -3. implement a second function called `rse_mean`, which calculates the mean of the provided vector. Make sure to use the `rse_sum` function! Test it using `res_mean(-15:17) == 1` +3. implement a second function called `rse_mean`, which calculates the mean of the provided vector. Make sure to use the `rse_sum` function! Test it using `rse_mean(-15:17) == 1` -4. Next implement a standard deviation function `rse_std`: $\sqrt{\frac{\sum(x-mean(x))}{n-1}}$, this time you should use elementwise/broadcasting operators. Test it with `rse_std(1:3) == 1` +4. Next implement a standard deviation function `rse_std`: $\sqrt{\frac{\sum((x-mean(x))^2)}{n-1}}$, this time you should use elementwise/broadcasting operators. Test it with `rse_std(1:3) == 1.` -5. Finally, we will implement `rse_tstat`, returning the t-value with `length(x)-1` DF, that the provided Array actually has a mean of 0. 
Test it with `rse_tstat(2:3) == 5`. Add the keyword argument `σ` that allows the user to optionally provide a pre-calculated standard deviation. +5. Finally, we will implement `rse_tstat`, returning the t-value (with `length(x)-1` degrees of freedom) for testing whether the provided array has a mean of 0. The formula is $\frac{mean(x)}{std(x) / \sqrt{length(x)}}$. Test it with `rse_tstat(2:3) == 5.`. Add the keyword argument `σ` that allows the user to optionally provide a pre-calculated standard deviation. Well done! You now have all functions defined with which we will continue our journey. diff --git a/material/1_mon/rse/rse_basics_slides.qmd b/material/1_mon/rse/rse_basics_slides.qmd index af57c9c..e9d119e 100644 --- a/material/1_mon/rse/rse_basics_slides.qmd +++ b/material/1_mon/rse/rse_basics_slides.qmd @@ -2,6 +2,9 @@ format: revealjs: output-file: rse_basics_slides_revealjs.html + scrollable: true + progress: true + history: false html: default --- diff --git a/material/2_tue/ci/slides.md b/material/2_tue/ci/slides.md new file mode 100644 index 0000000..b74324a --- /dev/null +++ b/material/2_tue/ci/slides.md @@ -0,0 +1,395 @@ +--- +type: slide +slideOptions: + transition: slide + width: 1400 + height: 900 + margin: 0.1 +--- + + + +# Learning Goals + +- Name and explain common workflows to automate in RSE. +- Explain the differences between the various continuous methodologies. +- Explain why automation is crucial in RSE. +- Write and understand basic automation scripts for GitHub Actions. + - so that we understand what `PkgTemplates` generates for us. + + +Material is taken and modified from the [SSE lecture](https://github.com/Simulation-Software-Engineering/Lecture-Material). + +--- + +# 1. Workflow Automation + +--- + +## Why Automation? + +- Automatize tasks + - Run tests frequently, give feedback early etc. 
+ - Ensure reproducible test environments + - Cannot forget automatized tasks + - Less burden to developer (and their workstation) + - Avoid manual errors +- Process often integrated in development workflow + - Example: Support by Git hooks or Git forges + +--- + +## Typical Automation Tasks in RSE + +- Check code formatting and quality +- Compile and test code for different platforms +- Generate coverage reports and visualization +- Build documentation and deploy it +- Build, package, and upload releases + +--- + +## Continuous Methodologies (1/2) + +- **Continuous Integration** (CI) + - Continuously integrate changes into "main" branch + - Avoids "merge hell" + - Relies on testing and checking code continuously + - Should be automatized + +--- + +## Continuous Methodologies (2/2) + +- **Continuous Delivery** (CD) + - Software is in a state that allows new release at any time + - Software package is built + - Actual release triggered manually +- **Continuous Deployment** (CD) + - Software is in a state that allows new release at any time + - Software package is built + - Actual release triggered automatically (continuously) + +--- + +## Automation Services/Software + +- [GitHub Actions](https://github.com/features/actions) +- [GitLab CI/CD](https://docs.gitlab.com/ee/ci/) +- [Circle CI](https://circleci.com/) +- [Travis CI](https://www.travis-ci.com/) +- [Jenkins](https://www.jenkins.io/) +- ... + +--- + +# 2. GitHub Actions + +--- + +## What is "GitHub Actions"? + +> Automate, customize, and execute your software development workflows right in your repository with GitHub Actions. 
+ +From: [https://docs.github.com/en/actions](https://docs.github.com/en/actions) + +--- + +## General Information + +- Usage of GitHub's runners is [limited](https://docs.github.com/en/actions/learn-github-actions/usage-limits-billing-and-administration#usage-limits) +- Available for public repositories or accounts with subscription +- By default Actions run on GitHub's runners + - Linux, Windows, or macOS +- Quickly evolving and significant improvements in recent years + +--- + +## Components (1/2) + +- [Workflow](https://docs.github.com/en/actions/using-workflows): Runs one or more jobs +- [Event](https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows): Triggers a workflow +- [Jobs](https://docs.github.com/en/actions/using-jobs): Set of steps (running on same runner) + - Steps executed consecutively and share data + - Jobs by default executed in parallel +- [Action](https://docs.github.com/en/actions/creating-actions): Application performing common, complex task (step) often used in workflows +- [Runner](https://docs.github.com/en/actions/learn-github-actions/understanding-github-actions#runners): Server that runs jobs +- [Artifacts](https://docs.github.com/en/actions/learn-github-actions/essential-features-of-github-actions#sharing-data-between-jobs): Files to be shared between jobs or to be kept after workflow finishes + +--- + +## Components (2/2) + + + + +From [GitHub Actions tutorial](https://docs.github.com/en/actions) + +--- + +## Setting up a Workflow + +- Workflow files are stored in `${REPO_ROOT}/.github/workflows` +- Configured via YAML file + +```yaml +name: learn-github-actions +on: [push] +jobs: + check-bats-version: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v2 + - uses: actions/setup-node@v2 + with: + node-version: '14' + - run: npm install -g bats + - run: bats -v +``` + +--- + +## Actions + +```yaml +- uses: actions/checkout@v3 +- uses: actions/setup-node@v2 + with: + node-version: '14' +``` + +- 
Integrated via `uses` directive +- Additional configuration via `with` (options depend on Action) +- Find actions in [marketplace](https://github.com/marketplace?type=actions) +- Write [own actions](https://docs.github.com/en/actions/creating-actions) + +--- + +## Some Useful Julia Actions + +- Find on [github.com/julia-actions](https://github.com/julia-actions/) + + ```yaml + - uses: julia-actions/setup-julia@v1 + with: + version: '1.9' + ``` + +- More: + - `cache`: caches `~/.julia/artifacts/*` and `~/.julia/packages/*` to reduce runtime of CI + - `julia-buildpkg`: build package + - `julia-runtest`: run tests + - `julia-format`: format code + +--- + +## User-specified Commands + +```yaml +- name: "Single line command" + run: echo "Single line command" +- name: "Multi line command" + run: | + echo "First line" + echo "Second line. Directory ${PWD}" + working-directory: tmp/ + shell: bash +``` + +--- + +## Events + +- Single or multiple events + + ```yaml + on: [push, fork] + ``` + +- Activities + + ```yaml + on: + issues: + types: + - opened + - labeled + ``` + +- Filters + + ```yaml + on: + push: + branches: + - main + - 'releases/**' + ``` + +--- + +## Artifacts + +- Data sharing between jobs and data upload +- Uploading artifact + + ```yaml + - name: "Upload artifact" + uses: actions/upload-artifact@v2 + with: + name: my-artifact + path: my_file.txt + retention-days: 5 + ``` + +- Downloading artifact + + ```yaml + - name: "Download a single artifact" + uses: actions/download-artifact@v2 + with: + name: my-artifact + ``` + + **Note**: Drop name to download all artifacts + +--- + +## Test Actions Locally + +- [act](https://github.com/nektos/act) +- Relies extensively on Docker + - User should be in `docker` group +- Run `act` from root of the repository + + ```text + act (runs all workflows) + act --job JOBNAME + ``` + +- Environment is not 100% identical to GitHub's + - Workflows may fail locally, but work on GitHub + +--- + +## Further Reading + +- [What is Continuous 
Integration?](https://www.atlassian.com/continuous-delivery/continuous-integration) +- [GitHub Actions documentation](https://docs.github.com/en/actions) +- [GitHub Actions quickstart](https://docs.github.com/en/actions/quickstart) + +--- + +# 3. Demo: Automation with GitHub Actions + +--- + +## Setting up a Test Job + +- Import [Julia test package repository](https://github.com/uekerman/JuliaTestPackage) (the same code we used for testing) +- Set up workflow file + + ```bash + mkdir -p .github/workflows + cd .github/workflows + vi format-check.yml + ``` + +- Let's check whether our code is formatted correctly. Edit `format-check.yml` to have following content + + ```yaml + name: format-check + + on: [push, pull_request] + + jobs: + format: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v3 + - uses: julia-actions/setup-julia@v1 + with: + version: '1.9' + - name: Install JuliaFormatter and format + run: | + julia -e 'using Pkg; Pkg.add(PackageSpec(name="JuliaFormatter"))' + julia -e 'using JuliaFormatter; format(".", verbose=true)' + - name: Format check + run: | + julia -e ' + out = Cmd(`git diff --name-only`) |> read |> String + if out == "" + exit(0) + else + @error "Some files have not been formatted" + write(stdout, out) + exit(1) + end' + ``` + +- `runs-on` does **not** refer to a Docker container, but to a runner tag. +- Add, commit, push +- After the push, inspect "Action" panel on GitHub repository + - GitHub will schedule a run (yellow dot) + - Hooray. We have set up our first action. +- Failing test example: + - Edit settings on GitHub that one can only merge if all tests pass: + - Settings -> Branches -> Branch protection rule + - Choose `main` branch + - Enable "Require status checks to pass before merging". Optionally enable "Require branches to be up to date before merging" + - Choose status checks that need to pass: `test` + - Click on "Create" at bottom of page. + - Create a new branch `break-code`. 
+ - Edit some file, violate the formatting, commit it and push it to the branch. Afterwards open a new PR and inspect the failing test. We are also not able to merge the changes as the "Merge" button should be inactive. + +--- + +## act Demo + +- `act` is for quick checks while developing workflows, not for developing the code +- Check available jobs (at root of repository) + + ```bash + act -l + ``` + +- Run jobs for `push` event (default event) + + ```bash + act + ``` + +- Run a specific job + + ```bash + act -j test + ``` + +--- + +# 4. Exercise + +Set up GitHub Actions for your statistics package. They should format your code and run the tests. To structure and parallelize things, you could use two separate jobs. diff --git a/material/2_tue/git/slides.md b/material/2_tue/git/slides.md index f1540fc..9fd9320 100644 --- a/material/2_tue/git/slides.md +++ b/material/2_tue/git/slides.md @@ -32,7 +32,7 @@ slideOptions: } -## Learning Goals of the Git Lecture +# Learning Goals - Refresh and organize students' existing knowledge on Git (learn how to learn more). - Students can explain difference between merge and rebase and when to use what. @@ -232,7 +232,6 @@ Which level do you have? 
- [Official documentation](http://git-scm.com/doc) - [Video: Git in 15 minutes: basics, branching, no remote](https://www.youtube.com/watch?v=USjZcfj8yxE) -- [The GitHub Blog: Commits are snapshots, not diffs](https://github.blog/2020-12-17-commits-are-snapshots-not-diffs/) - Chapters [6](https://merely-useful.tech/py-rse/git-cmdline.html) and [7](https://merely-useful.tech/py-rse/git-advanced.html) of Research Software Engineering with Python - [Podcast All Things Git: History of VC](https://www.allthingsgit.com/episodes/the_history_of_vc_with_eric_sink.html) - [git purr](https://girliemac.com/blog/2017/12/26/git-purr/) diff --git a/material/2_tue/git/tasks.qmd b/material/2_tue/git/tasks.qmd index 94b9a45..0ecbf26 100644 --- a/material/2_tue/git/tasks.qmd +++ b/material/2_tue/git/tasks.qmd @@ -3,6 +3,7 @@ 1. Work with any forge that you like and create a user account (we strongly recommend GitHub since we will need it later again). 2. Push your package `MyStatsPackage` to a remote repository. 3. Add a function `printOwner` to the package through a pull request. The function should print your (GitHub) user name (hard-coded). -4. Use the package from somebody else in the classroom and verify with `printOwner` that you use the correct package. -5. Fork this other package and contribute a function `printContributor` to it via a PR. Get a review and get it merged. -6. Add more functions to other packages of classmates that print funny things, but always ensure a linear history. +4. Start a new Julia environment and use your package through its url: `]add https://github.com/[username]/MyStatsPackage`. +5. Now use the package from somebody else in the classroom instead and verify with `printOwner` that you use the correct package. +6. Fork this other package and contribute a function `printContributor` to it via a PR. Get a review and get it merged. +7. Add more functions to other packages of classmates that print funny things, but always ensure a linear history. 
diff --git a/material/2_tue/testing/slides.qmd b/material/2_tue/testing/slides.md similarity index 50% rename from material/2_tue/testing/slides.qmd rename to material/2_tue/testing/slides.md index 96a877d..cde26dd 100644 --- a/material/2_tue/testing/slides.qmd +++ b/material/2_tue/testing/slides.md @@ -1,16 +1,44 @@ - --- -format: revealjs - +type: slide +slideOptions: + transition: slide + width: 1400 + height: 900 + margin: 0.1 --- + + # Learning Goals -- Justify the effort of developing tests to some extent - Get to know a few common terms of testing - Work with the Julia unit testing package `Test.jl` -Material is taken and modified, on the one hand, from the [SSE lecture](https://github.com/Simulation-Software-Engineering/Lecture-Material), which builds partly on the [py-rse book](https://merely-useful.tech/py-rse), and, on the other hand, from the [Test.jl docs](https://docs.julialang.org/en/v1/stdlib/Test/). +Material is taken and modified from the [SSE lecture](https://github.com/Simulation-Software-Engineering/Lecture-Material), which builds partly on the [py-rse book](https://merely-useful.tech/py-rse), and from the [Test.jl docs](https://docs.julialang.org/en/v1/stdlib/Test/). + --- @@ -20,30 +48,21 @@ Material is taken and modified, on the one hand, from the [SSE lecture](https:// ## What is Testing? -- Smelling old milk before using it! -- A way to determine if a software is not producing reliable results and if so, what is the reason. -- Manual testing vs. automated testing. +- Smelling old milk before using it +- A way to determine if a software is not producing reliable results and if so, what is the reason +- Manual testing vs. automated testing --- ## Why Should you Test your Software? -- Improve software reliability and reproducibility. -- Make sure that changes (bugfixes, new features) do not affect other parts of software. 
+- Improve software reliability and reproducibility +- Make sure that changes (bugfixes, new features) do not affect other parts of software - Generally all software is better off being tested regularly. Possible exceptions are very small codes with single users. - Ensure that a released version of a software actually works. --- -## Nomenclature in Software Testing - -- **Fixture**: preparatory set for testing. -- **Actual result**: what the code produces when given the fixture. -- **Expected result**: what the actual result is compared to. -- **Test coverage**: how much of the code do tests touch in one run. - ---- - ## Some Ways to Test Software - Assertions @@ -55,29 +74,27 @@ Material is taken and modified, on the one hand, from the [SSE lecture](https:// ## Assertions -- Principle of *defensive programming*. +```julia +@assert condition "message" +``` + +- Principle of *defensive programming* - Nothing happens when an assertion is true; throws error when false. - Types of assertion statements: - Precondition - Postcondition - Invariant -- A basic but powerful tool to test a software on-the-go. -- Assertion statement syntax in Python - -```julia -@assert condition "message" -``` +- A basic but powerful tool to test a software on-the-go --- ## Unit Testing -- Catching errors with assertions is good but preventing them is better! +- Catching errors with assertions is good but preventing them is better. - A *unit* is a single function in one situation. - A situation is one amongst many possible variations of input parameters. -- User creates the expected result manually. -- Fixture is the set of inputs used to generate an actual result. -- Actual result is compared to the expected result by `@test`. +- User creates the **expected result** manually. +- **Actual result** is compared to the expected result by `@test`. 
--- @@ -85,109 +102,44 @@ Material is taken and modified, on the one hand, from the [SSE lecture](https:// - Test whether several units work in conjunction. - *Integrate* units and test them together in an *integration* test. -- Often more complicated than a unit test and has more test coverage. -- A fixture is used to generate an actual result. -- Actual result is compared to the expected result by `@test`. +- Often more complicated than a unit test and gives higher test coverage. --- ## Regression Testing - Generating an expected result is not possible in some situations. -- Compare the current actual result with a previous actual result. +- Compare the *current* actual result with a *previous* actual result. - No guarantee that the current actual result is correct. - Risk of a bug being carried over indefinitely. - Main purpose is to identify changes in the current state of the code with respect to a past state. --- -## Test Coverage - -- Coverage is the amount of code a test touches in one run. -- Aim for high test coverage. -- There is a trade-off: high test coverage vs. effort in test development - ---- - -## Comparing Floating-point Variables - -- Very often quantities in math software are `float` / `double`. -- Such quantities cannot be compared to exact values, an approximation is necessary. -- Comparison of floating point variables needs to be done to a certain tolerance. - -```julia -@test 1 ≈ 0.999999 rtol=1e-5 -``` - -- Get `≈` by Latex `\approx` + TAB - ---- - -## Test-driven Development (TDD) - -- Principle is to write a test and then write a code to fulfill the test. -- Advantages: - - In the end user ends up with a test alongside the code. - - Eliminates confirmation bias of the user. - - Writing tests gives clarity on what the code is supposed to do. -- Disadvantage: known to not improve productivity. 
- ---- - -## Checking-driven Development (CDD) - -- Developer performs spot checks; sanity checks at intermediate stages -- Math software often has heuristics which are easy to determine. -- Keep performing same checks at different stages of development to ensure the code works. - ---- - -## Verifying a Test - -- Test written as part of a bug-fix: - - Reproduce the bug in the test by ensuring that the test fails. - - Fix the bug. - - Rerun the test to ensure that it passes. -- Test written to increase code coverage: - - Make sure that the first iteration of the test passes. - - Try introducing a small fixable bug in the code to verify if the test fails. - ---- - # 2. Unit Testing in Julia with Test.jl --- -## Setup of Tests.jl +## Setup of Test.jl -- Standard library to write and manage tests, `using Test` - Standardized folder structure: -``` -├── Manifest.toml -├── Project.toml -├── src/ -└── test - ├── Manifest.toml - ├── Project.toml - ├── runtests.jl - └── setup.jl -``` + ``` + ├── Manifest.toml + ├── Project.toml + ├── src/ + └── test + ├── Manifest.toml + ├── Project.toml + ├── runtests.jl + └── setup.jl + ``` - Singular `test` vs plural `runtests.jl` - `setup.jl` for all `using XYZ` statements, included in `runtests.jl` -- Additional packages either in `[extra] section` of `./Project.toml` or in a new `./test/Project.toml` environment +- Additional packages in the `[extras]` section of `./Project.toml` or in a new `./test/Project.toml` - In case of the latter: Do not add the package itself to the `./test/Project.toml` - - ---- - -## Run Tests - -Various options: - -- Directly call `runtests.jl` TODO? 
-- From Pkg-Manager `]test` when root project is activated +- Run: `]test` when root project is activated --- @@ -206,11 +158,11 @@ Various options: - `@testset`: Structure tests ```julia - julia> @testset "trigonometric identities" begin - θ = 2/3*π - @test sin(-θ) ≈ -sin(θ) - @test cos(-θ) ≈ cos(θ) - end; + @testset "trigonometric identities" begin + θ = 2/3*π + @test sin(-θ) ≈ -sin(θ) + @test cos(-θ) ≈ cos(θ) + end; ``` - `@testset for ... end`: Test in loop @@ -223,6 +175,8 @@ Various options: - [HiRSE-Summer of Testing Part 2b: "Testing with Julia" by Nils Niggemann](https://www.youtube.com/watch?v=gSMKNbZOpZU) - [Official documentation of Test.jl](https://docs.julialang.org/en/v1/stdlib/Test/) +--- + # 3. Test.jl Demo We use [`MyTestPackage`](https://github.com/s-ccs/summerschool_simtech_2023/tree/main/material/2_tue/testing/MyTestPackage), which looks as follows: @@ -246,7 +200,7 @@ We use [`MyTestPackage`](https://github.com/s-ccs/summerschool_simtech_2023/tree - Look at `MyTestPackage.jl` and `find.jl`: We have two functions `find_max` and `find_mean`, which calculate the maximum and mean of all elements of a `::AbstractVector`. - Assertions were added to check for `NaN` values - Look at `runtests.jl`: - - TODO: Why do we need `using MyTestPackage`? + - Why do we need `using MyTestPackage`? - We include dependencies via `setup.jl`: `Test` and `StableRNG`. 
- Testset "find" - Look at `find.jl` diff --git a/material/3_wed/regression/Code_Snippets.jl b/material/3_wed/regression/Code_Snippets.jl new file mode 100644 index 0000000..e7889d9 --- /dev/null +++ b/material/3_wed/regression/Code_Snippets.jl @@ -0,0 +1,48 @@ +using Statistics +using Plots +using RDatasets +using GLM + +#--- + +trees = dataset("datasets", "trees") + +scatter(trees.Girth, trees.Volume, + legend=false, xlabel="Girth", ylabel="Volume") + +#--- + +scatter(trees.Girth, trees.Volume, + legend=false, xlabel="Girth", ylabel="Volume") +plot!(x -> -37 + 5*x) + +#--- + +linmod1 = lm(@formula(Volume ~ Girth), trees) + +#--- + +linmod2 = lm(@formula(Volume ~ Girth + Height), trees) + +#--- + +r2(linmod1) +r2(linmod2) + +linmod3 = lm(@formula(Volume ~ Girth + Height + Girth*Height), trees) + +r2(linmod3) + +#--- + +using CSV +using HTTP + +http_response = HTTP.get("https://vincentarelbundock.github.io/Rdatasets/csv/AER/SwissLabor.csv") +SwissLabor = DataFrame(CSV.File(http_response.body)) + +SwissLabor[!,"participation"] .= (SwissLabor.participation .== "yes") + +#--- + +model = glm(@formula(participation ~ age), SwissLabor, Binomial(), ProbitLink()) \ No newline at end of file diff --git a/material/3_wed/regression/MultipleRegressionBasics.qmd b/material/3_wed/regression/MultipleRegressionBasics.qmd index 8af7871..9e02584 100644 --- a/material/3_wed/regression/MultipleRegressionBasics.qmd +++ b/material/3_wed/regression/MultipleRegressionBasics.qmd @@ -10,13 +10,26 @@ editor: ### Introductory Example: tree dataset from R -\[figure of raw data\] +``` julia +using Statistics +using Plots +using RDatasets + +trees = dataset("datasets", "trees") + +scatter(trees.Girth, trees.Volume, + legend=false, xlabel="Girth", ylabel="Volume") +``` *Aim:* Find the relationship between the *response variable* `volume` and the *explanatory variable/covariate* `girth`. Can we predict the volume of a tree given its girth? 
-\[figure including a straight line\] +``` julia +scatter(trees.Girth, trees.Volume, + legend=false, xlabel="Girth", ylabel="Volume") +plot!(x -> -37 + 5*x) +``` First Guess: There is a linear relation! @@ -52,8 +65,10 @@ Note: There is a closed-form expression for $(\hat \beta_0, \hat \beta_1)$. We will not make use of it here, but rather use Julia to solve the problem. -\[use Julia code (existing package) to perform linear regression for -`volume ~ girth`\] + +``` julia +linmod1 = lm(@formula(Volume ~ Girth), trees) +``` *Interpretation of the Julia output:* @@ -70,8 +85,12 @@ rather use Julia to solve the problem. Under the hypothesis $\beta_i=0$, the test statistics $t_i$ would follow a $t$-distribution. -- column `Pr(>|t|)`: $p$-values for the hyptheses $\beta_i=0$ for +- column `Pr(>|t|)`: $p$-values for the hypotheses $\beta_i=0$ for $i=0,1$ + +::: callout-tip +The command `rand(n)` generates a sample of `n` random numbers that are uniformly distributed on $[0,1)$. +::: **Task 1**: Generate a random set of covariates $\mathbf{x}$. Given these covariates and true parameters $\beta_0$, $\beta_1$ and $\sigma^2$ @@ -166,6 +185,15 @@ the corresponding standard errors and the $t$-statistics. Test your functions with the `tree` data set and try to reproduce the output above. 
+``` julia +r2(linmod1) +r2(linmod2) + +linmod3 = lm(@formula(Volume ~ Girth + Height + Girth*Height), trees) + +r2(linmod3) +``` + ## Generalized Linear Models Classical linear model @@ -206,29 +234,35 @@ $$ For the models above, these are: -+----------------------+---------------------+----------------------+ -| Type of Data | Distribution Family | Link Function | -+======================+=====================+======================+ -| continuous | Normal | identity: | -| | | | -| | | $$ | -| | | g(x)=x | -| | | $$ | -+----------------------+---------------------+----------------------+ -| count | Poisson | log: | -| | | | -| | | $$ | -| | | g(x) = \log(x) | -| | | $$ | -+----------------------+---------------------+----------------------+ -| binary | Bernoulli | logit: | -| | | | -| | | $$ | -| | | g(x) = \log\left | -| | | ( | -| | | \frac{x}{1-x}\right) | -| | | $$ | -+----------------------+---------------------+----------------------+ ++----------------+------------------+-----------------+ +| Type of Data | Distribution | Link Function | +| | Family | | ++================+==================+=================+ +| continuous | Normal | identity: | +| | | | +| | | $$ | +| | | g(x)=x | +| | | $$ | ++----------------+------------------+-----------------+ +| count | Poisson | log: | +| | | | +| | | $$ | +| | | g(x) = \log(x) | +| | | $$ | ++----------------+------------------+-----------------+ +| binary | Bernoulli | logit: | +| | | | +| | | $$ | +| | | g(x) | +| | | = | +| | | \log | +| | | \left( | +| | | \frac | +| | | {x} | +| | | {1-x} | +| | | \right) | +| | | $$ | ++----------------+------------------+-----------------+ In general, the parameter vector $\beta$ is estimated via maximizing the likelihood, i.e., @@ -246,8 +280,23 @@ $$ In the Gaussian case, the maximum likelihood estimator is identical to the least squares estimator considered above. 
-\[\[ Example in Julia: maybe `SwissLabor` \]\] +``` julia +using CSV +using HTTP -**Task 3:** Reproduce the results of our data analysis of the `tree` -data set using a generalized linear model with normal distribution -family. +http_response = HTTP.get("https://vincentarelbundock.github.io/Rdatasets/csv/AER/SwissLabor.csv") +SwissLabor = DataFrame(CSV.File(http_response.body)) + +SwissLabor[!,"participation"] .= (SwissLabor.participation .== "yes") + +model = glm(@formula(participation ~ age^2), + SwissLabor, Binomial(), ProbitLink()) +``` + +::: callout-task +**Task 3**: + +1. Reproduce the results of our data analysis of the `tree` data set using +a generalized linear model with normal distribution family. +2. Generate +:::
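
The link functions listed in the table of the generalized linear models section can be written out directly in Julia. The following is a minimal sketch that is not part of the patch itself. The function names `identity_link`, `log_link`, `logit_link` and `inv_logit` are illustrative assumptions and are not the names used by `GLM.jl`.

```julia
# Link functions g from the table (hypothetical helper names)
identity_link(x) = x                 # continuous data, normal family
log_link(x) = log(x)                 # count data, Poisson family
logit_link(x) = log(x / (1 - x))     # binary data, Bernoulli family

# Inverse of the logit link: maps a linear predictor back to a probability
inv_logit(eta) = 1 / (1 + exp(-eta))

p = 0.25
eta = logit_link(p)                  # log(0.25 / 0.75)

# Applying the link and then its inverse should recover the probability
println(inv_logit(eta) ≈ p)
```

The round trip illustrates why a link function is needed for binary data: the linear predictor can take any real value, while the inverse link always returns a value between 0 and 1.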