This commit is contained in:
Benedikt Ehinger 2023-10-09 16:40:53 +02:00
commit ea48e8f12c
13 changed files with 634 additions and 162 deletions

View File

@ -47,6 +47,8 @@ website:
text: "📝 3 - First Steps: Handout"
- href: "material/1_mon/firststeps/tasks.qmd"
text: "🛠 3 - First Steps: Exercises"
- href: "https://github.com/s-ccs/summerschool_simtech_2023/blob/main/material/1_mon/firststeps/statistic_functions.jl"
text: "✔ 3 - Solutions"
- href: "material/1_mon/envs/envs_handout.qmd"
text: "📝 4 - Envs & Pkgs : Handout"
- href: "material/1_mon/envs/tasks.qmd"
@ -57,9 +59,9 @@ website:
text: "📝 1 - Advanced Git and Contributing"
- href: "material/2_tue/git/tasks.qmd"
text: "🛠 1 - Git: Exercises"
- href: "material/2_tue/testing/slides.qmd"
- href: "material/2_tue/testing/slides.md"
text: "📝 2 - Testing"
- href: "material/2_tue/CI/missing.qmd"
- href: "material/2_tue/ci/slides.md"
text: "📝 3 - Continuous Integration"
- href: material/2_tue/codereview/slides.qmd
text: "📝 4 - Code Review"

View File

@ -0,0 +1 @@
There are many good ones out there. One we can recommend is the [one from GitHub](https://education.github.com/git-cheat-sheet-education.pdf).

View File

@ -0,0 +1 @@
Also [one from GitHub](https://github.github.io/actions-cheat-sheet/actions-cheat-sheet.pdf)

View File

@ -22,6 +22,9 @@ Seminar room in the groundfloor (directly at the entrance)
[Link to map](https://www.simtech.uni-stuttgart.de/events/simtech-summer-school/SuSch_2/location/)
#### ⚗ Advanced topics
We will probably have some time to discuss advanced topics towards the end of the summer school. You are welcome to send an email to Benedikt and/or add your topic to [the git issue](https://github.com/s-ccs/summerschool_simtech_2023/issues/15).
----
We wish you all an interesting, safe and fun summer school. If there are any interpersonal issues (especially regarding the [code-of-conduct](https://www.uni-stuttgart.de/en/university/profile/diversity/code-of-conduct/)), please directly contact [Benedikt Ehinger](benedikt.ehinger@vis.uni-stuttgart.de)^[If there are problems with him, please contact **Marco Oesting**]. For organizational issues, please contact [Sina Schorndorfer](sina.schorndorfer@imsb.uni-stuttgart.de).

View File

@ -3,6 +3,22 @@
2. Add your `statistic.jl` & "include" it.
3. Export all functions
4. Create a new environment in a separate folder and add the package.
5. Does `using MyStatsPackage` work now? :tada: congratulations!
6. Go back to your package environment. Now add a dependency (e.g. ProgressMeter) and a `compat`-entry
7. Go back to your project environment, has the dependency been updated? Think: should you use `resolve` or `instantiate`?
5. Does `using MyStatsPackage` work now?
:::{.callout collapse=true}
## Yes!
:tada: congratulations!
:::
:::{.callout collapse=true}
## No!
Oh no, better check you activated the right environment - ask for help!
:::
6. Go back to your package environment. Now add a dependency (e.g. ProgressMeter.jl) and a `compat`-entry
7. Go back to your project environment, has the
dependency been updated?
:::{.callout collapse=true}
## Hint?
Should you use `resolve` or `instantiate`?
:::

View File

@ -10,13 +10,13 @@ You can mark some code and execute it using `ctrl` + `enter` - you can also gene
## The exercise
1. Open a new script `statistic_functions.jl` in VSCode in a folder of your choice.
2. implement a function called `rse_sum`^[rse = research software engineering, we could use `sum` in a principled way, but it requires some knowledge you likely don't have right now]. This function should return `true` if provided with the following test: `res_sum(1:36) == 666`. You should further make use of a for-loop.
2. implement a function called `rse_sum`^[rse = research software engineering, we could use `sum` in a principled way, but it requires some knowledge you likely don't have right now]. This function should return `true` if provided with the following test: `rse_sum(1:36) == 666`. You should further make use of a for-loop.
3. implement a second function called `rse_mean`, which calculates the mean of the provided vector. Make sure to use the `rse_sum` function! Test it using `res_mean(-15:17) == 1`
3. implement a second function called `rse_mean`, which calculates the mean of the provided vector. Make sure to use the `rse_sum` function! Test it using `rse_mean(-15:17) == 1`
4. Next implement a standard deviation function `rse_std`: $\sqrt{\frac{\sum(x-mean(x))}{n-1}}$, this time you should use elementwise/broadcasting operators. Test it with `rse_std(1:3) == 1`
4. Next implement a standard deviation function `rse_std`: $\sqrt{\frac{\sum((x-mean(x))^2)}{n-1}}$, this time you should use elementwise/broadcasting operators. Test it with `rse_std(1:3) == 1.`
5. Finally, we will implement `rse_tstat`, returning the t-value with `length(x)-1` DF, that the provided Array actually has a mean of 0. Test it with `rse_tstat(2:3) == 5`. Add the keyword argument `σ` that allows the user to optionally provide a pre-calculated standard deviation.
5. Finally, we will implement `rse_tstat`, returning the t-value (with `length(x)-1` degrees of freedom) for the hypothesis that the provided array has a mean of 0. The formula is $\frac{mean(x)}{std(x) / \sqrt{length(x)}}$. Test it with `rse_tstat(2:3) == 5.` and add the keyword argument `σ` that allows the user to optionally provide a pre-calculated standard deviation.
Well done! You now have all functions defined with which we will continue our journey.
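
One possible shape of the four functions, as a minimal sketch rather than the reference solution (the linked `statistic_functions.jl` may look different). The default `σ = rse_std(x)` is an assumption about how the keyword argument is meant to behave.

```julia
# Sum via an explicit for-loop, as the exercise requests.
function rse_sum(x)
    s = zero(eltype(x))
    for v in x
        s += v
    end
    return s
end

# Mean built on top of rse_sum.
rse_mean(x) = rse_sum(x) / length(x)

# Sample standard deviation; `.-` and `.^` broadcast over all elements.
rse_std(x) = sqrt(rse_sum((x .- rse_mean(x)) .^ 2) / (length(x) - 1))

# t-value against a mean of 0; σ defaults to the sample standard deviation
# but can be passed in if it was already computed.
rse_tstat(x; σ = rse_std(x)) = rse_mean(x) / (σ / sqrt(length(x)))
```

Because the results of `rse_std` and `rse_tstat` are floating-point numbers, comparing them with `≈` instead of `==` is the more robust check in general.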

View File

@ -2,6 +2,9 @@
format:
revealjs:
output-file: rse_basics_slides_revealjs.html
scrollable: true
progress: true
history: false
html: default
---

395
material/2_tue/ci/slides.md Normal file
View File

@ -0,0 +1,395 @@
---
type: slide
slideOptions:
transition: slide
width: 1400
height: 900
margin: 0.1
---
<style>
.reveal strong {
font-weight: bold;
color: orange;
}
.reveal p {
text-align: left;
}
.reveal section h1 {
color: orange;
}
.reveal section h2 {
color: orange;
}
.reveal code {
font-family: 'Ubuntu Mono';
color: orange;
}
.reveal section img {
background:none;
border:none;
box-shadow:none;
}
</style>
# Learning Goals
- Name and explain common workflows to automate in RSE.
- Explain the differences between the various continuous methodologies.
- Explain why automation is crucial in RSE.
- Write and understand basic automation scripts for GitHub Actions.
- so that we understand what `PkgTemplates` generates for us.
Material is taken and modified from the [SSE lecture](https://github.com/Simulation-Software-Engineering/Lecture-Material).
---
# 1. Workflow Automation
---
## Why Automation?
- Automate tasks
- Run tests frequently, give feedback early etc.
- Ensure reproducible test environments
- Automated tasks cannot be forgotten
- Less burden to developer (and their workstation)
- Avoid manual errors
- Process often integrated in development workflow
- Example: Support by Git hooks or Git forges
---
## Typical Automation Tasks in RSE
- Check code formatting and quality
- Compile and test code for different platforms
- Generate coverage reports and visualization
- Build documentation and deploy it
- Build, package, and upload releases
---
## Continuous Methodologies (1/2)
- **Continuous Integration** (CI)
  - Continuously integrate changes into "main" branch
  - Avoids "merge hell"
  - Relies on testing and checking code continuously
  - Should be automated
---
## Continuous Methodologies (2/2)
- **Continuous Delivery** (CD)
  - Software is in a state that allows new release at any time
  - Software package is built
  - Actual release triggered manually
- **Continuous Deployment** (CD)
  - Software is in a state that allows new release at any time
  - Software package is built
  - Actual release triggered automatically (continuously)
---
## Automation Services/Software
- [GitHub Actions](https://github.com/features/actions)
- [GitLab CI/CD](https://docs.gitlab.com/ee/ci/)
- [Circle CI](https://circleci.com/)
- [Travis CI](https://www.travis-ci.com/)
- [Jenkins](https://www.jenkins.io/)
- ...
---
# 2. GitHub Actions
---
## What is "GitHub Actions"?
> Automate, customize, and execute your software development workflows right in your repository with GitHub Actions.
From: [https://docs.github.com/en/actions](https://docs.github.com/en/actions)
---
## General Information
- Usage of GitHub's runners is [limited](https://docs.github.com/en/actions/learn-github-actions/usage-limits-billing-and-administration#usage-limits)
- Available for public repositories or accounts with subscription
- By default Actions run on GitHub's runners
- Linux, Windows, or macOS
- Quickly evolving and significant improvements in recent years
---
## Components (1/2)
- [Workflow](https://docs.github.com/en/actions/using-workflows): Runs one or more jobs
- [Event](https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows): Triggers a workflow
- [Jobs](https://docs.github.com/en/actions/using-jobs): Set of steps (running on same runner)
- Steps executed consecutively and share data
- Jobs by default executed in parallel
- [Action](https://docs.github.com/en/actions/creating-actions): Application performing common, complex task (step) often used in workflows
- [Runner](https://docs.github.com/en/actions/learn-github-actions/understanding-github-actions#runners): Server that runs jobs
- [Artifacts](https://docs.github.com/en/actions/learn-github-actions/essential-features-of-github-actions#sharing-data-between-jobs): Files to be shared between jobs or to be kept after workflow finishes
---
## Components (2/2)
<img src="https://docs.github.com/assets/cb-25535/mw-1440/images/help/actions/overview-actions-simple.webp" width=95%; style="margin-left:auto; margin-right:auto; padding-top: 25px; padding-bottom: 25px; background: #eeeeee">
From [GitHub Actions tutorial](https://docs.github.com/en/actions)
---
## Setting up a Workflow
- Workflow files are stored in `${REPO_ROOT}/.github/workflows`
- Configured via YAML file
```yaml
name: learn-github-actions
on: [push]
jobs:
  check-bats-version:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-node@v2
        with:
          node-version: '14'
      - run: npm install -g bats
      - run: bats -v
```
---
## Actions
```yaml
- uses: actions/checkout@v3
- uses: actions/setup-node@v2
  with:
    node-version: '14'
```
- Integrated via `uses` directive
- Additional configuration via `with` (options depend on Action)
- Find actions in [marketplace](https://github.com/marketplace?type=actions)
- Write [own actions](https://docs.github.com/en/actions/creating-actions)
---
## Some Useful Julia Actions
- Find on [github.com/julia-actions](https://github.com/julia-actions/)
```yaml
- uses: julia-actions/setup-julia@v1
  with:
    version: '1.9'
```
- More:
- `cache`: caches `~/.julia/artifacts/*` and `~/.julia/packages/*` to reduce runtime of CI
- `julia-buildpkg`: build package
- `julia-runtest`: run tests
- `julia-format`: format code
---
## User-specified Commands
```yaml
- name: "Single line command"
  run: echo "Single line command"
- name: "Multi line command"
  run: |
    echo "First line"
    echo "Second line. Directory ${PWD}"
  working-directory: tmp/
  shell: bash
```
---
## Events
- Single or multiple events
```yaml
on: [push, fork]
```
- Activities
```yaml
on:
  issues:
    types:
      - opened
      - labeled
```
- Filters
```yaml
on:
  push:
    branches:
      - main
      - 'releases/**'
```
---
## Artifacts
- Data sharing between jobs and data upload
- Uploading artifact
```yaml
- name: "Upload artifact"
  uses: actions/upload-artifact@v2
  with:
    name: my-artifact
    path: my_file.txt
    retention-days: 5
```
- Downloading artifact
```yaml
- name: "Download a single artifact"
  uses: actions/download-artifact@v2
  with:
    name: my-artifact
```
**Note**: Drop name to download all artifacts
---
## Test Actions Locally
- [act](https://github.com/nektos/act)
- Relies extensively on Docker
- User should be in `docker` group
- Run `act` from root of the repository
```text
act (runs all workflows)
act --job JOBNAME
```
- Environment is not 100% identical to GitHub's
- Workflows may fail locally, but work on GitHub
---
## Further Reading
- [What is Continuous Integration?](https://www.atlassian.com/continuous-delivery/continuous-integration)
- [GitHub Actions documentation](https://docs.github.com/en/actions)
- [GitHub Actions quickstart](https://docs.github.com/en/actions/quickstart)
---
# 3. Demo: Automation with GitHub Actions
---
## Setting up a Test Job
- Import [Julia test package repository](https://github.com/uekerman/JuliaTestPackage) (the same code we used for testing)
- Set up workflow file
```bash
mkdir -p .github/workflows
cd .github/workflows
vi format-check.yml
```
- Let's check whether our code is formatted correctly. Edit `format-check.yml` to have the following content
```yaml
name: format-check
on: [push, pull_request]
jobs:
  format:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: julia-actions/setup-julia@v1
        with:
          version: '1.9'
      - name: Install JuliaFormatter and format
        run: |
          julia -e 'using Pkg; Pkg.add(PackageSpec(name="JuliaFormatter"))'
          julia -e 'using JuliaFormatter; format(".", verbose=true)'
      - name: Format check
        run: |
          julia -e '
          out = Cmd(`git diff --name-only`) |> read |> String
          if out == ""
              exit(0)
          else
              @error "Some files have not been formatted"
              write(stdout, out)
              exit(1)
          end'
```
- `runs-on` does **not** refer to a Docker container, but to a runner tag.
- Add, commit, push
- After the push, inspect "Action" panel on GitHub repository
- GitHub will schedule a run (yellow dot)
- Hooray. We have set up our first action.
- Failing test example:
- Edit settings on GitHub so that one can only merge if all tests pass:
- Settings -> Branches -> Branch protection rule
- Choose `main` branch
- Enable "Require status checks to pass before merging". Optionally enable "Require branches to be up to date before merging"
- Choose status checks that need to pass: `test`
- Click on "Create" at bottom of page.
- Create a new branch `break-code`.
- Edit some file, violate the formatting, commit it and push it to the branch. Afterwards open a new PR and inspect the failing test. We are also not able to merge the changes as the "Merge" button should be inactive.
---
## act Demo
- `act` is for quick checks while developing workflows, not for developing the code
- Check available jobs (at root of repository)
```bash
act -l
```
- Run jobs for `push` event (default event)
```bash
act
```
- Run a specific job
```bash
act -j test
```
---
# 4. Exercise
Set up GitHub Actions for your statistics package. They should format your code and run the tests. To structure and parallelize things, you could use two separate jobs.

View File

@ -32,7 +32,7 @@ slideOptions:
}
</style>
## Learning Goals of the Git Lecture
# Learning Goals
- Refresh and organize students' existing knowledge on Git (learn how to learn more).
- Students can explain difference between merge and rebase and when to use what.
@ -232,7 +232,6 @@ Which level do you have?
- [Official documentation](http://git-scm.com/doc)
- [Video: Git in 15 minutes: basics, branching, no remote](https://www.youtube.com/watch?v=USjZcfj8yxE)
- [The GitHub Blog: Commits are snapshots, not diffs](https://github.blog/2020-12-17-commits-are-snapshots-not-diffs/)
- Chapters [6](https://merely-useful.tech/py-rse/git-cmdline.html) and [7](https://merely-useful.tech/py-rse/git-advanced.html) of Research Software Engineering with Python
- [Podcast All Things Git: History of VC](https://www.allthingsgit.com/episodes/the_history_of_vc_with_eric_sink.html)
- [git purr](https://girliemac.com/blog/2017/12/26/git-purr/)

View File

@ -3,6 +3,7 @@
1. Work with any forge that you like and create a user account (we strongly recommend GitHub since we will need it later again).
2. Push your package `MyStatsPackage` to a remote repository.
3. Add a function `printOwner` to the package through a pull request. The function should print your (GitHub) user name (hard-coded).
4. Use the package from somebody else in the classroom and verify with `printOwner` that you use the correct package.
5. Fork this other package and contribute a function `printContributor` to it via a PR. Get a review and get it merged.
6. Add more functions to other packages of classmates that print funny things, but always ensure a linear history.
4. Start a new Julia environment and use your package through its url: `]add https://github.com/[username]/MyStatsPackage`.
5. Now use the package from somebody else in the classroom instead and verify with `printOwner` that you use the correct package.
6. Fork this other package and contribute a function `printContributor` to it via a PR. Get a review and get it merged.
7. Add more functions to other packages of classmates that print funny things, but always ensure a linear history.

View File

@ -1,16 +1,44 @@
---
format: revealjs
type: slide
slideOptions:
transition: slide
width: 1400
height: 900
margin: 0.1
---
<style>
.reveal strong {
font-weight: bold;
color: orange;
}
.reveal p {
text-align: left;
}
.reveal section h1 {
color: orange;
}
.reveal section h2 {
color: orange;
}
.reveal code {
font-family: 'Ubuntu Mono';
color: orange;
}
.reveal section img {
background:none;
border:none;
box-shadow:none;
}
</style>
# Learning Goals
- Justify the effort of developing tests to some extent
- Get to know a few common terms of testing
- Work with the Julia unit testing package `Test.jl`
Material is taken and modified, on the one hand, from the [SSE lecture](https://github.com/Simulation-Software-Engineering/Lecture-Material), which builds partly on the [py-rse book](https://merely-useful.tech/py-rse), and, on the other hand, from the [Test.jl docs](https://docs.julialang.org/en/v1/stdlib/Test/).
Material is taken and modified from the [SSE lecture](https://github.com/Simulation-Software-Engineering/Lecture-Material), which builds partly on the [py-rse book](https://merely-useful.tech/py-rse), and from the [Test.jl docs](https://docs.julialang.org/en/v1/stdlib/Test/).
---
@ -20,30 +48,21 @@ Material is taken and modified, on the one hand, from the [SSE lecture](https://
## What is Testing?
- Smelling old milk before using it!
- A way to determine if a software is not producing reliable results and if so, what is the reason.
- Manual testing vs. automated testing.
- Smelling old milk before using it
- A way to determine if a software is not producing reliable results and if so, what is the reason
- Manual testing vs. automated testing
---
## Why Should you Test your Software?
- Improve software reliability and reproducibility.
- Make sure that changes (bugfixes, new features) do not affect other parts of software.
- Improve software reliability and reproducibility
- Make sure that changes (bugfixes, new features) do not affect other parts of software
- Generally all software is better off being tested regularly. Possible exceptions are very small codes with single users.
- Ensure that a released version of a software actually works.
---
## Nomenclature in Software Testing
- **Fixture**: preparatory set for testing.
- **Actual result**: what the code produces when given the fixture.
- **Expected result**: what the actual result is compared to.
- **Test coverage**: how much of the code do tests touch in one run.
---
## Some Ways to Test Software
- Assertions
@ -55,29 +74,27 @@ Material is taken and modified, on the one hand, from the [SSE lecture](https://
## Assertions
- Principle of *defensive programming*.
```julia
@assert condition "message"
```
- Principle of *defensive programming*
- Nothing happens when an assertion is true; throws error when false.
- Types of assertion statements:
- Precondition
- Postcondition
- Invariant
- A basic but powerful tool to test a software on-the-go.
- Assertion statement syntax in Julia
```julia
@assert condition "message"
```
- A basic but powerful tool to test a software on-the-go
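
A small illustration of a precondition and a postcondition with `@assert`. The function `safe_mean` is hypothetical and only serves as an example.

```julia
# Hypothetical helper, used only to illustrate assertion types.
function safe_mean(x::AbstractVector{<:Real})
    # Preconditions: input must be non-empty and free of NaN values.
    @assert !isempty(x) "input must not be empty"
    @assert !any(isnan, x) "input must not contain NaN"

    m = sum(x) / length(x)

    # Postcondition: the result itself must be a number.
    @assert !isnan(m) "mean must not be NaN"
    return m
end

safe_mean([1.0, 2.0, 3.0])    # passes all assertions
# safe_mean([1.0, NaN])       # would throw an AssertionError
```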
---
## Unit Testing
- Catching errors with assertions is good but preventing them is better!
- Catching errors with assertions is good but preventing them is better.
- A *unit* is a single function in one situation.
- A situation is one amongst many possible variations of input parameters.
- User creates the expected result manually.
- Fixture is the set of inputs used to generate an actual result.
- Actual result is compared to the expected result by `@test`.
- User creates the **expected result** manually.
- **Actual result** is compared to the expected result by `@test`.
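
As a sketch, a unit test for one function in one situation. `double` is a hypothetical function, and the expected result is worked out by hand.

```julia
using Test

# Hypothetical unit under test.
double(x) = 2 .* x

# One situation: a short integer vector. The expected result is created manually.
input = [1, 2, 3]
expected = [2, 4, 6]

# @test compares the actual result with the expected result.
@test double(input) == expected
```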
---
@ -85,109 +102,44 @@ Material is taken and modified, on the one hand, from the [SSE lecture](https://
- Test whether several units work in conjunction.
- *Integrate* units and test them together in an *integration* test.
- Often more complicated than a unit test and has more test coverage.
- A fixture is used to generate an actual result.
- Actual result is compared to the expected result by `@test`.
- Often more complicated than a unit test and gives higher test coverage.
---
## Regression Testing
- Generating an expected result is not possible in some situations.
- Compare the current actual result with a previous actual result.
- Compare the *current* actual result with a *previous* actual result.
- No guarantee that the current actual result is correct.
- Risk of a bug being carried over indefinitely.
- Main purpose is to identify changes in the current state of the code with respect to a past state.
---
## Test Coverage
- Coverage is the amount of code a test touches in one run.
- Aim for high test coverage.
- There is a trade-off: high test coverage vs. effort in test development
---
## Comparing Floating-point Variables
- Very often quantities in math software are `float` / `double`.
- Such quantities cannot be compared to exact values, an approximation is necessary.
- Comparison of floating point variables needs to be done to a certain tolerance.
```julia
@test 1 ≈ 0.999999 rtol=1e-5
```
- Get `≈` by Latex `\approx` + TAB
---
## Test-driven Development (TDD)
- Principle is to write a test and then write a code to fulfill the test.
- Advantages:
- In the end user ends up with a test alongside the code.
- Eliminates confirmation bias of the user.
- Writing tests gives clarity on what the code is supposed to do.
- Disadvantage: known to not improve productivity.
---
## Checking-driven Development (CDD)
- Developer performs spot checks; sanity checks at intermediate stages
- Math software often has heuristics which are easy to determine.
- Keep performing same checks at different stages of development to ensure the code works.
---
## Verifying a Test
- Test written as part of a bug-fix:
- Reproduce the bug in the test by ensuring that the test fails.
- Fix the bug.
- Rerun the test to ensure that it passes.
- Test written to increase code coverage:
- Make sure that the first iteration of the test passes.
- Try introducing a small fixable bug in the code to verify if the test fails.
---
# 2. Unit Testing in Julia with Test.jl
---
## Setup of Tests.jl
## Setup of Test.jl
- Standard library to write and manage tests, `using Test`
- Standardized folder structure:
```
├── Manifest.toml
├── Project.toml
├── src/
└── test
├── Manifest.toml
├── Project.toml
├── runtests.jl
└── setup.jl
```
```
├── Manifest.toml
├── Project.toml
├── src/
└── test
    ├── Manifest.toml
    ├── Project.toml
    ├── runtests.jl
    └── setup.jl
```
- Singular `test` vs plural `runtests.jl`
- `setup.jl` for all `using XYZ` statements, included in `runtests.jl`
- Additional packages either in `[extra] section` of `./Project.toml` or in a new `./test/Project.toml` environment
- Additional packages in `[extras]` section of `./Project.toml` or in new `./test/Project.toml`
- In case of the latter: Do not add the package itself to the `./test/Project.toml`
---
## Run Tests
Various options:
- Directly call `runtests.jl` TODO?
- From Pkg-Manager `]test` when root project is activated
- Run: `]test` when root project is activated
---
@ -206,11 +158,11 @@ Various options:
- `@testset`: Structure tests
```julia
julia> @testset "trigonometric identities" begin
θ = 2/3*π
@test sin(-θ) ≈ -sin(θ)
@test cos(-θ) ≈ cos(θ)
end;
@testset "trigonometric identities" begin
θ = 2/3*π
@test sin(-θ) ≈ -sin(θ)
@test cos(-θ) ≈ cos(θ)
end;
```
- `@testset for ... end`: Test in loop
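
A minimal sketch of the loop form, with angles chosen arbitrarily for illustration. One test set is created per iteration.

```julia
using Test

# The loop variable θ can be interpolated into the test set name.
@testset "sin is odd, θ = $θ" for θ in (0.0, π / 6, π / 2, 2.0)
    @test sin(-θ) ≈ -sin(θ)
end;
```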
@ -223,6 +175,8 @@ Various options:
- [HiRSE-Summer of Testing Part 2b: "Testing with Julia" by Nils Niggemann](https://www.youtube.com/watch?v=gSMKNbZOpZU)
- [Official documentation of Test.jl](https://docs.julialang.org/en/v1/stdlib/Test/)
---
# 3. Test.jl Demo
We use [`MyTestPackage`](https://github.com/s-ccs/summerschool_simtech_2023/tree/main/material/2_tue/testing/MyTestPackage), which looks as follows:
@ -246,7 +200,7 @@ We use [`MyTestPackage`](https://github.com/s-ccs/summerschool_simtech_2023/tree
- Look at `MyTestPackage.jl` and `find.jl`: We have two functions `find_max` and `find_mean`, which calculate the maximum and mean of all elements of a `::AbstractVector`.
- Assertions were added to check for `NaN` values
- Look at `runtests.jl`:
- TODO: Why do we need `using MyTestPackage`?
- Why do we need `using MyTestPackage`?
- We include dependencies via `setup.jl`: `Test` and `StableRNG`.
- Testset "find"
- Look at `find.jl`

View File

@ -0,0 +1,48 @@
using Statistics
using Plots
using RDatasets
using GLM
#---
trees = dataset("datasets", "trees")
scatter(trees.Girth, trees.Volume,
legend=false, xlabel="Girth", ylabel="Volume")
#---
scatter(trees.Girth, trees.Volume,
legend=false, xlabel="Girth", ylabel="Volume")
plot!(x -> -37 + 5*x)
#---
linmod1 = lm(@formula(Volume ~ Girth), trees)
#---
linmod2 = lm(@formula(Volume ~ Girth + Height), trees)
#---
r2(linmod1)
r2(linmod2)
linmod3 = lm(@formula(Volume ~ Girth + Height + Girth*Height), trees)
r2(linmod3)
#---
using CSV
using HTTP
http_response = HTTP.get("https://vincentarelbundock.github.io/Rdatasets/csv/AER/SwissLabor.csv")
SwissLabor = DataFrame(CSV.File(http_response.body))
SwissLabor[!,"participation"] .= (SwissLabor.participation .== "yes")
#---
model = glm(@formula(participation ~ age), SwissLabor, Binomial(), ProbitLink())

View File

@ -10,13 +10,26 @@ editor:
### Introductory Example: tree dataset from R
\[figure of raw data\]
``` julia
using Statistics
using Plots
using RDatasets
trees = dataset("datasets", "trees")
scatter(trees.Girth, trees.Volume,
legend=false, xlabel="Girth", ylabel="Volume")
```
*Aim:* Find the relationship between the *response variable* `volume` and
the *explanatory variable/covariate* `girth`. Can we predict the volume
of a tree given its girth?
\[figure including a straight line\]
``` julia
scatter(trees.Girth, trees.Volume,
legend=false, xlabel="Girth", ylabel="Volume")
plot!(x -> -37 + 5*x)
```
First Guess: There is a linear relation!
@ -52,8 +65,10 @@ Note: There is a closed-form expression for
$(\hat \beta_0, \hat \beta_1)$. We will not make use of it here, but
rather use Julia to solve the problem.
\[use Julia code (existing package) to perform linear regression for
`volume ~ girth`\]
``` julia
lm(@formula(Volume ~ Girth), trees)
```
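
For reference, the closed-form expression mentioned in the note can be sketched without any regression package. This uses simulated data rather than the `trees` data set, and the parameter values and variable names are chosen purely for illustration.

```julia
using Statistics
using Random

Random.seed!(1)                         # reproducible illustration
x = 10 .* rand(100)                     # covariates, uniform on [0, 10)
y = 2 .+ 3 .* x .+ 0.1 .* randn(100)    # assumed true β0 = 2, β1 = 3, small noise

# Closed-form least squares estimates for simple linear regression.
β1_hat = cov(x, y) / var(x)
β0_hat = mean(y) - β1_hat * mean(x)

(β0_hat, β1_hat)                        # should be close to (2, 3)
```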
*Interpretation of the Julia output:*
@ -70,9 +85,13 @@ rather use Julia to solve the problem.
Under the hypothesis $\beta_i=0$, the test statistics $t_i$ would
follow a $t$-distribution.
- column `Pr(>|t|)`: $p$-values for the hyptheses $\beta_i=0$ for
- column `Pr(>|t|)`: $p$-values for the hypotheses $\beta_i=0$ for
$i=0,1$
::: callout-tip
The command `rand(n)` generates a sample of `n` random numbers, uniformly distributed on $[0, 1)$.
:::
**Task 1**: Generate a random set of covariates $\mathbf{x}$. Given
these covariates and true parameters $\beta_0$, $\beta_1$ and $\sigma^2$
(you can choose them), simulate responses from a linear model and
@ -166,6 +185,15 @@ the corresponding standard errors and the $t$-statistics. Test your
functions with the `tree` data set and try to reproduce the
output above.
``` julia
r2(linmod1)
r2(linmod2)
linmod3 = lm(@formula(Volume ~ Girth + Height + Girth*Height), trees)
r2(linmod3)
```
## Generalized Linear Models
Classical linear model
@ -206,29 +234,35 @@ $$
For the models above, these are:
+----------------------+---------------------+----------------------+
| Type of Data | Distribution Family | Link Function |
+======================+=====================+======================+
| continuous | Normal | identity: |
| | | |
| | | $$ |
| | | g(x)=x |
| | | $$ |
+----------------------+---------------------+----------------------+
| count | Poisson | log: |
| | | |
| | | $$ |
| | | g(x) = \log(x) |
| | | $$ |
+----------------------+---------------------+----------------------+
| binary | Bernoulli | logit: |
| | | |
| | | $$ |
| | | g(x) = \log\left |
| | | ( |
| | | \frac{x}{1-x}\right) |
| | | $$ |
+----------------------+---------------------+----------------------+
+----------------+------------------+----------------------------------------+
| Type of Data   | Distribution     | Link Function                          |
|                | Family           |                                        |
+================+==================+========================================+
| continuous     | Normal           | identity:                              |
|                |                  |                                        |
|                |                  | $$                                     |
|                |                  | g(x)=x                                 |
|                |                  | $$                                     |
+----------------+------------------+----------------------------------------+
| count          | Poisson          | log:                                   |
|                |                  |                                        |
|                |                  | $$                                     |
|                |                  | g(x) = \log(x)                         |
|                |                  | $$                                     |
+----------------+------------------+----------------------------------------+
| binary         | Bernoulli        | logit:                                 |
|                |                  |                                        |
|                |                  | $$                                     |
|                |                  | g(x) = \log\left(\frac{x}{1-x}\right)  |
|                |                  | $$                                     |
+----------------+------------------+----------------------------------------+
In general, the parameter vector $\beta$ is estimated via maximizing the
likelihood, i.e.,
@ -246,8 +280,23 @@ $$
In the Gaussian case, the maximum likelihood estimator is identical to
the least squares estimator considered above.
\[\[ Example in Julia: maybe `SwissLabor` \]\]
``` julia
using CSV
using HTTP
**Task 3:** Reproduce the results of our data analysis of the `tree`
data set using a generalized linear model with normal distribution
family.
http_response = HTTP.get("https://vincentarelbundock.github.io/Rdatasets/csv/AER/SwissLabor.csv")
SwissLabor = DataFrame(CSV.File(http_response.body))
SwissLabor[!,"participation"] .= (SwissLabor.participation .== "yes")
model = glm(@formula(participation ~ age^2),
SwissLabor, Binomial(), ProbitLink())
```
::: callout-task
**Task 3**:
1. Reproduce the results of our data analysis of the `tree` data set using
a generalized linear model with normal distribution family.
2. Generate
:::