---
type: slide
slideOptions:
  transition: slide
  width: 1400
  height: 900
  margin: 0.1
---
## Learning Goals of the Git Lecture
- Refresh and organize students' existing knowledge of Git (and learn how to learn more).
- Explain the difference between merge and rebase and when to use which.
- Use Git workflows to organize research software development in a team.
- Get to know a few useful GitHub/GitLab standards and a few helpful tools.
- Get to know a few rules for good commit messages.
Material is taken and modified from the [SSE lecture](https://github.com/Simulation-Software-Engineering/Lecture-Material), which builds partly on the [py-rse book](https://merely-useful.tech/py-rse).
---
# 1. Introduction to Version Control
---
## Why Do We Need Version Control?
Version control ...
- tracks changes to files and helps people share those changes with each other.
- could also be done via email / Google Docs / ..., but not as accurately and efficiently.
- was originally developed for software development, but is today a cornerstone of *reproducible research*.
> "If you can't git diff a file format, it's broken."
---
## How Does Version Control Work?
- *master* (or *main*) copy of code in repository, can't edit directly
- Instead: check out a working copy of code, edit, commit changes back
- Repository records complete revision history
- You can go back in time
- It's clear who did what when
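
A minimal sketch of this cycle with Git, run in a throwaway directory. The file name `notes.txt`, the commit messages, and the identity are placeholders chosen for illustration.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway repository
cd "$repo"
git init -q -b main                       # -b needs Git 2.28 or newer
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

# Edit the working copy and commit the change to the repository
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add first draft of notes"

echo "second draft" > notes.txt
git commit -q -am "Revise notes"

# The repository records the complete history: who did what, and when
git log --format='%h %an %ad %s' --date=short

# Go back in time: read the file as it was one commit ago
old_version="$(git show HEAD~1:notes.txt)"
echo "$old_version"
```
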
---
## The Alternative: A Story Told in File Names
[PhD Comics: a story told in file names](http://phdcomics.com/comics/archive/phd052810s.gif)
---
## A Very Short History of Version Control I
The old centralized variants:
- 1982: RCS (Revision Control System), operates on single files
- 1986 (released in 1990): CVS (Concurrent Versions System), front end of RCS, operates on whole projects
- 1994: VSS (Microsoft Visual SourceSafe)
- 2000: SVN (Apache Subversion), mostly compatible successor of CVS, *still used today*
---
## A Very Short History of Version Control II
Distributed version control:
- Besides remote master version, also local copy of repository
- More disk space required, but much better performance
- For a long time: highly fragmented market
- 2000: BitKeeper (originally proprietary software)
- 2005: Mercurial
- 2005: Git
- A few more
Learn more: [Podcast All Things Git: History of VC](https://www.allthingsgit.com/episodes/the_history_of_vc_with_eric_sink.html)
---
## The Only Standard Today: Git
The market is no longer fragmented; today, there is almost only Git:
- [Stack Overflow developer survey 2021](https://insights.stackoverflow.com/survey/2021#technology-most-popular-technologies):
> "Over 90% of respondents use Git, suggesting that it is a fundamental tool to being a developer."
- Is this good or bad?
---
## More Facts on Git
- Git itself is open source (GPLv2)
- Source code on [GitHub](https://github.com/git/git), contributions are a bit more complicated than a simple PR
- Written mainly in C
- Started by Linus Torvalds; core maintainer since 2005: Junio Hamano
- **Git** (the version control software) vs. **git** (the command line interface)
---
## Forges
There is a difference between Git and hosting services ([*forges*](https://en.wikipedia.org/wiki/Forge_(software))):
- [GitHub](https://github.com/)
- [GitLab](https://about.gitlab.com/), open-source, hosted e.g. at [IPVS](https://gitlab-sim.informatik.uni-stuttgart.de)
- [Bitbucket](https://bitbucket.org/product/)
- [SourceForge](https://sourceforge.net/)
- many more
- often, more than just hosting, also DevOps
---
# 2. Recap of Git Basics
---
## Expertise Level Poll
What is your level?
- **Beginner**: hardly ever used Git
- **User**: pull, commit, push, status, diff
- **Developer**: fork, branch, merge, checkout
- **Maintainer**: rebase, squash, cherry-pick, bisect
- **Owner**: submodules
---
## Overview
[Git overview picture from py-rse](https://merely-useful.tech/py-rse/figures/git-cmdline/git-remote.png)
---
## Demo
- `git --help`, `git commit --help`
- incomplete statement `git comm`
- There is no *one right way* to do things with Git. I simply show what I typically use.
- Don't use a graphical client before you understand the `git` command line
- (1) Look at GitHub
  - [preCICE repository](https://github.com/precice/precice)
  - default branch `develop`
  - fork -> my fork
- (2) Working directory:
  - ZSH shell shows git branches
  - `git remote -v` (I have upstream, myfork, ...)
  - mention difference between ssh and https (also see GitHub)
  - get newest changes `git pull upstream develop`
  - `git log` -> I use special format, see `~/.gitconfig`
  - check log on GitHub; explain short hash
  - `git branch`
  - `git branch add-demo-feature`
  - `git checkout add-demo-feature`
- (3) First commit
  - `git status` -> always tells you what you can do
  - `vi src/action/Action.hpp` -> add `#include "MagicHeader.hpp"`
  - `git diff`, `git diff src/action/Action.hpp`, `git diff --color-words`
  - `git status`, `git add`, `git status`
  - `git commit` -> "Include MagicHeader in Action.hpp"
  - `git status`, `git log`, `git log -p`, `git show`
- (4) Change or revert things
  - I forgot to add something: `git reset --soft HEAD~1`, `git status`
  - `git diff` shows nothing because the changes are already staged; use `git diff HEAD`
  - `git log`
  - `git commit`
  - actually, all of that is nonsense: `git reset --hard HEAD~1`
  - modify again, decide it is all nonsense before committing: `git checkout src/action/Action.hpp`
- (5) Stash
  - while working on an unfinished feature, I need to change / test something else quickly and am too lazy for commits / branches
  - `git stash`
  - `git stash pop`
- (6) Create PR
  - create commit again
  - preview what will be in PR: `git diff develop..add-demo-feature`
  - `git push -u myfork add-demo-feature` -> copy link
  - explain PR template
  - explain target branch
  - explain "Allow edits by maintainers"
  - cancel
  - my fork -> branches -> delete
- (7) Check out someone else's work
  - have a look at an existing PR, look at all tabs, show suggestion feature
  - but sometimes we really want to build and try something out ...
  - `git remote -v`
  - `git remote add alex git@github.com:ajaust/precice.git` if I don't have the remote already (or somebody else's)
  - `git fetch alex`
  - `git checkout -t alex/[branch-name]`
  - I could now also push to `ajaust`'s remote
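
Steps (3) to (5) can be replayed without the preCICE repository. The following is a sketch in a throwaway repository; the file content and the identity are placeholders, and only the path and commit message are taken from the demo.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway repository
cd "$repo"
git init -q -b main                       # -b needs Git 2.28 or newer
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

file=src/action/Action.hpp                # stand-in for the real header
mkdir -p src/action
echo "class Action {};" > "$file"
git add "$file"
git commit -q -m "Add Action.hpp"

# (3) First commit
echo '#include "MagicHeader.hpp"' >> "$file"
git commit -q -am "Include MagicHeader in Action.hpp"

# (4) Forgot something: undo the commit, keep the change staged
git reset -q --soft HEAD~1
unstaged="$(git diff)"                    # empty, everything is staged
staged="$(git diff HEAD)"                 # shows the staged change
git commit -q -m "Include MagicHeader in Action.hpp"

# All nonsense: drop the commit and its changes
git reset -q --hard HEAD~1

# Modify again and discard before committing
echo "// scratch" >> "$file"
git checkout -- "$file"

# (5) Stash: park unfinished work, then bring it back
echo "// unfinished" >> "$file"
git stash -q
clean="$(git status --porcelain)"         # empty while the work is stashed
git stash pop -q
git status --short
```
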
---
## Useful Links
- [Official documentation](http://git-scm.com/doc)
- [Video: Git in 15 minutes: basics, branching, no remote](https://www.youtube.com/watch?v=USjZcfj8yxE)
- [The GitHub Blog: Commits are snapshots, not diffs](https://github.blog/2020-12-17-commits-are-snapshots-not-diffs/)
- Chapters [6](https://merely-useful.tech/py-rse/git-cmdline.html) and [7](https://merely-useful.tech/py-rse/git-advanced.html) of Research Software Engineering with Python
- [Podcast All Things Git: History of VC](https://www.allthingsgit.com/episodes/the_history_of_vc_with_eric_sink.html)
- [git purr](https://girliemac.com/blog/2017/12/26/git-purr/)
---
# 3. Merge vs. Rebase
---
## Linear History
- Commits are snapshots + pointer to parent, not diffs
- But for linear history, this makes no difference
- Each normal commit has one parent commit
  - `c05f017^` <-- `c05f017`
  - `A` = `B^` <-- `B`
  - (`^` is the same as `~1`)
- Pointer to parent commit goes into hash
- `git show` gives diff of commit to parent
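
A small sketch to check the parent notation yourself. The commits `A` and `B` are empty placeholder commits in a throwaway repository.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway repository
cd "$repo"
git init -q -b main                       # -b needs Git 2.28 or newer
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

git commit -q --allow-empty -m "A"
git commit -q --allow-empty -m "B"

# Both notations resolve to the same parent commit A
parent_caret="$(git rev-parse HEAD^)"
parent_tilde="$(git rev-parse HEAD~1)"
echo "$parent_caret"
echo "$parent_tilde"

# The raw commit object contains a "parent" line; the object is what gets hashed
git cat-file -p HEAD
```
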
---
## Merge Commits
- `git checkout main && git merge feature`
- A merge commit (normally) has two parent commits `M^1` and `M^2` (don't confuse `^2` with `~2`)
  - Can't show unique diff
  - First parent is the branch you were on when merging (`M^1` = `C`, `M^2` = `E`)
- `git show`
  - `git show`: *"combined diff"*
  - GitHub: `git show --first-parent`
  - `git show -m`: separate diff to all parents
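
A sketch that builds such a merge commit `M` in a throwaway repository. The commit names follow the labels above (`C` on `main`, `E` on `feature`); file names and identity are placeholders.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway repository
cd "$repo"
git init -q -b main                       # -b needs Git 2.28 or newer
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo "a" > a.txt && git add a.txt && git commit -q -m "A"

git checkout -q -b feature
echo "e" > e.txt && git add e.txt && git commit -q -m "E on feature"

git checkout -q main
echo "c" > c.txt && git add c.txt && git commit -q -m "C on main"

# Both branches have new commits, so this creates a merge commit M
git merge -q --no-edit feature

first_parent="$(git log -1 --format=%s HEAD^1)"    # branch we were on
second_parent="$(git log -1 --format=%s HEAD^2)"   # branch we merged in
echo "M^1: $first_parent"
echo "M^2: $second_parent"

# One diff summary per parent
git show -m --stat HEAD
```
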
---
## Why is a Linear History Important?
Here, we use the definition:
> Linear history := no merge commits
- Merge commits are hard to understand per se.
- A merge brings all commits from `feature` into `main` (visible in `git log`). --> Hard to understand
- Developers often follow projects by reading commits (reading the diffs). --> Harder to read (what happened where)
- Tracing bugs is easier with linear history (see `git bisect`)
  - Example: We know a bug was introduced between `v1.3` and `v1.4`.
---
## How to get a Linear History?
- Real conflicts are rare in real projects; most merge commits do not resolve any conflict and should be avoided.
- If there are no changes on `main`, `git merge` does a *"fast-forward"* merge (no merge commit).
- If there are changes on `main`, rebase `feature` branch.
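
A sketch of both cases in a throwaway repository with empty placeholder commits. `--ff-only` is used here so that Git refuses instead of silently creating a merge commit.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway repository
cd "$repo"
git init -q -b main                       # -b needs Git 2.28 or newer
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

git commit -q --allow-empty -m "A"
git checkout -q -b feature
git commit -q --allow-empty -m "B on feature"

# Case 1: no new commits on main --> fast-forward, no merge commit
git checkout -q main
git merge -q --ff-only feature
merges_after_ff="$(git rev-list --merges --count HEAD)"

# Case 2: main and feature have diverged --> fast-forward is refused
git commit -q --allow-empty -m "C on main"
git checkout -q feature
git commit -q --allow-empty -m "D on feature"
git checkout -q main
if git merge --ff-only feature >/dev/null 2>&1; then
  ff_possible=yes
else
  ff_possible=no                          # time to rebase feature
fi
echo "Fast-forward possible after divergence: $ff_possible"
```
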
---
## Rebase
- `git checkout feature && git rebase main`
- Commits get new snapshots and new parents (and thus new hashes) --> history is rewritten
- If `feature` is already on remote, it needs a force push `git push --force myfork feature` (or, safer, `--force-with-lease`).
- Be careful: Only use rebase if **only you** work on a branch (a local branch or a branch on your fork).
- Very helpful for local branches: `git pull --rebase` (fetch & rebase)
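
A sketch that shows the rewritten history in a throwaway repository. File names, commit messages, and identity are placeholders.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway repository
cd "$repo"
git init -q -b main                       # -b needs Git 2.28 or newer
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo "a" > a.txt && git add a.txt && git commit -q -m "A"

git checkout -q -b feature
echo "e" > e.txt && git add e.txt && git commit -q -m "E on feature"
before="$(git rev-parse feature)"

git checkout -q main
echo "c" > c.txt && git add c.txt && git commit -q -m "C on main"

# Replay the feature commit on top of the new main
git checkout -q feature
git rebase -q main
after="$(git rev-parse feature)"
echo "before rebase: $before"
echo "after rebase:  $after"              # new parent, hence new hash

# Now main can be fast-forwarded: linear history, no merge commit
git checkout -q main
git merge -q --ff-only feature
git log --oneline --graph
```
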
---
## GitHub PR Merge Variants
- GitHub offers three ways to merge a non-conflicting (no overlapping changes) PR:
  - Create a merge commit
  - Squash and merge
  - Rebase and merge
- Look at a PR together, e.g. [PR 1824 from preCICE](https://github.com/precice/precice/pull/1824) (will be closed eventually)
> What do the options do?
---
## Squash and Merge
- ... squashes all commits into one
- Often, single commits of feature branch are important while developing the feature,
- ... but not when the feature is merged
- Works well for small feature PRs
- ... also does a rebase (to squash locally: interactive rebase, `git rebase -i`)
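
A local approximation of the squash, as a sketch: `git merge --squash` is used here as a non-interactive stand-in and is not necessarily what GitHub runs internally. Commit messages, file name, and identity are placeholders.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway repository
cd "$repo"
git init -q -b main                       # -b needs Git 2.28 or newer
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

git commit -q --allow-empty -m "A"

# Three small commits while developing the feature
git checkout -q -b feature
echo "1" > f.txt && git add f.txt && git commit -q -m "WIP: start feature"
echo "2" > f.txt && git commit -q -am "Fix typo"
echo "3" > f.txt && git commit -q -am "Finish feature"

# Stage the combined change on main, then record it as a single commit
git checkout -q main
git merge -q --squash feature
git commit -q -m "Add demo feature"

count="$(git rev-list --count HEAD)"
echo "Commits on main: $count"
git log --oneline
```
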
---
## Conflicts
> But what if there is a conflict?
- Resolve by rebasing `feature` branch (recommended)
- Or resolve by merging `main` into `feature`
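
A sketch of the recommended variant, resolving a conflict while rebasing. The file `config.txt` and its content are made up; the resolution step (overwriting the file) stands in for editing the conflict markers by hand.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway repository
cd "$repo"
git init -q -b main                       # -b needs Git 2.28 or newer
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo "color = red" > config.txt
git add config.txt
git commit -q -m "Add config"

git checkout -q -b feature
echo "color = green" > config.txt
git commit -q -am "Use green"

git checkout -q main
echo "color = blue" > config.txt
git commit -q -am "Use blue"

# Both branches changed the same line, so the rebase stops
git checkout -q feature
if git rebase main >/dev/null 2>&1; then
  echo "expected a conflict" >&2
  exit 1
fi
git status --short                        # lists the conflicted file

# Resolve, mark as resolved, and continue
echo "color = green" > config.txt
git add config.txt
GIT_EDITOR=true git rebase --continue >/dev/null   # keep the commit message

git log --oneline --graph
```
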
---
## Summary and Remarks
- Try to keep a linear history with rebasing whenever reasonable
- Don't use rebase on a public/shared branch during development
- Squash before merging if reasonable
- Delete `feature` branch after merging
- Local view: `git log --graph`
- Remote view on GitHub, e.g. [for preCICE](https://github.com/precice/precice/network)
---
## Further Reading
- [Bitbucket docs: "Merging vs. Rebasing"](https://www.atlassian.com/git/tutorials/merging-vs-rebasing)
- [Hackernoon: "What's the diff?"](https://hackernoon.com/git-merge-vs-rebase-whats-the-diff-76413c117333)
- [GitHub Blog: "Commits are snapshots, not diffs"](https://github.blog/2020-12-17-commits-are-snapshots-not-diffs/)
- [Stack Overflow: "Git show of a merge commit"](https://stackoverflow.com/questions/40986518/git-show-of-a-merge-commit?)
---
# 4. Working in Teams / Git Workflows
---
## Why Workflows?
- Git offers a lot of flexibility in managing changes.
- When working in a team, however, some agreements need to be made (especially on how to work with branches).
---
## Which Workflow?
- There are standard solutions.
- It depends on the size of the team.
- Workflow should enhance effectiveness of team, not be a burden that limits productivity.
---
## Centralized Workflow
- Only one branch: the `main` branch
- Keep your changes in local commits until some feature is ready
- If ready, directly push to `main`; no PRs, no reviews
- Conflicts: fix locally (the push is rejected anyway), use `git pull --rebase`
- **Good for**: small teams, small projects, projects that are reviewed over and over again anyway
- Example: LaTeX papers
  - Put each section in separate file
  - Put each sentence in separate line
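
A sketch of the centralized workflow with two authors. A local bare repository stands in for the server; the names `alice` and `bob`, the `.tex` files, and the identity are placeholders.

```shell
set -eu
work="$(mktemp -d)"                       # throwaway directory
cd "$work"
# Placeholder identity for all repositories in this sketch
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# Central repository plus one clone per author
git init -q -b main seed                  # -b needs Git 2.28 or newer
git -C seed commit -q --allow-empty -m "Start paper"
git clone -q --bare seed central.git
git clone -q central.git alice
git clone -q central.git bob

# Alice pushes first
cd "$work/alice"
echo "Introduction." > intro.tex
git add intro.tex
git commit -q -m "Add introduction"
git push -q origin main

# Bob committed locally in the meantime; his push is rejected
cd "$work/bob"
echo "Methods." > methods.tex
git add methods.tex
git commit -q -m "Add methods"
if git push -q origin main 2>/dev/null; then
  pushed_first_try=yes
else
  pushed_first_try=no
fi

# Fix locally: replay Bob's commit on top of Alice's, then push again
git pull -q --rebase origin main
git push -q origin main
git log --oneline --graph
```
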
---
## Feature Branch Workflow
- Each feature (or bugfix) in separate branch
- Push feature branch to remote, use descriptive name
  - e.g. issue number in name if each branch closes one issue
- `main` should never contain broken code
- Protect `main` from direct pushes
- PR (or MR) with review to merge from feature branch to `main`
- Rebase feature branch on `main` if necessary
- Delete remote branch once merged and no longer needed (one click on GitHub after merge)
- **Good for**: small teams, small projects, prototyping, websites (continuous deployment), documentation
- Also known as [trunk-based development](https://www.atlassian.com/continuous-delivery/continuous-integration/trunk-based-development) or [GitHub flow](https://guides.github.com/introduction/flow/)
---
## Gitflow
- [Visualization by Vincent Driessen](https://nvie.com/img/git-model@2x.png), from [original blog post in 2010](https://nvie.com/posts/a-successful-git-branching-model/)
- `main` and `develop`
  - `main` contains releases as tags
  - `develop` contains latest features
- Feature branches created off `develop`, PRs back to `develop`
- Protect `main` and (possibly) `develop` from direct pushes
- Dedicated release branches (e.g., `v1.0`) created off `develop`
  - Tested, fixed, merged to `main`
  - Afterwards, tagged, merged back to `develop`
- Hotfix branches created directly off `main` and merged back to `main`
- **Good for**: software with users, larger teams
- There is a tool `git-flow`, a wrapper around `git`, e.g. `git flow init` ... but not really necessary IMHO
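
A sketch of one release cycle with plain `git`, without the `git-flow` wrapper. Assumption: the release branch is called `release-1.0` here (not `v1.0`) so that it does not clash with the tag name `v1.0`; commits and identity are placeholders.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway repository
cd "$repo"
git init -q -b main                       # -b needs Git 2.28 or newer
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

git commit -q --allow-empty -m "Initial commit"

# Latest features live on develop
git checkout -q -b develop
git commit -q --allow-empty -m "Add feature X"

# Release branch off develop: test and fix there
git checkout -q -b release-1.0 develop
git commit -q --allow-empty -m "Fix bug found in release testing"

# Merge to main and tag the release
git checkout -q main
git merge -q --no-ff --no-edit release-1.0
git tag -a v1.0 -m "Release 1.0"

# Merge back to develop so that the fix is not lost
git checkout -q develop
git merge -q --no-ff --no-edit release-1.0
git branch -q -d release-1.0

git log --oneline --graph --all
```
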
---
## Forking Workflow
- Gitflow + feature branches on other forks
- More control over access rights, distinguish between maintainers and external contributors
- Should maintainers also use branches on their forks?
  - Makes overview of branches easier
  - Distinguishes between prototype branches (on fork, no PR), serious enhancements (on fork with PR), joint enhancements (on upstream)
- **Good for**: open-source projects with external contributions (used more or less in preCICE)
---
## Do Small PRs
- For all workflows, it is better to do small PRs
  - Easier to review
  - Faster to merge --> fewer conflicts
  - Easier to squash
---
## Quick Reads
- [Atlassian docs on workflows](https://www.atlassian.com/git/tutorials/comparing-workflows)
- [Original gitflow blog post](https://nvie.com/posts/a-successful-git-branching-model/)
- [Trunk-based development](https://www.atlassian.com/continuous-delivery/continuous-integration/trunk-based-development)
- [GitHub flow](https://guides.github.com/introduction/flow/)
- [How to keep pull requests manageable](https://gist.github.com/sktse/569cb192ce1518f83db58567591e3205)
---
# 5. GitHub / GitLab Standards
---
## What Do We Mean by Standards?
- GitHub uses standards or conventions.
- Certain files or names trigger certain behavior automatically.
- Many are supported by most forges.
- **This is good.**
- Everybody should know them.
---
## Special Files
Certain files lead to special formatting (normally directly at root of repo):
- `README.md`
  - ... contains meta information / overview / first steps of software.
  - ... gets rendered on landing page (and in every folder).
- `LICENSE`
  - ... contains software license.
  - ... gets rendered on right sidebar, when clicking on license, and on repo preview.
- `CONTRIBUTING.md`
  - ... contains guidelines for contributing.
  - First-time contributors see banner.
- `CODE_OF_CONDUCT.md`
  - ... contains code of conduct.
  - ... gets rendered on right sidebar.
---
## Issues and PRs
- Templates for description in `.github` folder
- `closes #34` (or several other keywords: `fixes`, `resolves`) in commit message or PR description will close issue 34 when merged.
- `help wanted` label gets rendered on repo preview (e.g. *"3 issues need help"*).
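
A sketch of a PR template and a closing keyword. The template text is made up, and the path `.github/pull_request_template.md` is the location assumed here for GitHub; the issue is only closed once the commit reaches the default branch on the forge.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway repository
cd "$repo"
git init -q -b main                       # -b needs Git 2.28 or newer
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

# Hypothetical template; the forge pre-fills new PR descriptions with it
mkdir -p .github
cat > .github/pull_request_template.md <<'EOF'
## Main changes of this PR

## Checklist

- [ ] I added tests.
- [ ] I updated the documentation.
EOF

# Closing keyword in the body of the commit message
git add .github
git commit -q -F - <<'EOF'
Add pull request template

Closes #34
EOF

git log -1 --format=%B
```
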
---
# 6. Commit Messages
---
## Commit Messages (1/2)
- Consistent
- Descriptive and concise (such that complete history becomes skimmable)
- Explain the "why" (the "how" is covered in the diff)
---
## Commit Messages (2/2)
[The seven rules of a great Git commit message](https://chris.beams.io/git-commit/):
- Separate subject from body with a blank line.
- Limit the subject line to 50 characters.
- Capitalize the subject line.
- Do not end the subject line with a period.
- Use the imperative mood in the subject line.
- Wrap the body at 72 characters.
- Use the body to explain what and why vs. how.
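
A sketch of a commit message that follows these rules, plus a simple length check. The message itself is invented for illustration and does not refer to a real commit.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway repository
cd "$repo"
git init -q -b main                       # -b needs Git 2.28 or newer
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

# Subject: capitalized, imperative, no period; blank line; wrapped body
git commit -q --allow-empty -F - <<'EOF'
Fix time step size check in coupling loop

The check compared the remaining time against the full window size,
so the final, shorter step of a window was rejected.

Compare against the remainder of the window instead.
EOF

# Check the two length rules
subject="$(git log -1 --format=%s)"
longest_body_line="$(git log -1 --format=%b |
  awk 'length($0) > max { max = length($0) } END { print max + 0 }')"
echo "subject length:    ${#subject} (limit 50)"
echo "longest body line: $longest_body_line (limit 72)"
```
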