Don't always dereference ref

Add change 34 to instructions
Dereference refs when reading and writing
2024-07-06 19:50:52 +02:00 · 2024-07-06 19:49:59 +02:00 · 2024-06-29 18:48:21 +02:00 · 2024-06-29 18:47:23 +02:00 · 2024-06-14 16:29:58 +02:00 · 2024-06-14 16:29:25 +02:00
34 changed files with 1035 additions and 4 deletions
--- a/how_to/Change_05.md
+++ b/how_to/Change_05.md
@@ -0,0 +1,24 @@
 - cat-file: Print hashed objects
 This command is the "opposite" of `hash-object`: it can print an object by its
 OID. Its implementation just reads the file at ".ugit/objects/{OID}".
 The names `hash-object` and `cat-file` aren't the clearest of names, but they
 are the names that Git uses so we'll stick to them for consistency.
 We can now try the full cycle:
 ```
 $ cd /tmp/new
 $ ugit init
 Initialized empty ugit repository in /tmp/new/.ugit
 $ echo some file > bla
 $ ugit hash-object bla
 0e08b5e8c10abc3e455b75286ba4a1fbd56e18a5
 $ ugit cat-file 0e08b5e8c10abc3e455b75286ba4a1fbd56e18a5
 some file
 ```
 Note that the name of the file (bla) wasn't preserved as part of this process,
 because, again, the object database is just about storing bytes for later
 retrieval and it doesn't care which filename the bytes came from.
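As a rough illustration of the cycle above, here is a minimal sketch of the two operations. It assumes that an OID is the SHA-1 hex digest of the stored bytes (the text above doesn't spell out the hash function) and it takes the objects directory as a parameter. The function names and signatures are illustrative and not necessarily those of ugit's data module.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_object(objects_dir: Path, data: bytes) -> str:
    """Store `data` under its SHA-1 hex digest and return that digest (the OID)."""
    oid = hashlib.sha1(data).hexdigest()  # assumption: OID = SHA-1 of the bytes
    objects_dir.mkdir(parents=True, exist_ok=True)
    (objects_dir / oid).write_bytes(data)
    return oid


def cat_file(objects_dir: Path, oid: str) -> bytes:
    """Return the bytes previously stored under `oid`."""
    return (objects_dir / oid).read_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        objects = Path(tmp) / ".ugit" / "objects"
        oid = hash_object(objects, b"some file\n")
        print(oid)
        print(cat_file(objects, oid))
```

Note that the file name never reaches `hash_object`, which is why it can't be preserved.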
--- a/how_to/Change_06.md
+++ b/how_to/Change_06.md
@@ -0,0 +1,17 @@
 - data: Add types to objects
 As we will soon see, there will be different logical types of objects that are
 used in different contexts (even though, from the Object Database's point of
 view, they are just all bytes). In order to lower the chance of using an object
 in the wrong context we're going to add a type tag for each object.
 The type is just a string that's going to be prepended to the start of the file,
 followed by a null byte. When reading the file later we'll extract the type and
 verify that it's indeed the expected type.
 The default type is going to be `blob`, since by default an object is a
 collection of bytes with no further semantic meaning.
 We can also pass `expected=None` to `get_object()` if we don't want to verify
 the type. This is useful for the `cat-file` CLI command which is a debug command
 used for printing all objects.
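The type tag could be sketched roughly like this. The helpers `encode_object` and `decode_object` are hypothetical names that work on bytes in memory; in ugit the same logic would live inside `hash_object()` and `get_object()`.

```python
from typing import Optional


def encode_object(data: bytes, type_: str = "blob") -> bytes:
    """Prepend the type tag and a null byte to the payload."""
    return type_.encode() + b"\x00" + data


def decode_object(raw: bytes, expected: Optional[str] = "blob") -> bytes:
    """Split off the type tag and verify it, unless `expected` is None."""
    type_, _, content = raw.partition(b"\x00")
    if expected is not None and type_.decode() != expected:
        raise ValueError(f"Expected {expected}, got {type_.decode()}")
    return content


if __name__ == "__main__":
    raw = encode_object(b"hello", "tree")
    print(decode_object(raw, expected="tree"))
    print(decode_object(raw, expected=None))  # no verification, as in cat-file
```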
--- a/how_to/Change_07.md
+++ b/how_to/Change_07.md
--- a/how_to/Change_08.md
+++ b/how_to/Change_08.md
@@ -0,0 +1,26 @@
 - write-tree: List files
 The next command is `write-tree`. This command will take the current working
 directory and store it to the object database. If `hash-object` was for storing
 an individual file, then `write-tree` is for storing a whole directory.
 Like `hash-object`, `write-tree` is going to give us an OID after it's done and
 we'll be able to use the OID in order to retrieve the directory at a later time.
 In Git's lingo a "tree" means a directory.
 We'll get into the details in later changes; in this change we'll only prepare
 the code around the feature:
  + Create a `write-tree` CLI command
  + Create a `write_tree()` function in base module. Why in base module and not
  in data module? Because `write_tree()` is not going to write to disk directly
  but use the object database provided by data to store the directory. Hence it
  belongs to the higher-level base module.
  + Add code to `write_tree()` to print a directory recursively. For now nothing
  is written anywhere, but we just coded the boilerplate to recursively scan a
  directory.
 We continue in the next change.
--- a/how_to/Change_09.md
+++ b/how_to/Change_09.md
@@ -0,0 +1,8 @@
 - write-tree: Ignore .ugit files
 If we run `ugit write-tree`, we will see that it also prints the content of the
 .ugit directory. This directory isn't part of the user's files, so let's ignore
 it.
 Actually, I created a separate `is_ignored()` function. This way if we have any
 other files we want to ignore later we have one place to change.
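One possible shape for such a function, as a sketch only: it treats a path as ignored when any of its components is exactly `.ugit`. The use of `PurePath` here is our choice for the illustration.

```python
from pathlib import PurePath


def is_ignored(path: str) -> bool:
    """Return True if any component of `path` is the .ugit directory."""
    return ".ugit" in PurePath(path).parts


if __name__ == "__main__":
    print(is_ignored("./.ugit/objects/abc"))
    print(is_ignored("./src/main.py"))
```

Comparing whole path components (rather than searching for the substring) keeps a file such as `notes.ugit.txt` from being ignored by accident.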
--- a/how_to/Change_10.md
+++ b/how_to/Change_10.md
@@ -0,0 +1,12 @@
 - write-tree: Hash the files
 Instead of only printing the file name, let's put all files in the object
 database. For now we'll print their OID and their name.
 Notice that instead of getting one OID to represent a directory we now get a
 separate OID for each file, which isn't very useful. Plus, note that the names
 of the files aren't stored in the object database, they are just printed and
 then the information is discarded.
 So at this stage `write-tree` isn't useful (it just saves a bunch of files as
 blobs) but the next change will fix it.
--- a/how_to/Change_11.md
+++ b/how_to/Change_11.md
@@ -0,0 +1,62 @@
 - write-tree: Write tree objects
 Now comes the fun part, where we turn a collection of separate files into a
 single object that represents a directory.
 The idea is that we will create one additional object that collects all the data
 necessary to store a complete directory. For example, if we have a directory
 with two files:
 ```
 $ ls
 cats.txt    dogs.txt
 ```
 And we want to save the directory, we will first put the individual files into
 the object database:
 ```
 $ ugit hash-object cats.txt
 91a7b14a584645c7b995100223e65f8a5a33b707
 $ ugit hash-object dogs.txt
 fa958e0dd2203e9ad56853a3f51e5945dad317a4
 ```
 Then we will create a "tree" object that has the content of:
 ```
 91a7b14a584645c7b995100223e65f8a5a33b707 cats.txt
 fa958e0dd2203e9ad56853a3f51e5945dad317a4 dogs.txt
 ```
 And we will put this tree object into the object database as well. Then the OID
 of the tree object will actually represent the entire directory! Why? Because we
 can first retrieve the tree object by its OID, then see all the files it
 contains (their names and OIDs) and then read all the OIDs of the files to get
 their actual content.
 What if our directory contains other directories? We'll just create tree objects
 for them as well and we'll allow one tree object to point to another:
 ```
 $ ls
 cats.txt    dogs.txt    other/
 $ ls other/
 shoes.jpg
 ```
 The root tree object will look like this:
 ```
 blob 91a7b14a584645c7b995100223e65f8a5a33b707 cats.txt
 blob fa958e0dd2203e9ad56853a3f51e5945dad317a4 dogs.txt
 tree 53891a3c27b17e0f8fd96c058f968d19e340428d other
 ```
 Note that we added a type to each entry so that we know if it's a file or a
 directory. The tree that represents the "other" directory (OID 53891a3c27b17e0f8fd96c058f968d19e340428d) looks like:
 ```
 blob 0aa186b09fd81e8cf449ba10eee6aff9711cc1ac shoes.jpg
 ```
 We can think of this structure as a tree from Computer Science, where each
 entry's OID is a pointer to either another tree or a file (a leaf node).
 Note that we actually save the tree objects with type "tree" in
 `data.hash_object()` since we don't want the trees to be confused with regular
 files.
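To make the recursion concrete, here is a small sketch that builds tree objects from a nested dictionary instead of a real directory. The in-memory `objects` dict and the `put` helper are stand-ins for the object database, and SHA-1 is assumed as the hash.

```python
import hashlib

objects = {}  # in-memory stand-in for the object database: OID -> raw bytes


def put(data: bytes, type_: str) -> str:
    """Store typed data and return its OID (assumed: SHA-1 of tag + data)."""
    raw = type_.encode() + b"\x00" + data
    oid = hashlib.sha1(raw).hexdigest()
    objects[oid] = raw
    return oid


def write_tree(directory: dict) -> str:
    """Store a nested dict: bytes values are files, dict values are directories."""
    entries = []
    for name, value in sorted(directory.items()):
        if isinstance(value, dict):
            # Subdirectory: store its tree first, then point to it
            entries.append(f"tree {write_tree(value)} {name}\n")
        else:
            entries.append(f"blob {put(value, 'blob')} {name}\n")
    return put("".join(entries).encode(), "tree")


if __name__ == "__main__":
    root = {"cats.txt": b"meow\n", "dogs.txt": b"woof\n", "other": {"shoes.jpg": b"jpg"}}
    root_oid = write_tree(root)
    print(objects[root_oid].partition(b"\x00")[2].decode())
```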
--- a/how_to/Change_12.md
+++ b/how_to/Change_12.md
@@ -0,0 +1,28 @@
 - read-tree: Extract tree from object
 This command will take an OID of a tree and extract it to the working directory.
 Kind of the opposite of `write-tree`.
 I divided the implementation into a few layers:
 `_iter_tree_entries` is a generator that will take an OID of a tree, tokenize it
 line-by-line and yield the raw string values.
 `get_tree` uses `_iter_tree_entries` to recursively parse a tree into a
 dictionary.
 `read_tree` uses `get_tree` to get the file OIDs and writes them into the
 working directory.
 Now we can actually save versions of the working directory! It's nothing like
 proper version control, but we can see that a super basic flow is possible:
  + Imagine you work on some code and you want to save a version.
  + You run `ugit write-tree`.
  + You remember the OID that was printed out (write it on a post-it note or
  something :)).
  + Continue working and repeat steps 2 and 3 as necessary.
  + If you want to return to a previous version, use `ugit read-tree` to restore
  it to the working directory.
 Is it convenient to use? No. But it's just the beginning!
--- a/how_to/Change_13.md
+++ b/how_to/Change_13.md
@@ -0,0 +1,7 @@
 - read-tree: Delete all existing stuff before reading
 This is done so that we won't have any old files left around after a read-tree.
 Before this change, if we save tree A which contains only `a.txt`, then we save
 tree B which contains `a.txt` and `b.txt` and then we `read-tree` A, we will
 have `b.txt` left over in the working directory.
--- a/how_to/Change_14.md
+++ b/how_to/Change_14.md
@@ -0,0 +1,31 @@
 - commit: Create commit
 So far we were able to save versions of a directory (with `write-tree`), but
 without any additional context. In reality, when we save a snapshot we would
 like to attach data such as:
 + Message describing it
 + When the snapshot was created
 + Who created the snapshot
 + ...
 We will create a new type of object called a "commit" that will store all this
 information. A commit will just be a text file stored in the object database
 with the type of `'commit'`.
 The first lines in the commit will be key-values, then an empty line will mark
 the end of the key-values and then the commit message will follow. Like this:
 ```
 tree 5e550586c91fce59e0006799e0d46b3948f05693
 author Nikita Leshenko
 time 2019-09-14T09:31:09+00:00

 This is the commit message!
 ```
 For now we'll just write the "tree" key and the commit message to the commit
 object.
 We will create a new `ugit commit` command that will accept a commit message,
 snapshot the current directory using `ugit write-tree` and save the resulting
 object.
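The layout of a commit could be produced by something like the following sketch. `format_commit` is a hypothetical helper that only builds the text; storing it with the `'commit'` type is left out.

```python
def format_commit(message: str, **headers: str) -> str:
    """Key-value lines, then an empty line, then the commit message."""
    header_lines = "".join(f"{key} {value}\n" for key, value in headers.items())
    return f"{header_lines}\n{message}\n"


if __name__ == "__main__":
    text = format_commit(
        "This is the commit message!",
        tree="5e550586c91fce59e0006799e0d46b3948f05693",
    )
    print(text, end="")
```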
--- a/how_to/Change_15.md
+++ b/how_to/Change_15.md
@@ -0,0 +1,10 @@
 - commit: Record hash of last commit to HEAD
 I would like to link new commits to older commits. Right now, if we make changes
 in the working directory and make periodic commits, each commit will be a
 standalone object, separate from all other commits. The motivation for linking
 them together is so that we can look at the commits as a series of snapshots in
 some order.
 Before we can do it, let's record the OID of the last commit that we created.
 We'll call the last commit the "HEAD" and just put the OID in .ugit/HEAD file.
--- a/how_to/Change_16.md
+++ b/how_to/Change_16.md
@@ -0,0 +1,23 @@
 - commit: set parent to HEAD
 When creating a new commit, we will use the HEAD to link the new commit to the
 previous commit. We'll call the previous commit the "parent commit" and we will
 save its OID in the "parent" key on the commit object.
 For example, if HEAD is currently bd0de093f1a0f90f54913d694a11cccf450bd990 and we
 create a new commit, the new commit will look like this in the object store:
 ```
 tree 50bed982245cd21e2798f179e0b032904398485b
 parent bd0de093f1a0f90f54913d694a11cccf450bd990

 This is the commit message!
 ```
 The first commit in the repository will obviously have no parent.
 Now we can retrieve the entire list of commits just by referencing the last
 commit! We can start from the HEAD, read the "parent" key on the HEAD commit and
 discover the commit before HEAD. Then read the parent of that commit, and so
 on. This is basically a linked list implemented over the object
 database.
--- a/how_to/Change_17.md
+++ b/how_to/Change_17.md
@@ -0,0 +1,12 @@
 - log: Implement
 `log` will walk the list of commits and print them.
 We will start by implementing `get_commit()` that will parse a commit object by
 OID.
 Then in the CLI module we will start from the HEAD commit and walk its parents
 until we reach a commit without a parent.
 The result is that the entire commit history is printed to the screen once we
 run `ugit log`.
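The walk can be sketched without touching the disk. Here the object store is a plain dict of commit texts keyed by made-up short OIDs, and `parse_commit` and `walk_history` are illustrative names rather than ugit's actual functions.

```python
from collections import namedtuple

Commit = namedtuple("Commit", ["tree", "parent", "message"])


def parse_commit(text: str) -> Commit:
    """Split a commit into its key-value headers and its message."""
    headers, _, message = text.partition("\n\n")
    fields = dict(line.split(" ", 1) for line in headers.splitlines())
    return Commit(tree=fields["tree"], parent=fields.get("parent"), message=message)


def walk_history(commits: dict, head: str):
    """Yield (oid, Commit) pairs from `head` back to the first commit."""
    oid = head
    while oid:
        commit = parse_commit(commits[oid])
        yield oid, commit
        oid = commit.parent  # None for the first commit, which ends the loop


if __name__ == "__main__":
    store = {
        "a1": "tree t1\n\nfirst\n",
        "b2": "tree t2\nparent a1\n\nsecond\n",
    }
    for oid, commit in walk_history(store, "b2"):
        print(f"commit {oid}\n    {commit.message}")
```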
--- a/how_to/Change_18.md
+++ b/how_to/Change_18.md
@@ -0,0 +1,5 @@
 - log: Add oid parameter
 Just a small cosmetic change: Instead of always printing the list of commits
 from HEAD, add an optional parameter to specify an alternative commit OID to
 start from. By default it will still be HEAD.
--- a/how_to/Change_19.md
+++ b/how_to/Change_19.md
@@ -0,0 +1,90 @@
 - checkout: Read tree and move HEAD
 When given a commit OID, `ugit checkout` will "checkout" that commit, meaning
 that it will populate the working directory with the content of the commit and
 move HEAD to point to it.
 This is a small but important change and it greatly expands the power of ugit in
 two ways.
 First, it allows us to travel conveniently in history. If we've made a handful
 of commits and we would like to revisit a previous commit, we can now "checkout"
 that commit to the working directory, play with it (compile, run tests, read
 code, whatever we want) and checkout the latest commit again to resume working
 where we left off.
 You might be wondering why `checkout` is needed when we could just use
 `read-tree`, and the answer is that moving HEAD in addition to reading the tree
 allows us to record which commit is checked out right now. If we only used
 `read-tree` and later forgot which commit we were looking at, we would see a bunch
 of files in the working directory and have no idea where they came from. On the
 other hand, if we use `checkout`, the commit will be recorded in HEAD and we can
 always know what we're looking at (by running `ugit log` for example and seeing the first entry).
 The second way by which `checkout` expands the power of ugit is by allowing
 multiple branches of history. Let me explain: So far we have set HEAD to point
 to the latest commit that was created. It means that all our commits were
 linear, each new commit was added on top of the previous. The `checkout`
 command now allows us to move HEAD to any commit we wish. Then, new commits will
 be created on top of the current HEAD commit, which isn't necessarily the last
 created commit.
 For example, imagine that we're working on some code. So far, we have created a
 few commits, represented by a graph:
 ```
 o-----o-----o-----o
 ^                 ^
 first commit      HEAD
 ```
 Then we wanted to code a new feature. We created a few commits while working on
 the feature (new commits represented by @):
 ```
 o-----o-----o-----o-----@-----@-----@
 ^                                   ^
 first commit                        HEAD
 ```
 Now we have an alternative idea for implementing that feature. We would like to
 go back in time and try a different implementation, without throwing away the
 current implementation. We can remember the current HEAD and run `ugit checkout`
 to go back in time, by providing the OID of the commit before the new feature
 was implemented (that OID can be discovered with `ugit log`).
 ```
 o-----o-----o-----o-----@-----@-----@
 ^                 ^
 first commit      HEAD
 ```
 The working directory will effectively go back in time. We can start working on
 an alternative implementation and create new commits. The new commits will be on
 top of HEAD and look like this (represented by $):
 ```
 o-----o-----o-----o-----@-----@-----@
 ^                  \
 first commit        ----$-----$
                              ^
                              HEAD
 ```
 See how the history now contains two "branches". We can actually switch back and
 forth between them and work on them in parallel. Finally, we can checkout the
 preferred implementation and work from it on future code. Assuming that we liked
 the second branch, we'll just keep working from it, and future commits will look
 like this:
 ```
 o-----o-----o-----o-----@-----@-----@
 ^                  \
 first commit        ----$-----$-----o-----o-----o-----o-----o
                                                            ^
                                                            HEAD
 ```
 Pretty useful, right? We've just introduced a simple form of branching history.
 Note that something pretty cool happened here: The implementation of checkout is
 very simple (we just call `read_tree` and update HEAD) but the implications of
 checkout are quite big - we can suddenly have a branching workflow which might
 look complicated but it is actually a direct consequence of what we implemented
 in previous changes. This is why I believe learning Git internals from the
 bottom up is useful - we can see how simple concepts compose into complicated
 functionality.
--- a/how_to/Change_20.md
+++ b/how_to/Change_20.md
@@ -0,0 +1,41 @@
 - tag: Implement CLI command
 Now that we have branching history we have some OIDs we need to keep track of.
 Assume we have two branches (continuing from the example we had for `checkout`):
 ```
 o-----o-----o-----o-----@-----@-----@
 ^                  \                ^
 first commit        ----$-----$     6c9f80a187ba39b4...
                              ^
                              d8d43b0e3a21df0c...
 ```
 If we want to switch back and forth between the two "branches" with `checkout`,
 we need to remember both OIDs, which are quite long.
 To make our lives easier, let's implement a command to attach a name to an OID.
 Then we'll be able to refer to the OID by that name.
 The end result will look like this:
 ```
 $ # Make some changes
 ...
 $ ugit commit
 d8d43b0e3a21df0c845e185d08be8e4028787069
 $ ugit tag my-cool-commit d8d43b0e3a21df0c845e185d08be8e4028787069
 $ # Make more changes
 ...
 $ ugit commit
 e549f09bbd08a8a888110b07982952e17e8c9669
 $ ugit checkout my-cool-commit
        or
 $ ugit checkout d8d43b0e3a21df0c845e185d08be8e4028787069
 ```
 The last two commands are equivalent, because "my-cool-commit" is a tag that
 points to d8d43b0e3a21df0c845e185d08be8e4028787069.
 We will implement this in a few steps. The first step is to create a CLI
 command that calls the relevant function in the base module. The base module does
 nothing at this stage.
--- a/how_to/Change_21.md
+++ b/how_to/Change_21.md
@@ -0,0 +1,23 @@
 - tag: Generalize HEAD to refs
 As part of implementing `tag`, we'll generalize the way we handle HEAD. If you
 think about it, HEAD and tags are similar. They are both ways for ugit to attach
 a name to an OID. In case of HEAD, the name is hardcoded by ugit; in case of
 tags, the name will be provided by the user. It makes sense to handle them
 similarly in *data.py*.
 In *data.py*, let's generalize the functions `set_HEAD` and `get_HEAD` into
 `update_ref` and `get_ref`. "Ref" is short for reference, and that's the name
 Git uses. The functions will now accept the name of the ref and write/read it as
 a file under the *.ugit* directory. Logically, a ref is a named pointer to an object.
 The important change is in *data.py*. The rest of the changes just rename some
 functions:
 ```
 - get_HEAD()    ->   get_ref('HEAD')
 - set_HEAD(oid) ->   update_ref('HEAD', oid)
 ```
 Note that we didn't change any behaviour of ugit here, this is purely
 refactoring.
--- a/how_to/Change_22.md
+++ b/how_to/Change_22.md
@@ -0,0 +1,28 @@
 - tag: Create the tag ref
 After we've implemented refs in the previous change, it's time to create a ref
 when the user creates a tag.
 `create_tag` now calls `update_ref` with the tag name to actually create the tag.
 For namespacing purposes, we'll put all tags under *refs/tags/*. That is, if the
 user creates *my-cool-commit* tag, we'll create *refs/tags/my-cool-commit* ref
 to point to the desired OID.
 Then we'll update *data.py* to handle this "namespaced" ref. Since we can't have
 a / in the file name, we'll create directories for it. Now if a ref
 *refs/tags/sometag* is created, it will be placed under *.ugit/refs/tags* in a
 file named *sometag*.
 To verify that this code works, you can run:
 ```
 $ ugit tag test
 ```
 And make sure that the tag points to HEAD:
 ```
 $ cat .ugit/refs/tags/test
 $ cat .ugit/HEAD
 ```
 The last two commands should give the same output.
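A sketch of how namespaced refs can map onto files is shown here. The `ugit_dir` parameter is our addition so that the example can run in a temporary directory; the real functions in *data.py* may look different.

```python
import tempfile
from pathlib import Path


def update_ref(ugit_dir: Path, ref: str, oid: str) -> None:
    """Write `oid` to the file for `ref`, creating directories as needed."""
    ref_path = ugit_dir / ref
    # "refs/tags/sometag" becomes the file "sometag" under refs/tags/
    ref_path.parent.mkdir(parents=True, exist_ok=True)
    ref_path.write_text(oid)


def get_ref(ugit_dir: Path, ref: str):
    """Return the OID stored for `ref`, or None if the ref doesn't exist."""
    ref_path = ugit_dir / ref
    if ref_path.is_file():
        return ref_path.read_text().strip()
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ugit_dir = Path(tmp) / ".ugit"
        update_ref(ugit_dir, "HEAD", "d8d43b0e3a21df0c845e185d08be8e4028787069")
        update_ref(ugit_dir, "refs/tags/test", get_ref(ugit_dir, "HEAD"))
        print(get_ref(ugit_dir, "refs/tags/test"))
```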
--- a/how_to/Change_23.md
+++ b/how_to/Change_23.md
@@ -0,0 +1,22 @@
 - tag: Resolve name to oid in argparse
 It's nice that we can create tags, but now let's actually make them usable from
 the CLI.
 In *base.py*, we'll create `get_oid` to resolve a "name" to an OID. A name can
 either be a ref (in which case `get_oid` will return the OID that the ref points
 to) or an OID (in which case `get_oid` will just return that same OID).
 Next, we'll modify the argument parser in *cli.py* to call `get_oid` on all
 arguments which are expected to be an OID. This way we can pass a ref there
 instead of an OID.
 At this point we can do something like:
 ```
 $ ugit tag mytag d8d43b0e3a21df0c845e185d08be8e4028787069
 $ ugit log refs/tags/mytag
 # Will print log of commits starting at d8d43b0e...
 $ ugit checkout refs/tags/mytag
 # Will checkout commit d8d43b0e...
 etc...
 ```
--- a/how_to/Change_24.md
+++ b/how_to/Change_24.md
@@ -0,0 +1,18 @@
 - base: Try different directories when searching for a ref
 In the previous change, you might have noticed that we need to spell out the
 full name of a tag (like *refs/tags/mytag*). This isn't very convenient; we
 would like to use shorter names. For example, if we've created "mytag"
 tag, we should be able to do `ugit log mytag` rather than having to specify
 `ugit log refs/tags/mytag`.
 We'll extend `get_oid` to search in different ref subdirectories when resolving
 a name. We'll search in:
 ```
    Root (.ugit): This way we can specify refs/tags/mytag
    .ugit/refs: This way we can specify tags/mytag
    .ugit/refs/tags: This way we can specify mytag
    .ugit/refs/heads: This will be needed for a future change
 ```
 If we find the requested name in any of the directories, return it. Otherwise
 assume that the name is an OID.
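The lookup order can be sketched as follows. To keep the example self-contained, the refs are passed in as a dict that maps full ref names (relative to *.ugit*) to OIDs; that parameter is an assumption of the sketch, not part of the real `get_oid`.

```python
def get_oid(name: str, refs: dict) -> str:
    """Resolve `name` to an OID by trying each ref directory in turn."""
    candidates = [
        name,                  # root: allows refs/tags/mytag
        f"refs/{name}",        # allows tags/mytag
        f"refs/tags/{name}",   # allows mytag
        f"refs/heads/{name}",  # needed for a future change
    ]
    for ref in candidates:
        if ref in refs:
            return refs[ref]
    # No ref matched: assume the caller passed an OID
    return name


if __name__ == "__main__":
    refs = {"HEAD": "e549f09b", "refs/tags/mytag": "d8d43b0e"}
    for name in ("mytag", "tags/mytag", "refs/tags/mytag", "0123abcd"):
        print(name, "->", get_oid(name, refs))
```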
--- a/how_to/Change_25.md
+++ b/how_to/Change_25.md
@@ -0,0 +1,12 @@
 - cli: pass HEAD by default in argparse
 First, make "@" be an alias for HEAD. (Implemented in `get_oid`)
 Second, do a little refactoring in *cli.py*. Some commands accept an optional
 OID argument and if the argument isn't provided it defaults to HEAD. For example
 `ugit log` can get an OID to start logging from, but by default it logs all
 commits before HEAD.
 Instead of having each command implement this logic, let's just make "@" (HEAD)
 be the default value for those commands. The relevant commands at this stage
 are `log` and `tag`. More will follow.
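With argparse this can be expressed through `default` and `type` on an optional positional argument. The sketch below uses a hypothetical stand-in for `get_oid` that only expands the "@" alias, and a made-up program name.

```python
import argparse


def get_oid(name: str) -> str:
    # Hypothetical stand-in for the real resolver: only expands the "@" alias
    return "HEAD" if name == "@" else name


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="ugit-log-sketch")
    # nargs="?" makes the argument optional; argparse also runs string
    # defaults through `type`, so "@" is resolved like user input would be
    parser.add_argument("oid", nargs="?", default="@", type=get_oid)
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args([]).oid)
    print(build_parser().parse_args(["d8d43b0e"]).oid)
```

The command itself then never needs to check whether an OID was given.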
--- a/how_to/Change_26.md
+++ b/how_to/Change_26.md
@@ -0,0 +1,14 @@
 - k: Print refs
 Now that we have refs and a potentially branching commit history, it's a good
 idea to create a visualization tool to see all the mess that we've created.
 The visualization tool will draw all refs and all the commits pointed to by the refs.
 Our command to run the tool will be called `ugit k`, similar to `gitk` (which is
 a graphical visualization tool for Git).
 We'll create a new `k` command in *cli.py*. We'll create `iter_refs` which is a
 generator which will iterate on all available refs (it will return HEAD from the
 ugit root directory and everything under *.ugit/refs*). As a first step, let's
 just print all refs when running `k`.
--- a/how_to/Change_27.md
+++ b/how_to/Change_27.md
@@ -0,0 +1,21 @@
 - k: Iterate commits and parents
 In addition to printing the refs, we'll also print all OIDs that are reachable
 from those refs. We'll create `iter_commits_and_parents`, which is a generator
 that returns all commits that it can reach from a given set of OIDs.
 Note that `iter_commits_and_parents` will return an OID once, even if it's
 reachable from multiple refs. Here, for example:
 ```
 o<----o<----o<----o<----@<----@<----@
 ^                  \                ^
 first commit        -<--$<----$     refs/tags/tag1
                              ^
                              refs/tags/tag2
 ```
 We can reach the first commit by following the parents of *tag1* or by following
 the parents of *tag2*. Yet if we call `iter_commits_and_parents({tag1, tag2})`,
 the first commit will be yielded only once. This property will be useful later.
 (Note that nothing is visualized yet, we're preparing for that.)
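A set-based version of this traversal might look like the sketch below. Instead of reading commits from the object database, it takes a `parents` dict that maps each commit OID to its parent OID; that parameter and the short OIDs are assumptions made for the example.

```python
def iter_commits_and_parents(oids, parents):
    """Yield every OID reachable from `oids`, each exactly once.

    `parents` maps a commit OID to its parent OID (None for the first commit).
    """
    pending = set(oids)
    visited = set()
    while pending:
        oid = pending.pop()
        if not oid or oid in visited:
            continue
        visited.add(oid)
        yield oid
        # Queue the parent; a None parent is skipped on the next iteration
        pending.add(parents.get(oid))


if __name__ == "__main__":
    parents = {"c1": None, "c2": "c1", "c3": "c2", "tag1": "c3", "tag2": "c3"}
    print(sorted(iter_commits_and_parents({"tag1", "tag2"}, parents)))
```

Because a set has no defined order, the order of the yielded OIDs is arbitrary here.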
--- a/how_to/Change_28.md
+++ b/how_to/Change_28.md
@@ -0,0 +1,18 @@
 - k: Render graph
 `k` is supposed to be a visualization tool, but so far we've just printed a
 bunch of OIDs... Now comes the visualization part!
 There's a convenient file format called "dot" that can describe a graph. This is
 a textual format. We'll generate a graph of all commits and refs in dot format
 and then visualize it using the "dot" utility that comes with Graphviz.
 (If you're unfamiliar with dot or Graphviz please look it up online.)
 The graph will contain a node for each commit, which points to its parent commit.
 The graph will also contain a node for each ref, which points to the relevant
 commit.
 At this point, `ugit k` is fully functional and I encourage you to play with it.
 Create a crazy branching history and a bunch of tags and see for yourself that
 `ugit k` can draw all that visually.
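For readers who haven't seen the dot format, here is a sketch of how such a graph description could be generated. The `render_dot` function, its parameters and the node attributes are our own choices for the illustration.

```python
def render_dot(refs: dict, parents: dict) -> str:
    """Return a dot description with one node per ref and one per commit.

    `refs` maps ref names to commit OIDs; `parents` maps each commit OID to
    its parent OID (None for the first commit).
    """
    lines = ["digraph commits {"]
    for refname, oid in refs.items():
        lines.append(f'"{refname}" [shape=note]')
        lines.append(f'"{refname}" -> "{oid}"')
    for oid, parent in parents.items():
        lines.append(f'"{oid}" [shape=box style=filled label="{oid[:10]}"]')
        if parent:
            lines.append(f'"{oid}" -> "{parent}"')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_dot({"HEAD": "b2"}, {"a1": None, "b2": "a1"}))
```

The resulting text can be passed to the `dot` utility to produce an image.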
--- a/how_to/Change_29.md
+++ b/how_to/Change_29.md
@@ -0,0 +1,9 @@
 - log: Use `iter_commits_and_parents`
 Refactoring ahead! Since we have `iter_commits_and_parents` from `k`, let's also
 use this function in `log`. We'll need to adjust it a bit to use
 `collections.deque` instead of a set so that the order of commits is deterministic.
 This generalization might seem unneeded at this point, but it will be useful
 later. (Note for the advanced folks: When we implement merge commits that have
 multiple parents, this generic way to iterate will come in handy.)
--- a/how_to/Change_30.md
+++ b/how_to/Change_30.md
@@ -0,0 +1,82 @@
 - branch: Create new branch
 Tags were an improvement since they freed us from the burden of remembering OIDs
 directly. But they are still somewhat inconvenient, since they are static. Let
 me illustrate:
 ```
 o-----o-----o-----o-----o-----o-----o
                   \                ^
                    ----o-----o  tag2,HEAD
                              ^
                           tag1
 ```
 If we have the above situation, we can easily flip between *tag1* and *tag2* with
 `checkout`. But what happens if we do
    - ugit checkout tag2
    - Make some changes
    - ugit commit?
 Now it looks like this:
 ```
 o-----o-----o-----o-----o-----o-----o-----o
                   \                ^     ^
                    ----o-----o  tag2     HEAD
                              ^
                           tag1
 ```
 The upper branch has advanced, but *tag2* still points to the previous commit.
 This is by design, since tags are supposed to just name a specific OID. So if we
 want to remember the new HEAD position we need to create another tag.
 But now let's create a ref that will "move forward" as the branch grows. Just
 like we have `ugit tag`, we'll create `ugit branch` that will point a branch to
 a specific OID. This time the ref will be created under *refs/heads*.
 At this stage, `branch` doesn't look any different from tag (the only difference
 is that the branch is created under *refs/heads* rather than *refs/tags*). But
 the magic will happen once we try to `checkout` a branch.
 So far when we checkout anything we update HEAD to point to the OID that we've
 just checked out. But if we checkout a branch by name, we'll do something
 different: we will update HEAD to point to the **name of the branch!** Assume
 that we have a branch here:
 ```
 o-----o-----o-----o-----o-----o-----o
                   \                ^
                    ----o-----o tag2,branch2
                              ^
                           tag1
 ```
 Running `ugit checkout branch2` will create the following situation:
 ```
 o-----o-----o-----o-----o-----o-----o
                   \                ^
                    ----o-----o tag2,branch2 <--- HEAD
                              ^
                           tag1
 ```
 You see? HEAD points to *branch2* rather than the OID of the commit directly.
 Now if we create another commit, ugit will update HEAD to point to the latest
 commit (just like it does every time) but as a side effect it will also update
 *branch2* to point to the latest commit.
 ```
 o-----o-----o-----o-----o-----o-----o-----o
                   \                ^     ^
                    ----o-----o  tag2     branch2 <--- HEAD
                              ^
                           tag1
 ```
 This way, if we checkout a branch and create some commits on top of it, the ref
 will always point to the latest commit.
 But right now HEAD (or any ref for that matter) may only point to an OID. It
 can't point to another ref, like I described above. So our next step would be
 to implement this concept. To mirror Git's terminology, we will call a ref that
 points to another ref a "symbolic ref". Please see the next change for an
 implementation of symbolic refs.
--- a/how_to/Change_31.md
+++ b/how_to/Change_31.md
@@ -0,0 +1,5 @@
 - data: Implement symbolic refs idea
 If the file that represents a ref contains an OID, we'll assume that the ref
 points to an OID. If the file contains the content `ref: <refname>`, we'll
 assume that the ref points to `<refname>` and we will dereference it recursively.
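Reading a ref with this rule could be sketched as follows. The `refs` dict stands in for the ref files (ref name to raw file content) and is an assumption of the example.

```python
def get_ref(refs: dict, ref: str):
    """Resolve `ref` to an OID, following `ref: <refname>` links recursively."""
    value = refs.get(ref)
    if value is not None and value.startswith("ref:"):
        # Symbolic ref: the rest of the content names another ref.
        # Note: a cycle of symbolic refs would recurse forever in this sketch.
        return get_ref(refs, value.split(":", 1)[1].strip())
    return value


if __name__ == "__main__":
    refs = {
        "HEAD": "ref: refs/heads/branch2",
        "refs/heads/branch2": "d8d43b0e3a21df0c845e185d08be8e4028787069",
    }
    print(get_ref(refs, "HEAD"))
```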
--- a/how_to/Change_32.md
+++ b/how_to/Change_32.md
@@ -0,0 +1,8 @@
 - data: Create RefValue container
 To make working with symbolic refs easier, we will create a `RefValue` container
 to represent the value of a ref. `RefValue` will have a `symbolic` property that
 says whether it's a symbolic or a direct ref.
 This change is just refactoring; we will wrap every OID that is written to or read
 from a ref in a `RefValue`.
--- a/how_to/Change_33.md
+++ b/how_to/Change_33.md
@@ -0,0 +1,17 @@
 - data: Dereference refs when reading and writing
 Now we'll dereference symbolic refs not only when reading them but also when
 writing them.
 We'll implement a helper function called `_get_ref_internal` which will return
 the path and the value of the last ref pointed to by a symbolic ref. In simple words:
 - When given a non-symbolic ref, `_get_ref_internal` will return the ref name
 and value.
 - When given a symbolic ref, `_get_ref_internal` will dereference the ref
 recursively, and then return the name of the last (non-symbolic) ref that points
 to an OID, plus its value.
 Now `update_ref` will use `_get_ref_internal` to know which ref it needs to update.
 Additionally, we'll use `_get_ref_internal` in `get_ref`.
--- a/how_to/Change_34.md
+++ b/how_to/Change_34.md
@@ -0,0 +1,15 @@
 - data: Don't always dereference refs (for `ugit k`)
 Actually, it's not always desirable to dereference a ref all the way. Sometimes
 we would like to know which ref a symbolic ref points to, rather than the final
 OID. Or we would like to update a ref directly, rather than updating the last
 ref in the chain.
 One such use case is `ugit k`. When visualizing refs it would be nice to see
 which ref points to which ref. We will see another use case soon.
 To accommodate this, we will add a `deref` option to `get_ref`, `iter_refs` and
 `update_ref`. When they are called with `deref=False`, they will work on the
 raw value of a ref and not dereference any symbolic refs.
 Then we will update `k` to use `deref=False`.
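The effect of the `deref` option can be sketched with an in-memory table of refs. The `refs` dict replaces the files under *.ugit*, and the function bodies are a guess at the shape of the logic rather than the actual *data.py* code; `iter_refs` is left out for brevity.

```python
from collections import namedtuple

RefValue = namedtuple("RefValue", ["symbolic", "value"])

refs = {}  # in-memory stand-in for the ref files: ref name -> raw content


def _get_ref_internal(ref, deref):
    """Return (ref name, RefValue), following symbolic refs only if `deref`."""
    raw = refs.get(ref)
    symbolic = bool(raw) and raw.startswith("ref:")
    if symbolic:
        value = raw.split(":", 1)[1].strip()
        if deref:
            return _get_ref_internal(value, deref=True)
    else:
        value = raw
    return ref, RefValue(symbolic=symbolic, value=value)


def get_ref(ref, deref=True):
    return _get_ref_internal(ref, deref)[1]


def update_ref(ref, value, deref=True):
    # With deref=True we write to the last ref in the chain, not to `ref` itself
    ref = _get_ref_internal(ref, deref)[0]
    refs[ref] = f"ref: {value.value}" if value.symbolic else value.value


if __name__ == "__main__":
    refs.update({"HEAD": "ref: refs/heads/branch2", "refs/heads/branch2": "aaa"})
    print(get_ref("HEAD"))               # final OID
    print(get_ref("HEAD", deref=False))  # the ref that HEAD points to
```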
--- a/ruff.toml
+++ b/ruff.toml
@@ -0,0 +1,3 @@
 [lint]
 select = ["E", "F"]
 ignore = ["F401"]
--- a/ugit/base.py
+++ b/ugit/base.py
@@ -0,0 +1,176 @@
 import itertools
 import operator
 import os
 import string
 from collections import deque, namedtuple
 from pathlib import Path, PurePath
 from . import data
 def write_tree(directory="."):
    entries = []
    with os.scandir(directory) as it:
        for entry in it:
            full = f"{directory}/{entry.name}"
            if is_ignored(full):
                continue
            if entry.is_file(follow_symlinks=False):
                type_ = "blob"
                with open(full, "rb") as f:
                    oid = data.hash_object(f.read())
            elif entry.is_dir(follow_symlinks=False):
                type_ = "tree"
                oid = write_tree(full)
            entries.append((entry.name, oid, type_))
    tree = "".join(f"{type_} {oid} {name}\n" for name, oid, type_ in sorted(entries))
    return data.hash_object(tree.encode(), "tree")
 def _iter_tree_entries(oid):
    if not oid:
        return
    tree = data.get_object(oid, "tree")
    for entry in tree.decode().splitlines():
        type_, oid, name = entry.split(" ", 2)
        yield type_, oid, name
 def get_tree(oid, base_path=""):
    result = {}
    for type_, oid, name in _iter_tree_entries(oid):
        assert "/" not in name
        assert name not in ("..", ".")
        path = base_path + name
        if type_ == "blob":
            result[path] = oid
        elif type_ == "tree":
            result.update(get_tree(oid, f"{path}/"))
        else:
            assert False, f"Unknown tree entry {type_}"
    return result
 def _empty_current_directory():
    for root, dirnames, filenames in os.walk(".", topdown=False):
        for filename in filenames:
            path = os.path.relpath(f"{root}/{filename}")
            if is_ignored(path) or not Path(path).is_file():
                continue
            Path(path).unlink()
        for dirname in dirnames:
            path = os.path.relpath(f"{root}/{dirname}")
            if is_ignored(path):
                continue
            try:
                Path(path).rmdir()
            except (FileNotFoundError, OSError):
                # Deletion might fail if the directory contains ignored files,
                # so it's OK
                pass
 def read_tree(tree_oid):
    _empty_current_directory()
    for path, oid in get_tree(tree_oid, base_path="./").items():
        Path(path).parent.mkdir(parents=True, exist_ok=True)
        with open(path, "wb") as f:
            f.write(data.get_object(oid))
 def commit(message):
    commit = f"tree {write_tree()}\n"
    HEAD = data.get_ref("HEAD").value
    if HEAD:
        commit += f"parent {HEAD}\n"
    commit += "\n"
    commit += f"{message}\n"
    oid = data.hash_object(commit.encode(), "commit")
    data.update_ref("HEAD", data.RefValue(symbolic=False, value=oid))
    return oid
 def create_tag(name, oid):
    data.update_ref(f"refs/tags/{name}", data.RefValue(symbolic=False, value=oid))
 def checkout(oid):
    commit = get_commit(oid)
    read_tree(commit.tree)
    data.update_ref("HEAD", data.RefValue(symbolic=False, value=oid))
 def create_branch(name, oid):
    data.update_ref(f"refs/heads/{name}", data.RefValue(symbolic=False, value=oid))
 Commit = namedtuple("Commit", ["tree", "parent", "message"])
 def get_commit(oid):
    parent = None
    commit = data.get_object(oid, "commit").decode()
    lines = iter(commit.splitlines())
    for line in itertools.takewhile(operator.truth, lines):
        key, value = line.split(" ", 1)
        if key == "tree":
            tree = value
        elif key == "parent":
            parent = value
        else:
            assert False, f"Unknown field {key}"
    message = "\n".join(lines)
    return Commit(tree=tree, parent=parent, message=message)
 def iter_commits_and_parents(oids):
    oids = deque(oids)
    visited = set()
    while oids:
        oid = oids.popleft()
        if not oid or oid in visited:
            continue
        visited.add(oid)
        yield oid
        commit = get_commit(oid)
        # Return parent next
        oids.appendleft(commit.parent)
 def get_oid(name):
    if name == "@":
        name = "HEAD"
    # Name is ref
    refs_to_try = [
        f"{name}",
        f"refs/{name}",
        f"refs/tags/{name}",
        f"refs/heads/{name}",
    ]
    for ref in refs_to_try:
        if data.get_ref(ref, deref=False).value:
            return data.get_ref(ref).value
    # Name is SHA1
    is_hex = all(c in string.hexdigits for c in name)
    if len(name) == 40 and is_hex:
        return name
    assert False, f"Unknown name {name}"
 def is_ignored(path):
    return ".ugit" in path.split("/")
--- a/ugit/cli.py
+++ b/ugit/cli.py
@@ -1,6 +1,11 @@
 import argparse
 import subprocess
 import sys
 import textwrap
 from pathlib import Path
 from . import base
 from . import data
@@ -15,6 +20,8 @@ def parse_args():
    commands = parser.add_subparsers(dest="command")
    commands.required = True
    oid = base.get_oid
    init_parser = commands.add_parser("init")
    init_parser.set_defaults(func=init)
@@ -22,6 +29,42 @@ def parse_args():
    hash_object_parser.set_defaults(func=hash_object)
    hash_object_parser.add_argument("file")
    cat_file_parser = commands.add_parser("cat-file")
    cat_file_parser.set_defaults(func=cat_file)
    cat_file_parser.add_argument("object", type=oid)
    write_tree_parser = commands.add_parser("write-tree")
    write_tree_parser.set_defaults(func=write_tree)
    read_tree_parser = commands.add_parser("read-tree")
    read_tree_parser.set_defaults(func=read_tree)
    read_tree_parser.add_argument("tree", type=oid)
    commit_parser = commands.add_parser("commit")
    commit_parser.set_defaults(func=commit)
    commit_parser.add_argument("-m", "--message", required=True)
    log_parser = commands.add_parser("log")
    log_parser.set_defaults(func=log)
    log_parser.add_argument("oid", default="@", type=oid, nargs="?")
    checkout_parser = commands.add_parser("checkout")
    checkout_parser.set_defaults(func=checkout)
    checkout_parser.add_argument("oid", type=oid)
    tag_parser = commands.add_parser("tag")
    tag_parser.set_defaults(func=tag)
    tag_parser.add_argument("name")
    tag_parser.add_argument("oid", default="@", type=oid, nargs="?")
    branch_parser = commands.add_parser("branch")
    branch_parser.set_defaults(func=branch)
    branch_parser.add_argument("name")
    branch_parser.add_argument("start_point", default="@", type=oid, nargs="?")
    k_parser = commands.add_parser("k")
    k_parser.set_defaults(func=k)
    return parser.parse_args()
@@ -33,3 +76,67 @@ def init(args):
 def hash_object(args):
    with open(args.file, "rb") as f:
        print(data.hash_object(f.read()))
 def cat_file(args):
    sys.stdout.flush()
    sys.stdout.buffer.write(data.get_object(args.object, expected=None))
 def write_tree(args):
    print(base.write_tree())
 def read_tree(args):
    base.read_tree(args.tree)
 def commit(args):
    print(base.commit(args.message))
 def log(args):
    for oid in base.iter_commits_and_parents({args.oid}):
        commit = base.get_commit(oid)
        print(f"commit {oid}\n")
        print(textwrap.indent(commit.message, "    "))
        print("")
 def checkout(args):
    base.checkout(args.oid)
 def tag(args):
    base.create_tag(args.name, args.oid)
 def branch(args):
    base.create_branch(args.name, args.start_point)
    print(f"Branch {args.name} created at {args.start_point[:10]}")
 def k(args):
    dot = "digraph commits {\n"
    oids = set()
    for refname, ref in data.iter_refs(deref=False):
        # DOT identifiers containing "/" must be double-quoted
        dot += f'"{refname}" [shape=note]\n'
        dot += f'"{refname}" -> "{ref.value}"\n'
        if not ref.symbolic:
            oids.add(ref.value)
    for oid in base.iter_commits_and_parents(oids):
        commit = base.get_commit(oid)
        dot += f'"{oid}" [shape=box style=filled label="{oid[:10]}"]\n'
        if commit.parent:
            dot += f'"{oid}" -> "{commit.parent}"\n'
    dot += "}"
    print(dot)
    with subprocess.Popen(
        ["dot", "-Tgtk", "/dev/stdin"], stdin=subprocess.PIPE
    ) as proc:
        proc.communicate(dot.encode())
--- a/ugit/data.py
+++ b/ugit/data.py
@@ -1,6 +1,9 @@
-from pathlib import Path
+from pathlib import Path, PurePath
 import hashlib
 import os
 from collections import namedtuple
 GIT_DIR = ".ugit"
@@ -10,8 +13,62 @@ def init():
    Path(f"{GIT_DIR}/objects").mkdir()
-def hash_object(data):
+RefValue = namedtuple("RefValue", ["symbolic", "value"])
-    oid = hashlib.sha1(data).hexdigest()
+
 def update_ref(ref, value, deref=True):
    assert not value.symbolic
    ref = _get_ref_internal(ref, deref)[0]
    ref_path = f"{GIT_DIR}/{ref}"
    Path(ref_path).parent.mkdir(parents=True, exist_ok=True)
    with open(ref_path, "w") as f:
        f.write(value.value)
 def get_ref(ref, deref=True):
    return _get_ref_internal(ref, deref)[1]
 def _get_ref_internal(ref, deref):
    ref_path = f"{GIT_DIR}/{ref}"
    value = None
    if Path(ref_path).is_file():
        with open(ref_path) as f:
            value = f.read().strip()
    symbolic = bool(value) and value.startswith("ref:")
    if symbolic:
        value = value.split(":", 1)[1].strip()
        if deref:
            # Follow the chain until a non-symbolic ref is reached
            return _get_ref_internal(value, deref=True)
    return ref, RefValue(symbolic=symbolic, value=value)
 def iter_refs(deref=True):
    refs = ["HEAD"]
    for root, _, filenames in os.walk(f"{GIT_DIR}/refs"):
        root = os.path.relpath(root, GIT_DIR)
        refs.extend(f"{root}/{name}" for name in filenames)
    for refname in refs:
        yield refname, get_ref(refname, deref=deref)
 def hash_object(data, type_="blob"):
    obj = type_.encode() + b"\x00" + data
    oid = hashlib.sha1(obj).hexdigest()
    with open(f"{GIT_DIR}/objects/{oid}", "wb") as out:
-        out.write(data)
+        out.write(obj)
    return oid
 def get_object(oid, expected="blob"):
    with open(f"{GIT_DIR}/objects/{oid}", "rb") as f:
        obj = f.read()
    type_, _, content = obj.partition(b"\x00")
    type_ = type_.decode()
    if expected is not None:
        assert type_ == expected, f"Expected {expected}, got {type_}"
    return content
Author	SHA1	Message	Date
daviddoji	c484b41a89	Don't always derefence ref	2024-07-06 19:50:52 +02:00
daviddoji	fe5ed910a3	Add change 34 to instructions	2024-07-06 19:49:59 +02:00
daviddoji	9c53919802	Dereference refs when reading and writing	2024-06-29 18:48:21 +02:00
daviddoji	30ce8c84e4	Add change 33 to instructions	2024-06-29 18:47:23 +02:00
daviddoji	6841a97d18	Create RefValue container	2024-06-14 16:29:58 +02:00
daviddoji	556c16c081	Add change 32 to instructions	2024-06-14 16:29:25 +02:00
daviddoji	7a0f86e49b	Implement symbolic refs idea	2024-06-09 21:20:53 +02:00
daviddoji	3770c81942	Implement symbolic refs idea	2024-06-09 21:20:38 +02:00
daviddoji	9f8fde3c60	Create new branch	2024-06-05 20:18:40 +02:00
daviddoji	772f631768	Add change 30 to instructions	2024-06-05 20:18:06 +02:00
daviddoji	7fe3e0f497	Use iter_commits_and_parents	2024-05-24 14:45:06 +02:00
daviddoji	7896b80c42	Add change 29 to instructions	2024-05-24 14:44:43 +02:00
daviddoji	b854b4fa18	Render graph	2024-05-22 16:52:44 +02:00
daviddoji	2362d69673	Add change 28 to instructions	2024-05-22 16:52:14 +02:00
daviddoji	d53322c256	Iterate commits and parents	2024-05-16 12:01:30 +02:00
daviddoji	7fbf6640f6	Iterate commits and parents	2024-05-16 11:54:52 +02:00
daviddoji	dad9077515	Add change 27 to instructions	2024-05-16 11:54:28 +02:00
daviddoji	db7d608010	Print refs for k (visualization tool)	2024-05-15 11:01:19 +02:00
daviddoji	c9d8b443ed	Add change 26 to instructions	2024-05-15 10:59:54 +02:00
david	41333f06bc	pass HEAD by default to argparse	2024-05-05 21:04:28 +02:00
david	fe292c02c9	Add change 25 instructions	2024-05-05 21:04:02 +02:00
david	de595261e6	Try different dirextories when searching for a ref	2024-04-23 17:36:41 +02:00
david	81bf86d41b	Add change 24 instructions	2024-04-23 17:36:02 +02:00
david	671fa4b6b1	Resolve name to oid in argparse	2024-04-20 21:38:04 +02:00
david	63dcbeb9e7	Add change 23 instructions	2024-04-20 21:37:31 +02:00
david	edae32dc86	Create the tag ref	2024-04-17 19:33:54 +02:00
david	e85766f671	Add change 22 instructions	2024-04-17 19:33:24 +02:00
david	1f947e6343	Generalize HEAD to refs	2024-04-12 17:19:14 +02:00
david	cb8e744794	Add change 21 instructions	2024-04-12 17:18:48 +02:00
daviddoji	6797bcfabe	Implement CLI command for tagging	2024-04-03 19:47:53 +02:00
daviddoji	95355befb4	Implement CLI command for tagging	2024-04-03 19:47:35 +02:00
daviddoji	b802e1eb9d	Read tree and move HEAD in checkout	2024-03-29 18:27:10 +01:00
daviddoji	817f38f49c	Add change 19 to instructions	2024-03-29 18:26:20 +01:00
daviddoji	d00a7817ab	Add oid parameter to log	2024-03-29 18:10:35 +01:00
daviddoji	8ac5264366	Add change 18 to instructions	2024-03-29 18:10:16 +01:00
daviddoji	cd91f18da6	Implement log	2024-03-20 19:43:58 +01:00
daviddoji	78044a877a	Add change 17 to instructions	2024-03-20 19:43:29 +01:00
daviddoji	b0d8cab498	Set parent to HEAD	2024-03-18 19:01:53 +01:00
daviddoji	450391089f	Add change 16 to instructions	2024-03-18 19:01:16 +01:00
daviddoji	1847cfbb17	Record hash of last commit	2024-03-13 19:38:35 +01:00
daviddoji	6a91c03f40	Add change 15 to instructions	2024-03-13 19:37:46 +01:00
daviddoji	4e13a27f79	Create commit	2024-03-11 19:29:15 +01:00
daviddoji	2c940abd1d	Add change 14 to instructions	2024-03-11 19:28:36 +01:00
daviddoji	c72370f930	Write-tree delete all before read	2024-03-07 19:44:28 +01:00
daviddoji	40a19615aa	Add change 13 instructions	2024-03-07 19:43:13 +01:00
daviddoji	db8c1379c2	Read-tree extract tree from object	2024-03-02 16:18:48 +01:00
daviddoji	6f5fe864a9	Add change 12 instructions	2024-03-02 16:18:19 +01:00
daviddoji	2f8545d48e	Write-tree write tree objects	2024-03-02 16:02:42 +01:00
daviddoji	5faf498917	Add change 11 instructions	2024-03-02 16:01:44 +01:00
daviddoji	4540b98a88	Write-tree hash files	2024-02-28 19:51:24 +01:00
daviddoji	af1928a360	Add change 10 instructions	2024-02-28 19:50:41 +01:00
daviddoji	46a20c8b60	Write-tree ignore .ugit files	2024-02-28 19:45:34 +01:00
daviddoji	fdfcfdbdad	Add change 09 instructions	2024-02-28 19:44:10 +01:00
daviddoji	73eb89d397	Write-tree for listing files	2024-02-26 18:59:58 +01:00
daviddoji	d666efcbd3	Add change 08 instructions	2024-02-26 18:58:52 +01:00
daviddoji	30ee2098ab	Add base module	2024-02-21 20:46:59 +01:00
daviddoji	103837cb73	Add change 07 instructions	2024-02-21 20:36:07 +01:00
daviddoji	2556bde16f	Add types to objects	2024-02-15 20:20:00 +01:00
daviddoji	36f6f88990	Add change 06 instructions	2024-02-15 20:18:45 +01:00
daviddoji	a010615cf2	Print hashed objects	2024-02-14 20:24:33 +01:00
daviddoji	9634544d68	Add change 05 instructions	2024-02-14 20:23:44 +01:00