Don't always dereference ref

Add change 34 to instructions
Dereference refs when reading and writing
2024-07-06 19:50:52 +02:00 · 2024-07-06 19:49:59 +02:00 · 2024-06-29 18:48:21 +02:00 · 2024-06-29 18:47:23 +02:00 · 2024-06-14 16:29:58 +02:00 · 2024-06-14 16:29:25 +02:00
33 changed files with 1005 additions and 14 deletions
--- a/how_to/Change_06.md
+++ b/how_to/Change_06.md
@@ -0,0 +1,17 @@
+- data: Add types to objects
+
+As we will soon see, there will be different logical types of objects that are
+used in different contexts (even though, from the Object Database's point of
+view, they are just all bytes). In order to lower the chance of using an object
+in the wrong context we're going to add a type tag for each object.
+
+The type is just a string that's going to be prepended to the start of the file,
+followed by a null byte. When reading the file later we'll extract the type and
+verify that it's indeed the expected type.
+
+The default type is going to be `blob`, since by default an object is a
+collection of bytes with no further semantic meaning.
+
+We can also pass `expected=None` to `get_object()` if we don't want to verify
+the type. This is useful for the `cat-file` CLI command which is a debug command
+used for printing all objects.
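A minimal sketch of the type tag described in Change_06, for illustration only. It assumes a temporary directory stands in for the object database, and that the hash is computed over the tag plus the content; `OBJECTS_DIR` and the exact function bodies are hypothetical, not the patch's `data.py`.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical stand-in for the object database directory.
OBJECTS_DIR = Path(tempfile.mkdtemp())


def hash_object(data: bytes, type_: str = "blob") -> str:
    # Prepend the type tag and a null byte, then store under the content hash.
    obj = type_.encode() + b"\x00" + data
    oid = hashlib.sha1(obj).hexdigest()
    (OBJECTS_DIR / oid).write_bytes(obj)
    return oid


def get_object(oid: str, expected: Optional[str] = "blob") -> bytes:
    obj = (OBJECTS_DIR / oid).read_bytes()
    # Everything before the first null byte is the type tag.
    type_, _, content = obj.partition(b"\x00")
    if expected is not None and type_.decode() != expected:
        raise ValueError(f"Expected {expected}, got {type_.decode()}")
    return content
```

Passing `expected=None` skips the check, which is what a debug command like `cat-file` would want.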
--- a/how_to/Change_07.md
+++ b/how_to/Change_07.md
--- a/how_to/Change_08.md
+++ b/how_to/Change_08.md
@@ -0,0 +1,26 @@
+- write-tree: List files
+
+The next command is `write-tree`. This command will take the current working
+directory and store it to the object database. If `hash-object` was for storing
+an individual file, then `write-tree` is for storing a whole directory.
+
+Like `hash-object`, `write-tree` is going to give us an OID after it's done and
+we'll be able to use the OID in order to retrieve the directory at a later time.
+
+In Git's lingo a "tree" means a directory.
+
+We'll get into the details in later changes; in this change we'll only prepare
+the code around the feature:
+
+  + Create a `write-tree` CLI command
+
+  + Create a `write_tree()` function in base module. Why in base module and not
+  in data module? Because `write_tree()` is not going to write to disk directly
+  but use the object database provided by data to store the directory. Hence it
+  belongs to the higher-level base module.
+
+  + Add code to `write_tree()` to print a directory recursively. For now nothing
+  is written anywhere, but we just coded the boilerplate to recursively scan a
+  directory.
+
+We continue in the next change.
--- a/how_to/Change_09.md
+++ b/how_to/Change_09.md
@@ -0,0 +1,8 @@
+- write-tree: Ignore .ugit files
+
+If we run `ugit write-tree`, we will see that it also prints the content of the
+.ugit directory. This directory isn't part of the user's files, so let's ignore
+it.
+
+Actually, I created a separate `is_ignored()` function. This way if we have any
+other files we want to ignore later we have one place to change.
--- a/how_to/Change_10.md
+++ b/how_to/Change_10.md
@@ -0,0 +1,12 @@
+- write-tree: Hash the files
+
+Instead of only printing the file name, let's put all files in the object
+database. For now we'll print their OID and their name.
+
+Notice that instead of getting one OID to represent a directory we now get a
+separate OID for each file, which isn't very useful. Plus, note that the names
+of the files aren't stored in the object database, they are just printed and
+then the information is discarded.
+
+So at this stage `write-tree` isn't useful (it just saves a bunch of files as
+blobs) but the next change will fix it.
--- a/how_to/Change_11.md
+++ b/how_to/Change_11.md
@@ -0,0 +1,62 @@
+- write-tree: Write tree objects
+
+Now comes the fun part, where we turn a collection of separate files into a
+single object that represents a directory.
+
+
+The idea is that we will create one additional object that collects all the data
+necessary to store a complete directory. For example, if we have a directory
+with two files:
+```
+$ ls
+cats.txt    dogs.txt
+```
+
+And we want to save the directory, we will first put the individual files into
+the object database:
+```
+$ ugit hash-object cats.txt
+91a7b14a584645c7b995100223e65f8a5a33b707
+$ ugit hash-object dogs.txt
+fa958e0dd2203e9ad56853a3f51e5945dad317a4
+```
+
+Then we will create a "tree" object that has the content of:
+```
+91a7b14a584645c7b995100223e65f8a5a33b707 cats.txt
+fa958e0dd2203e9ad56853a3f51e5945dad317a4 dogs.txt
+```
+
+And we will put this tree object into the object database as well. Then the OID
+of the tree object will actually represent the entire directory! Why? Because we
+can first retrieve the tree object by its OID, then see all the files it
+contains (their names and OIDs) and then read all the OIDs of the files to get
+their actual content.
+
+What if our directory contains other directories? We'll just create tree objects
+for them as well and we'll allow one tree object to point to another:
+```
+$ ls
+cats.txt    dogs.txt    other/
+$ ls other/
+shoes.jpg
+```
+
+The root tree object will look like this:
+```
+blob 91a7b14a584645c7b995100223e65f8a5a33b707 cats.txt
+blob fa958e0dd2203e9ad56853a3f51e5945dad317a4 dogs.txt
+tree 53891a3c27b17e0f8fd96c058f968d19e340428d other
+```
+
+Note that we added a type to each entry so that we know if it's a file or a
+directory. The tree that represents the "other" directory (OID 53891a3c27b17e0f8fd96c058f968d19e340428d) looks like:
+```
+blob 0aa186b09fd81e8cf449ba10eee6aff9711cc1ac shoes.jpg
+```
+We can think about this structure as a tree you know from Computer Science where
+each entry's OID is a pointer to either another tree or to a file (leaf node).
+
+Note that we actually save the tree objects with type "tree" in
+`data.hash_object()` since we don't want the trees to be confused with regular
+files.
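The tree layout from Change_11 can be sketched without touching the disk. This is an illustration under assumptions: a nested dict plays the role of the working directory and a plain dict (`OBJECTS`) plays the role of the object database; both are hypothetical stand-ins.

```python
import hashlib

# Hypothetical in-memory object database: OID -> raw object bytes.
OBJECTS = {}


def hash_object(data, type_="blob"):
    obj = type_.encode() + b"\x00" + data
    oid = hashlib.sha1(obj).hexdigest()
    OBJECTS[oid] = obj
    return oid


def write_tree(directory):
    """Store a nested dict {name: bytes or dict} and return the tree OID."""
    entries = []
    for name, value in directory.items():
        if isinstance(value, dict):
            # A sub-directory becomes its own tree object.
            entries.append((name, write_tree(value), "tree"))
        else:
            entries.append((name, hash_object(value), "blob"))
    tree = "".join(
        f"{type_} {oid} {name}\n" for name, oid, type_ in sorted(entries)
    )
    return hash_object(tree.encode(), "tree")


root = write_tree({
    "cats.txt": b"meow\n",
    "dogs.txt": b"woof\n",
    "other": {"shoes.jpg": b"\xff\xd8"},
})
```

The single OID in `root` is enough to reach every file: read the tree, then follow each entry's OID.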
--- a/how_to/Change_12.md
+++ b/how_to/Change_12.md
@@ -0,0 +1,28 @@
+- read-tree: Extract tree from object
+
+This command will take an OID of a tree and extract it to the working directory.
+Kind of the opposite of `write-tree`.
+
+I divided the implementation into a few layers:
+
+`_iter_tree_entries` is a generator that will take an OID of a tree, tokenize it
+line-by-line and yield the raw string values.
+
+`get_tree` uses `_iter_tree_entries` to recursively parse a tree into a
+dictionary.
+
+`read_tree` uses `get_tree` to get the file OIDs and writes them into the
+working directory.
+
+Now we can actually save versions of the working directory! It's nothing like
+proper version control, but we can see that a super basic flow is possible:
+
+  + Imagine you work on some code and you want to save a version.
+  + You run `ugit write-tree`.
+  + You remember that OID that was printed out (write it on a post-it note or
+  something :)).
+  + Continue working and repeat steps 2 and 3 as necessary.
+  + If you want to return to a previous version, use `ugit read-tree` to restore
+  it to the working directory.
+
+Is it convenient to use? No. But it's just the beginning!
--- a/how_to/Change_13.md
+++ b/how_to/Change_13.md
@@ -0,0 +1,7 @@
+- read-tree: Delete all existing stuff before reading
+
+This is done so that we won't have any old files left around after a read-tree.
+
+Before this change, if we save tree A which contains only `a.txt`, then we save
+tree B which contains `a.txt` and `b.txt` and then we `read-tree` A, we will
+have `b.txt` left over in the working directory.
--- a/how_to/Change_14.md
+++ b/how_to/Change_14.md
@@ -0,0 +1,31 @@
+- commit: Create commit
+
+So far we were able to save versions of a directory (with `write-tree`), but
+without any additional context. In reality, when we save a snapshot we would
+like to attach data such as:
+  + Message describing it
+  + When the snapshot was created
+  + Who created the snapshot
+  + ...
+
+We will create a new type of object called a "commit" that will store all this
+information. A commit will just be a text file stored in the object database
+with the type of `'commit'`.
+
+The first lines in the commit will be key-values, then an empty line will mark
+the end of the key-values and then the commit message will follow. Like this:
+
+```
+tree 5e550586c91fce59e0006799e0d46b3948f05693
+author Nikita Leshenko
+time 2019-09-14T09:31:09+00:00
+
+This is the commit message!
+```
+
+For now we'll just write the "tree" key and the commit message to the commit
+object.
+
+We will create a new `ugit commit` command that will accept a commit message,
+snapshot the current directory using `ugit write-tree` and save the resulting
+object.
--- a/how_to/Change_15.md
+++ b/how_to/Change_15.md
@@ -0,0 +1,10 @@
+- commit: Record hash of last commit to HEAD
+
+I would like to link new commits to older commits. Right now, if we make changes
+in the working directory and make periodic commits, each commit will be a
+standalone object, separate from all other commits. The motivation for linking
+them together is so that we can look at the commits as a series of snapshots in
+some order.
+
+Before we can do it, let's record the OID of the last commit that we created.
+We'll call the last commit the "HEAD" and just put the OID in the .ugit/HEAD file.
--- a/how_to/Change_16.md
+++ b/how_to/Change_16.md
@@ -0,0 +1,23 @@
+- commit: set parent to HEAD
+
+When creating a new commit, we will use the HEAD to link the new commit to the
+previous commit. We'll call the previous commit the "parent commit" and we will
+save its OID in the "parent" key on the commit object.
+
+For example, if HEAD is currently bd0de093f1a0f90f54913d694a11cccf450bd990 and we
+create a new commit, the new commit will look like this in the object store:
+
+```
+tree 50bed982245cd21e2798f179e0b032904398485b
+parent bd0de093f1a0f90f54913d694a11cccf450bd990
+
+This is the commit message!
+```
+
+The first commit in the repository will obviously have no parent.
+
+Now we can retrieve the entire list of commits just by referencing the last
+commit! We can start from the HEAD, read the "parent" key on the HEAD commit and
+discover the commit before HEAD. Then read the parent of that commit, and go
+back on and on... This is basically a linked list implemented over the object
+database.
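To make the "linked list over the object database" idea concrete, here is a small sketch. It assumes an in-memory dict (`OBJECTS`) instead of real object files; `make_commit`, `parse_commit` and `history` are hypothetical helper names, not functions from the patch.

```python
import hashlib

# Hypothetical in-memory object store: OID -> commit text.
OBJECTS = {}


def make_commit(tree, message, parent=None):
    text = f"tree {tree}\n"
    if parent:
        text += f"parent {parent}\n"
    # An empty line separates the key-values from the message.
    text += f"\n{message}\n"
    oid = hashlib.sha1(text.encode()).hexdigest()
    OBJECTS[oid] = text
    return oid


def parse_commit(oid):
    headers, _, message = OBJECTS[oid].partition("\n\n")
    fields = dict(line.split(" ", 1) for line in headers.splitlines())
    return fields["tree"], fields.get("parent"), message


def history(oid):
    # Follow the "parent" key until we reach a commit without one.
    while oid:
        _tree, parent, _message = parse_commit(oid)
        yield oid
        oid = parent
```

Starting from the latest commit, `history()` yields every earlier commit in turn, newest first.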
--- a/how_to/Change_17.md
+++ b/how_to/Change_17.md
@@ -0,0 +1,12 @@
+- log: Implement
+
+`log` will walk the list of commits and print them.
+
+We will start by implementing `get_commit()` that will parse a commit object by
+OID.
+
+Then in the CLI module we will start from the HEAD commit and walk its parents
+until we reach a commit without a parent.
+
+The result is that the entire commit history is printed to the screen once we
+run `ugit log`.
--- a/how_to/Change_18.md
+++ b/how_to/Change_18.md
@@ -0,0 +1,5 @@
+- log: Add oid parameter
+
+Just a small cosmetic change: Instead of always printing the list of commits
+from HEAD, add an optional parameter to specify an alternative commit OID to
+start from. By default it will still be HEAD.
--- a/how_to/Change_19.md
+++ b/how_to/Change_19.md
@@ -0,0 +1,90 @@
+- checkout: Read tree and move HEAD
+
+When given a commit OID, `ugit checkout` will "checkout" that commit, meaning
+that it will populate the working directory with the content of the commit and
+move HEAD to point to it.
+
+This is a small but important change and it greatly expands the power of ugit in
+two ways.
+
+First, it allows us to travel conveniently in history. If we've made a handful
+of commits and we would like to revisit a previous commit, we can now "checkout"
+that commit to the working directory, play with it (compile, run tests, read
+code, whatever we want) and checkout the latest commit again to resume working
+where we left off.
+
+You might be wondering why `checkout` is needed when we could just use
+`read-tree`, and the answer is that moving HEAD in addition to reading the tree
+allows us to record which commit is checked out right now. If we would only use
+`read-tree` and later forget which commit we are looking at, we will see a bunch
+of files in the working directory and have no idea where they came from. On the
+other hand, if we use `checkout`, the commit will be recorded in HEAD and we can
+always know what we're looking at (by running `ugit log` for example and seeing the first entry).
+
+The second way by which `checkout` expands the power of ugit is by allowing
+multiple branches of history. Let me explain: So far we have set HEAD to point
+to the latest commit that was created. It means that all our commits were
+linear, each new commit was added on top of the previous. The `checkout`
+command now allows us to move HEAD to any commit we wish. Then, new commits will
+be created on top of the current HEAD commit, which isn't necessarily the last
+created commit.
+
+For example, imagine that we're working on some code. So far, we have created a
+few commits, represented by a graph:
+```
+o-----o-----o-----o
+^                 ^
+first commit      HEAD
+```
+
+Then we wanted to code a new feature. We created a few commits while working on
+the feature (new commits represented by @):
+```
+o-----o-----o-----o-----@-----@-----@
+^                                   ^
+first commit                        HEAD
+```
+
+Now we have an alternative idea for implementing that feature. We would like to
+go back in time and try a different implementation, without throwing away the
+current implementation. We can remember the current HEAD and run `ugit checkout`
+to go back in time, by providing the OID of the commit before the new feature
+was implemented (that OID can be discovered with `ugit log`).
+```
+o-----o-----o-----o-----@-----@-----@
+^                 ^
+first commit      HEAD
+```
+
+The working directory will effectively go back in time. We can start working on
+an alternative implementation and create new commits. The new commits will be on
+top of HEAD and look like this (represented by $):
+```
+o-----o-----o-----o-----@-----@-----@
+^                  \
+first commit        ----$-----$
+                              ^
+                              HEAD
+```
+
+See how the history now contains two "branches". We can actually switch back and
+forth between them and work on them in parallel. Finally, we can checkout the
+preferred implementation and work from it on future code. Assuming that we liked
+the second branch, we'll just keep working from it, and future commits will look
+like this:
+```
+o-----o-----o-----o-----@-----@-----@
+^                  \
+first commit        ----$-----$-----o-----o-----o-----o-----o
+                                                            ^
+                                                            HEAD
+```
+
+Pretty useful, right? We've just introduced a simple form of branching history.
+Note that something pretty cool happened here: The implementation of checkout is
+very simple (we just call `read_tree` and update HEAD) but the implications of
+checkout are quite big - we can suddenly have a branching workflow which might
+look complicated but it is actually a direct consequence of what we implemented
+in previous changes. This is why I believe learning Git internals from the
+bottom up is useful - we can see how simple concepts compose into complicated
+functionality.
--- a/how_to/Change_20.md
+++ b/how_to/Change_20.md
@@ -0,0 +1,41 @@
+- tag: Implement CLI command
+
+Now that we have branching history we have some OIDs we need to keep track of.
+Assume we have two branches (continuing from the example we had for `checkout`):
+```
+o-----o-----o-----o-----@-----@-----@
+^                  \                ^
+first commit        ----$-----$     6c9f80a187ba39b4...
+                              ^
+                              d8d43b0e3a21df0c...
+```
+
+If we want to switch back and forth between the two "branches" with `checkout`,
+we need to remember both OIDs, which are quite long.
+
+To make our lives easier, let's implement a command to attach a name to an OID.
+Then we'll be able to refer to the OID by that name.
+
+The end result will look like this:
+```
+$ # Make some changes
+...
+$ ugit commit
+d8d43b0e3a21df0c845e185d08be8e4028787069
+$ ugit tag my-cool-commit d8d43b0e3a21df0c845e185d08be8e4028787069
+$ # Make more changes
+...
+$ ugit commit
+e549f09bbd08a8a888110b07982952e17e8c9669
+
+$ ugit checkout my-cool-commit
+        or
+$ ugit checkout d8d43b0e3a21df0c845e185d08be8e4028787069
+```
+
+The last two commands are equivalent, because "my-cool-commit" is a tag that
+points to d8d43b0e3a21df0c845e185d08be8e4028787069.
+
+We will implement this in a few steps. The first step is to create a CLI
+command that calls the relevant function in the base module. The base module does
+nothing at this stage.
--- a/how_to/Change_21.md
+++ b/how_to/Change_21.md
@@ -0,0 +1,23 @@
+- tag: Generalize HEAD to refs
+
+As part of implementing `tag`, we'll generalize the way we handle HEAD. If you
+think about it, HEAD and tags are similar. They are both ways for ugit to attach
+a name to an OID. In case of HEAD, the name is hardcoded by ugit; in case of
+tags, the name will be provided by the user. It makes sense to handle them
+similarly in *data.py*.
+
+In *data.py*, let's generalize the functions `set_HEAD` and `get_HEAD` into
+`update_ref` and `get_ref`. "Ref" is short for reference, and that's the name
+Git uses. The functions will now accept the name of the ref and write/read it as
+a file under the *.ugit* directory. Logically, a ref is a named pointer to an object.
+
+The important change is in *data.py*. The rest of the changes just rename some
+functions:
+
+```
+- get_HEAD()    ->   get_ref('HEAD')
+- set_HEAD(oid) ->   update_ref('HEAD', oid)
+```
+
+Note that we didn't change any behaviour of ugit here, this is purely
+refactoring.
--- a/how_to/Change_22.md
+++ b/how_to/Change_22.md
@@ -0,0 +1,28 @@
+- tag: Create the tag ref
+
+After we've implemented refs in the previous change, it's time to create a ref
+when the user creates a tag.
+
+`create_tag` now calls `update_ref` with the tag name to actually create the tag.
+
+For namespacing purposes, we'll put all tags under *refs/tags/*. That is, if the
+user creates *my-cool-commit* tag, we'll create *refs/tags/my-cool-commit* ref
+to point to the desired OID.
+
+Then we'll update *data.py* to handle this "namespaced" ref. Since we can't have
+a / in the file name, we'll create directories for it. Now if a ref
+*refs/tags/sometag* is created, it will be placed under *.ugit/refs/tags* in a
+file named *sometag*.
+
+To verify that this code works, you can run:
+```
+$ ugit tag test
+```
+
+And make sure that the tag points to HEAD:
+```
+$ cat .ugit/refs/tags/test
+$ cat .ugit/HEAD
+```
+
+The last two commands should give the same output.
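A rough sketch of how a namespaced ref can map to a file, assuming a temporary directory stands in for the repository (`GIT_DIR` is hypothetical). The only interesting step is creating the intermediate directories before writing.

```python
import tempfile
from pathlib import Path

# Hypothetical repository directory; a real repository would use ./.ugit.
GIT_DIR = Path(tempfile.mkdtemp()) / ".ugit"


def update_ref(ref, oid):
    ref_path = GIT_DIR / ref
    # "refs/tags/test" needs the directories refs/ and refs/tags/ to exist.
    ref_path.parent.mkdir(parents=True, exist_ok=True)
    ref_path.write_text(oid)


def get_ref(ref):
    ref_path = GIT_DIR / ref
    return ref_path.read_text().strip() if ref_path.is_file() else None
```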
--- a/how_to/Change_23.md
+++ b/how_to/Change_23.md
@@ -0,0 +1,22 @@
+- tag: Resolve name to oid in argparse
+
+It's nice that we can create tags, but now let's actually make them usable from
+the CLI.
+
+In *base.py*, we'll create `get_oid` to resolve a "name" to an OID. A name can
+either be a ref (in which case `get_oid` will return the OID that the ref points
+to) or an OID (in which case `get_oid` will just return that same OID).
+
+Next, we'll modify the argument parser in *cli.py* to call `get_oid` on all
+arguments which are expected to be an OID. This way we can pass a ref there
+instead of an OID.
+
+At this point we can do something like:
+```
+$ ugit tag mytag d8d43b0e3a21df0c845e185d08be8e4028787069
+$ ugit log refs/tags/mytag
+# Will print log of commits starting at d8d43b0e...
+$ ugit checkout refs/tags/mytag
+# Will checkout commit d8d43b0e...
+etc...
+```
--- a/how_to/Change_24.md
+++ b/how_to/Change_24.md
@@ -0,0 +1,18 @@
+- base: Try different directories when searching for a ref
+
+In the previous change, you might have noticed that we need to spell out the
+full name of a tag (like *refs/tags/mytag*). This isn't very convenient; we
+would like to use shorter names. For example, if we've created a "mytag"
+tag, we should be able to do `ugit log mytag` rather than having to specify
+`ugit log refs/tags/mytag`.
+
+We'll extend `get_oid` to search in different ref subdirectories when resolving
+a name. We'll search in:
+```
+    Root (.ugit): This way we can specify refs/tags/mytag
+    .ugit/refs: This way we can specify tags/mytag
+    .ugit/refs/tags: This way we can specify mytag
+    .ugit/refs/heads: This will be needed for a future change
+```
+If we find the requested name in any of the directories, return it. Otherwise
+assume that the name is an OID.
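The lookup order above can be sketched like this. The `REFS` dict is a hypothetical stand-in for the ref files under *.ugit*, and the body is an illustration of the search, not the patch's actual `get_oid`.

```python
# Hypothetical ref store: ref path (relative to .ugit) -> OID.
REFS = {
    "HEAD": "d8d43b0e3a21df0c845e185d08be8e4028787069",
    "refs/tags/mytag": "e549f09bbd08a8a888110b07982952e17e8c9669",
}


def get_oid(name):
    refs_to_try = [
        name,                  # refs/tags/mytag
        f"refs/{name}",        # tags/mytag
        f"refs/tags/{name}",   # mytag
        f"refs/heads/{name}",  # needed for a future change
    ]
    for ref in refs_to_try:
        if ref in REFS:
            return REFS[ref]
    # Not found anywhere: assume the name already is an OID.
    return name
```

All three spellings of the tag resolve to the same OID, and anything unknown is passed through unchanged.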
--- a/how_to/Change_25.md
+++ b/how_to/Change_25.md
@@ -0,0 +1,12 @@
+- cli: pass HEAD by default in argparse
+
+First, make "@" be an alias for HEAD. (Implemented in `get_oid`)
+
+Second, do a little refactoring in *cli.py*. Some commands accept an optional
+OID argument and if the argument isn't provided it defaults to HEAD. For example
+`ugit log` can get an OID to start logging from, but by default it logs all
+commits before HEAD.
+
+Instead of having each command implement this logic, let's just make "@" (HEAD)
+be the default value for those commands. The relevant commands at this stage
+are `log` and `tag`. More will follow.
--- a/how_to/Change_26.md
+++ b/how_to/Change_26.md
@@ -0,0 +1,14 @@
+- k: Print refs
+
+Now that we have refs and a potentially branching commit history, it's a good
+idea to create a visualization tool to see all the mess that we've created.
+
+The visualization tool will draw all refs and all the commits pointed to by the refs.
+
+Our command to run the tool will be called `ugit k`, similar to `gitk` (which is
+a graphical visualization tool for Git).
+
+We'll create a new `k` command in *cli.py*. We'll create `iter_refs` which is a
+generator which will iterate on all available refs (it will return HEAD from the
+ugit root directory and everything under *.ugit/refs*). As a first step, let's
+just print all refs when running `k`.
--- a/how_to/Change_27.md
+++ b/how_to/Change_27.md
@@ -0,0 +1,21 @@
+- k: Iterate commits and parents
+
+In addition to printing the refs, we'll also print all OIDs that are reachable
+from those refs. We'll create `iter_commits_and_parents`, which is a generator
+that returns all commits that it can reach from a given set of OIDs.
+
+Note that `iter_commits_and_parents` will return an OID once, even if it's
+reachable from multiple refs. Here, for example:
+```
+o<----o<----o<----o<----@<----@<----@
+^                  \                ^
+first commit        -<--$<----$     refs/tags/tag1
+                              ^
+                              refs/tags/tag2
+```
+
+We can reach the first commit by following the parents of *tag1* or by following
+the parents of *tag2*. Yet if we call `iter_commits_and_parents({tag1, tag2})`,
+the first commit will be yielded only once. This property will be useful later.
+
+(Note that nothing is visualized yet, we're preparing for that.)
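A small sketch of the "yield each commit once" property, assuming a hypothetical `PARENTS` dict in place of real commit objects (the short names like `c1` are made up for the example).

```python
# Hypothetical commit graph: OID -> parent OID (None for the first commit).
PARENTS = {"c1": None, "c2": "c1", "a3": "c2", "a4": "a3", "b3": "c2"}


def iter_commits_and_parents(oids):
    oids = set(oids)
    visited = set()
    while oids:
        oid = oids.pop()
        # Skip the missing parent of the first commit and anything seen before.
        if not oid or oid in visited:
            continue
        visited.add(oid)
        yield oid
        oids.add(PARENTS[oid])
```

Here `a4` and `b3` play the roles of *tag1* and *tag2*: both reach `c2` and `c1`, yet each of those is yielded only once. Because a set is used, the order of the results is not guaranteed.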
--- a/how_to/Change_28.md
+++ b/how_to/Change_28.md
@@ -0,0 +1,18 @@
+- k: Render graph
+
+`k` is supposed to be a visualization tool, but so far we've just printed a
+bunch of OIDs... Now comes the visualization part!
+
+There's a convenient file format called "dot" that can describe a graph. This is
+a textual format. We'll generate a graph of all commits and refs in dot format
+and then visualize it using the "dot" utility that comes with Graphviz.
+
+(If you're unfamiliar with dot or Graphviz please look it up online.)
+
+The graph will contain a node for each commit, that points to the parent commit.
+The graph will also contain a node for each ref, which points to the relevant
+commit.
+
+At this point, `ugit k` is fully functional and I encourage you to play with it.
+Create a crazy branching history and a bunch of tags and see for yourself that
+`ugit k` can draw all that visually.
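As an illustration of what the generated dot text might look like, here is a sketch that builds it from two plain dicts. The function name `render_dot` and its inputs are assumptions for the example; the node attributes are ordinary Graphviz attributes chosen for readability.

```python
def render_dot(refs, parents):
    """Build dot text from {refname: oid} and {oid: parent_oid or None}."""
    lines = ["digraph commits {"]
    for refname, oid in refs.items():
        # One node per ref, pointing to the commit it names.
        lines.append(f'"{refname}" [shape=note]')
        lines.append(f'"{refname}" -> "{oid}"')
    for oid, parent in parents.items():
        # One node per commit, pointing to its parent (if any).
        lines.append(f'"{oid}" [shape=box style=filled label="{oid[:10]}"]')
        if parent:
            lines.append(f'"{oid}" -> "{parent}"')
    lines.append("}")
    return "\n".join(lines)


dot = render_dot({"refs/tags/tag1": "c2"}, {"c1": None, "c2": "c1"})
```

The resulting text can then be handed to the Graphviz `dot` utility to produce an image.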
--- a/how_to/Change_29.md
+++ b/how_to/Change_29.md
@@ -0,0 +1,9 @@
+- log: Use `iter_commits_and_parents`
+
+Refactoring ahead! Since we have `iter_commits_and_parents` from `k`, let's also
+use this function in `log`. We'll need to adjust it a bit to use
+`collections.deque` instead of a set so that the order of commits is deterministic.
+
+This generalization might seem unneeded at this point, but it will be useful
+later. (Note for the advanced folks: When we implement merge commits that have
+multiple parents, this generic way to iterate will come in handy.)
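A sketch of the deque variant, again over a hypothetical `PARENTS` dict rather than real commit objects. Pushing the parent to the front of the queue means a commit's ancestors are returned right after it, so the order is deterministic.

```python
from collections import deque

# Hypothetical commit graph: OID -> parent OID (None for the first commit).
PARENTS = {"c1": None, "c2": "c1", "a3": "c2", "a4": "a3", "b3": "c2"}


def iter_commits_and_parents(oids):
    oids = deque(oids)
    visited = set()
    while oids:
        oid = oids.popleft()
        if not oid or oid in visited:
            continue
        visited.add(oid)
        yield oid
        # Return the parent next, before any other queued starting point.
        oids.appendleft(PARENTS[oid])
```

With a single starting OID this walks the history newest to oldest, which is exactly what `log` needs.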
--- a/how_to/Change_30.md
+++ b/how_to/Change_30.md
@@ -0,0 +1,82 @@
+- branch: Create new branch
+
+Tags were an improvement since they freed us from the burden of remembering OIDs
+directly. But they are still somewhat inconvenient, since they are static. Let
+me illustrate:
+```
+o-----o-----o-----o-----o-----o-----o
+                   \                ^
+                    ----o-----o  tag2,HEAD
+                              ^
+                           tag1
+```
+
+If we have the above situation, we can easily flip between *tag1* and *tag2* with
+`checkout`. But what happens if we do
+
+    - ugit checkout tag2
+    - Make some changes
+    - ugit commit?
+
+Now it looks like this:
+```
+o-----o-----o-----o-----o-----o-----o-----o
+                   \                ^     ^
+                    ----o-----o  tag2     HEAD
+                              ^
+                           tag1
+```
+
+The upper branch has advanced, but *tag2* still points to the previous commit.
+This is by design, since tags are supposed to just name a specific OID. So if we
+want to remember the new HEAD position we need to create another tag.
+
+But now let's create a ref that will "move forward" as the branch grows. Just
+like we have `ugit tag`, we'll create `ugit branch` that will point a branch to
+a specific OID. This time the ref will be created under *refs/heads*.
+
+At this stage, `branch` doesn't look any different from tag (the only difference
+is that the branch is created under *refs/heads* rather than *refs/tags*). But
+the magic will happen once we try to `checkout` a branch.
+
+So far when we checkout anything we update HEAD to point to the OID that we've
+just checked out. But if we checkout a branch by name, we'll do something
+different, we will update HEAD to point to the **name of the branch!** Assume
+that we have a branch here:
+```
+o-----o-----o-----o-----o-----o-----o
+                   \                ^
+                    ----o-----o tag2,branch2
+                              ^
+                           tag1
+```
+
+Running `ugit checkout branch2` will create the following situation:
+```
+o-----o-----o-----o-----o-----o-----o
+                   \                ^
+                    ----o-----o tag2,branch2 <--- HEAD
+                              ^
+                           tag1
+```
+
+You see? HEAD points to *branch2* rather than the OID of the commit directly.
+Now if we create another commit, ugit will update HEAD to point to the latest
+commit (just like it does every time) but as a side effect it will also update
+*branch2* to point to the latest commit.
+```
+o-----o-----o-----o-----o-----o-----o-----o
+                   \                ^     ^
+                    ----o-----o  tag2     branch2 <--- HEAD
+                              ^
+                           tag1
+```
+
+This way, if we checkout a branch and create some commits on top of it, the ref
+will always point to the latest commit.
+
+But right now HEAD (or any ref for that matter) may only point to an OID. It
+can't point to another ref, like I described above. So our next step would be
+to implement this concept. To mirror Git's terminology, we will call a ref that
+points to another ref a "symbolic ref". Please see the next change for an
+implementation of symbolic refs.
--- a/how_to/Change_31.md
+++ b/how_to/Change_31.md
@@ -0,0 +1,5 @@
+- data: Implement symbolic refs idea
+
+If the file that represents a ref contains an OID, we'll assume that the ref
+points to an OID. If the file contains the content `ref: <refname>`, we'll
+assume that the ref points to `<refname>` and we will dereference it recursively.
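The rule above fits in a few lines. This sketch assumes a hypothetical `REF_FILES` dict holding the content of each ref file; it only illustrates the recursive dereference.

```python
# Hypothetical ref files: ref name -> file content.
REF_FILES = {
    "HEAD": "ref: refs/heads/branch2",
    "refs/heads/branch2": "d8d43b0e3a21df0c845e185d08be8e4028787069",
}


def get_ref(ref):
    value = REF_FILES.get(ref)
    if value and value.startswith("ref:"):
        # Symbolic ref: follow the pointer until a plain OID is reached.
        return get_ref(value.split(":", 1)[1].strip())
    return value
```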
--- a/how_to/Change_32.md
+++ b/how_to/Change_32.md
@@ -0,0 +1,8 @@
+- data: Create RefValue container
+
+To make working with symbolic refs easier, we will create a `RefValue` container
+to represent the value of a ref. `RefValue` will have a property `symbolic` that
+will say whether it's a symbolic or a direct ref.
+
+This change is just refactoring: we will wrap every OID that is written or read
+from a ref in a `RefValue`.
--- a/how_to/Change_33.md
+++ b/how_to/Change_33.md
@@ -0,0 +1,17 @@
+- data: Dereference refs when reading and writing
+
+Now we'll dereference symbolic refs not only when reading them but also when
+writing them.
+
+We'll implement a helper function called `_get_ref_internal` which will return
+the path and the value of the last ref pointed to by a symbolic ref. In simple words:
+
+- When given a non-symbolic ref, `_get_ref_internal` will return the ref name
+and value.
+- When given a symbolic ref, `_get_ref_internal` will dereference the ref
+recursively, and then return the name of the last (non-symbolic) ref that points
+to an OID, plus its value.
+
+Now `update_ref` will use `_get_ref_internal` to know which ref it needs to update.
+
+Additionally, we'll use `_get_ref_internal` in `get_ref`.
--- a/how_to/Change_34.md
+++ b/how_to/Change_34.md
@@ -0,0 +1,15 @@
+- data: Don't always dereference refs (for `ugit k`)
+
+Actually, it's not always desirable to dereference a ref all the way. Sometimes
+we would like to know at which ref a symbolic ref points, rather than the final
+OID. Or we would like to update a ref directly, rather than updating the last
+ref in the chain.
+
+One such use case is `ugit k`. When visualizing refs it would be nice to see
+which ref points to which ref. We will see another use case soon.
+
+To accommodate this, we will add a `deref` option to `get_ref`, `iter_refs` and
+`update_ref`. If they are called with `deref=False`, they will work on the
+raw value of a ref and not dereference any symbolic refs.
+
+Then we will update `k` to use `deref=False`.
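A sketch of how the `deref` option might behave, assuming a hypothetical in-memory `REF_FILES` dict instead of files under *.ugit*. The function names follow the how-to text, but the bodies are an illustration, not the patch's `data.py`.

```python
from collections import namedtuple

RefValue = namedtuple("RefValue", ["symbolic", "value"])

# Hypothetical ref files: ref name -> file content.
REF_FILES = {
    "HEAD": "ref: refs/heads/branch2",
    "refs/heads/branch2": "d8d43b0e3a21df0c845e185d08be8e4028787069",
}


def _get_ref_internal(ref, deref):
    value = REF_FILES.get(ref)
    symbolic = bool(value) and value.startswith("ref:")
    if symbolic:
        value = value.split(":", 1)[1].strip()
        if deref:
            # Keep following until we reach a ref that holds an OID.
            return _get_ref_internal(value, deref=True)
    return ref, RefValue(symbolic=symbolic, value=value)


def get_ref(ref, deref=True):
    return _get_ref_internal(ref, deref)[1]


def update_ref(ref, value, deref=True):
    # With deref=True we write to the last ref in the chain.
    ref = _get_ref_internal(ref, deref)[0]
    REF_FILES[ref] = f"ref: {value.value}" if value.symbolic else value.value
```

With `deref=False`, `get_ref("HEAD")` reports the branch name that HEAD points to, which is the information `ugit k` wants to draw.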
--- a/ruff.toml
+++ b/ruff.toml
@@ -0,0 +1,3 @@
+[lint]
+select = ["E", "F"]
+ignore = ["F401"]
--- a/ugit/base.py
+++ b/ugit/base.py
@@ -0,0 +1,176 @@
+import itertools
+import operator
+import os
+import string
+
+from collections import deque, namedtuple
+from pathlib import Path, PurePath
+
+from . import data
+
+
+def write_tree(directory="."):
+    entries = []
+    with os.scandir(directory) as it:
+        for entry in it:
+            full = f"{directory}/{entry.name}"
+            if is_ignored(full):
+                continue
+            if entry.is_file(follow_symlinks=False):
+                type_ = "blob"
+                with open(full, "rb") as f:
+                    oid = data.hash_object(f.read())
+            elif entry.is_dir(follow_symlinks=False):
+                type_ = "tree"
+                oid = write_tree(full)
+            entries.append((entry.name, oid, type_))
+
+    tree = "".join(f"{type_} {oid} {name}\n" for name, oid, type_ in sorted(entries))
+
+    return data.hash_object(tree.encode(), "tree")
+
+
+def _iter_tree_entries(oid):
+    if not oid:
+        return
+    tree = data.get_object(oid, "tree")
+    for entry in tree.decode().splitlines():
+        type_, oid, name = entry.split(" ", 2)
+        yield type_, oid, name
+
+
+def get_tree(oid, base_path=""):
+    result = {}
+    for type_, oid, name in _iter_tree_entries(oid):
+        assert "/" not in name
+        assert name not in ("..", ".")
+        path = base_path + name
+        if type_ == "blob":
+            result[path] = oid
+        elif type_ == "tree":
+            result.update(get_tree(oid, f"{path}/"))
+        else:
+            assert False, f"Unknown tree entry {type_}"
+    return result
+
+
+def _empty_current_directory():
+    for root, dirnames, filenames in os.walk(".", topdown=False):
+        for filename in filenames:
+            path = os.path.relpath(f"{root}/{filename}")
+            if is_ignored(path) or not Path(path).is_file():
+                continue
+            Path(path).unlink()
+        for dirname in dirnames:
+            path = os.path.relpath(f"{root}/{dirname}")
+            if is_ignored(path):
+                continue
+            try:
+                Path(path).rmdir()
+            except (FileNotFoundError, OSError):
+                # Deletion might fail if the directory contains ignored files,
+                # so it's OK
+                pass
+
+
+def read_tree(tree_oid):
+    _empty_current_directory()
+    for path, oid in get_tree(tree_oid, base_path="./").items():
+        os.makedirs(os.path.dirname(path), exist_ok=True)
+        with open(path, "wb") as f:
+            f.write(data.get_object(oid))
+
+
+def commit(message):
+    commit = f"tree {write_tree()}\n"
+
+    HEAD = data.get_ref("HEAD").value
+    if HEAD:
+        commit += f"parent {HEAD}\n"
+
+    commit += "\n"
+    commit += f"{message}\n"
+
+    oid = data.hash_object(commit.encode(), "commit")
+
+    data.update_ref("HEAD", data.RefValue(symbolic=False, value=oid))
+
+    return oid
+
+
+def create_tag(name, oid):
+    data.update_ref(f"refs/tags/{name}", data.RefValue(symbolic=False, value=oid))
+
+
+def checkout(oid):
+    commit = get_commit(oid)
+    read_tree(commit.tree)
+    data.update_ref("HEAD", data.RefValue(symbolic=False, value=oid))
+
+
+def create_branch(name, oid):
+    data.update_ref(f"refs/heads/{name}", data.RefValue(symbolic=False, value=oid))
+
+
+Commit = namedtuple("Commit", ["tree", "parent", "message"])
+
+
+def get_commit(oid):
+    parent = None
+
+    commit = data.get_object(oid, "commit").decode()
+    lines = iter(commit.splitlines())
+    for line in itertools.takewhile(operator.truth, lines):
+        key, value = line.split(" ", 1)
+        if key == "tree":
+            tree = value
+        elif key == "parent":
+            parent = value
+        else:
+            assert False, f"Unknown field {key}"
+
+    message = "\n".join(lines)
+    return Commit(tree=tree, parent=parent, message=message)
+
+
+def iter_commits_and_parents(oids):
+    oids = deque(oids)
+    visited = set()
+
+    while oids:
+        oid = oids.popleft()
+        if not oid or oid in visited:
+            continue
+        visited.add(oid)
+        yield oid
+
+        commit = get_commit(oid)
+        # Return parent next
+        oids.appendleft(commit.parent)
+
+
+def get_oid(name):
+    if name == "@":
+        name = "HEAD"
+
+    # Name is ref
+    refs_to_try = [
+        f"{name}",
+        f"refs/{name}",
+        f"refs/tags/{name}",
+        f"refs/heads/{name}",
+    ]
+    for ref in refs_to_try:
+        if data.get_ref(ref, deref=False).value:
+            return data.get_ref(ref).value
+
+    # Name is SHA1
+    is_hex = all(c in string.hexdigits for c in name)
+    if len(name) == 40 and is_hex:
+        return name
+
+    assert False, f"Unknown name {name}"
+
+
+def is_ignored(path):
+    return ".ugit" in path.split("/")
--- a/ugit/cli.py
+++ b/ugit/cli.py
@@ -1,8 +1,11 @@
+import argparse
+import subprocess
+import sys
+import textwrap
+
 from pathlib import Path

-import argparse
-import sys
-
+from . import base
 from . import data


@@ -17,17 +20,51 @@ def parse_args():
    commands = parser.add_subparsers(dest="command")
    commands.required = True

+    oid = base.get_oid
+
    init_parser = commands.add_parser("init")
    init_parser.set_defaults(func=init)

-    cat_file_parser = commands.add_parser("cat-file")
-    cat_file_parser.set_defaults(func=cat_file)
-    cat_file_parser.add_argument("object")
-
    hash_object_parser = commands.add_parser("hash-object")
    hash_object_parser.set_defaults(func=hash_object)
    hash_object_parser.add_argument("file")

+    cat_file_parser = commands.add_parser("cat-file")
+    cat_file_parser.set_defaults(func=cat_file)
+    cat_file_parser.add_argument("object", type=oid)
+
+    write_tree_parser = commands.add_parser("write-tree")
+    write_tree_parser.set_defaults(func=write_tree)
+
+    read_tree_parser = commands.add_parser("read-tree")
+    read_tree_parser.set_defaults(func=read_tree)
+    read_tree_parser.add_argument("tree", type=oid)
+
+    commit_parser = commands.add_parser("commit")
+    commit_parser.set_defaults(func=commit)
+    commit_parser.add_argument("-m", "--message", required=True)
+
+    log_parser = commands.add_parser("log")
+    log_parser.set_defaults(func=log)
+    log_parser.add_argument("oid", default="@", type=oid, nargs="?")
+
+    checkout_parser = commands.add_parser("checkout")
+    checkout_parser.set_defaults(func=checkout)
+    checkout_parser.add_argument("oid", type=oid)
+
+    tag_parser = commands.add_parser("tag")
+    tag_parser.set_defaults(func=tag)
+    tag_parser.add_argument("name")
+    tag_parser.add_argument("oid", default="@", type=oid, nargs="?")
+
+    branch_parser = commands.add_parser("branch")
+    branch_parser.set_defaults(func=branch)
+    branch_parser.add_argument("name")
+    branch_parser.add_argument("start_point", default="@", type=oid, nargs="?")
+
+    k_parser = commands.add_parser("k")
+    k_parser.set_defaults(func=k)
+
    return parser.parse_args()


@@ -43,4 +80,63 @@ def hash_object(args):

 def cat_file(args):
    sys.stdout.flush()
-    sys.stdout.buffer.write(data.get_object(args.object))
+    sys.stdout.buffer.write(data.get_object(args.object, expected=None))
+
+
+def write_tree(args):
+    print(base.write_tree())
+
+
+def read_tree(args):
+    base.read_tree(args.tree)
+
+
+def commit(args):
+    print(base.commit(args.message))
+
+
+def log(args):
+    for oid in base.iter_commits_and_parents({args.oid}):
+        commit = base.get_commit(oid)
+
+        print(f"commit {oid}\n")
+        print(textwrap.indent(commit.message, "    "))
+        print("")
+
+
+def checkout(args):
+    base.checkout(args.oid)
+
+
+def tag(args):
+    base.create_tag(args.name, args.oid)
+
+
+def branch(args):
+    base.create_branch(args.name, args.start_point)
+    print(f"Branch {args.name} created at {args.start_point[:10]}")
+
+
+def k(args):
+    dot = "digraph commits {\n"
+
+    oids = set()
+    for refname, ref in data.iter_refs(deref=False):
+        dot += f'"{refname}" [shape=note]\n'
+        dot += f'"{refname}" -> "{ref.value}"\n'
+        if not ref.symbolic:
+            oids.add(ref.value)
+
+    for oid in base.iter_commits_and_parents(oids):
+        commit = base.get_commit(oid)
+        dot += f'"{oid}" [shape=box style=filled label="{oid[:10]}"]\n'
+        if commit.parent:
+            dot += f'"{oid}" -> "{commit.parent}"\n'
+
+    dot += "}"
+    print(dot)
+
+    with subprocess.Popen(
+        ["dot", "-Tgtk", "/dev/stdin"], stdin=subprocess.PIPE
+    ) as proc:
+        proc.communicate(dot.encode())
--- a/ugit/data.py
+++ b/ugit/data.py
@@ -1,6 +1,9 @@
-from pathlib import Path
+from pathlib import Path, PurePath

 import hashlib
+import os
+
+from collections import namedtuple

 GIT_DIR = ".ugit"

@@ -10,13 +13,62 @@ def init():
    Path.mkdir(f"{GIT_DIR}/objects")


-def hash_object(data):
-    oid = hashlib.sha1(data).hexdigest()
+RefValue = namedtuple("RefValue", ["symbolic", "value"])
+
+
+def update_ref(ref, value, deref=True):
+    assert not value.symbolic
+    ref = _get_ref_internal(ref, deref)[0]
+    ref_path = f"{GIT_DIR}/{ref}"
+    os.makedirs(os.path.dirname(ref_path), exist_ok=True)
+    with open(ref_path, "w") as f:
+        f.write(value.value)
+
+
+def get_ref(ref, deref=True):
+    return _get_ref_internal(ref, deref)[1]
+
+
+def _get_ref_internal(ref, deref):
+    ref_path = f"{GIT_DIR}/{ref}"
+    value = None
+    if os.path.isfile(ref_path):
+        with open(ref_path) as f:
+            value = f.read().strip()
+
+    symbolic = bool(value) and value.startswith("ref:")
+    if symbolic:
+        value = value.split(":", 1)[1].strip()
+        if deref:
+            return _get_ref_internal(value, deref=True)
+    return ref, RefValue(symbolic=symbolic, value=value)
+
+
+def iter_refs(deref=True):
+    refs = ["HEAD"]
+    for root, _, filenames in os.walk(f"{GIT_DIR}/refs"):
+        root = os.path.relpath(root, GIT_DIR)
+        refs.extend(f"{root}/{name}" for name in filenames)
+
+    for refname in refs:
+        yield refname, get_ref(refname, deref=deref)
+
+
+def hash_object(data, type_="blob"):
+    obj = type_.encode() + b"\x00" + data
+    oid = hashlib.sha1(obj).hexdigest()
    with open(f"{GIT_DIR}/objects/{oid}", "wb") as out:
-        out.write(data)
+        out.write(obj)
    return oid


-def get_object(oid):
+def get_object(oid, expected="blob"):
    with open(f"{GIT_DIR}/objects/{oid}", "rb") as f:
-        return f.read()
+        obj = f.read()
+
+    type_, _, content = obj.partition(b"\x00")
+    type_ = type_.decode()
+
+    if expected is not None:
+        assert type_ == expected, f"Expected {expected}, got {type_}"
+    return content
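
The `deref` flag used by `get_ref`, `update_ref` and `iter_refs` in `ugit/data.py` can be tried in isolation. The following is a minimal sketch, not the repository's code: `read_ref` and `demo` are hypothetical names, the ref files live in a temporary directory instead of `.ugit`, and the OID is a placeholder string rather than a real object.

```python
import os
import tempfile
from collections import namedtuple

RefValue = namedtuple("RefValue", ["symbolic", "value"])


def read_ref(git_dir, ref, deref=True):
    """Return (ref_name, RefValue) for `ref` stored as a file under `git_dir`."""
    ref_path = os.path.join(git_dir, ref)
    value = None
    if os.path.isfile(ref_path):
        with open(ref_path) as f:
            value = f.read().strip()

    # A symbolic ref stores "ref: <other ref>" instead of an OID.
    symbolic = bool(value) and value.startswith("ref:")
    if symbolic:
        value = value.split(":", 1)[1].strip()
        if deref:
            # Follow the chain until a ref that holds a plain value.
            return read_ref(git_dir, value, deref=True)

    return ref, RefValue(symbolic=symbolic, value=value)


def demo():
    """Build HEAD -> refs/heads/master -> OID and read HEAD both ways."""
    with tempfile.TemporaryDirectory() as git_dir:
        os.makedirs(os.path.join(git_dir, "refs", "heads"))
        oid = "a" * 40  # placeholder OID, assumed for illustration
        with open(os.path.join(git_dir, "refs", "heads", "master"), "w") as f:
            f.write(oid)
        with open(os.path.join(git_dir, "HEAD"), "w") as f:
            f.write("ref: refs/heads/master")

        raw = read_ref(git_dir, "HEAD", deref=False)
        resolved = read_ref(git_dir, "HEAD", deref=True)
    return raw, resolved, oid
```

With `deref=False` the call reports `HEAD` itself as a symbolic ref whose value is the name `refs/heads/master`; with `deref=True` it reports the final ref and the OID stored there. This is the distinction `k` relies on when it draws an edge from `HEAD` to a branch name rather than straight to a commit.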
Author	SHA1	Message	Date
daviddoji	c484b41a89	Don't always derefence ref	2024-07-06 19:50:52 +02:00
daviddoji	fe5ed910a3	Add change 34 to instructions	2024-07-06 19:49:59 +02:00
daviddoji	9c53919802	Dereference refs when reading and writing	2024-06-29 18:48:21 +02:00
daviddoji	30ce8c84e4	Add change 33 to instructions	2024-06-29 18:47:23 +02:00
daviddoji	6841a97d18	Create RefValue container	2024-06-14 16:29:58 +02:00
daviddoji	556c16c081	Add change 32 to instructions	2024-06-14 16:29:25 +02:00
daviddoji	7a0f86e49b	Implement symbolic refs idea	2024-06-09 21:20:53 +02:00
daviddoji	3770c81942	Implement symbolic refs idea	2024-06-09 21:20:38 +02:00
daviddoji	9f8fde3c60	Create new branch	2024-06-05 20:18:40 +02:00
daviddoji	772f631768	Add change 30 to instructions	2024-06-05 20:18:06 +02:00
daviddoji	7fe3e0f497	Use iter_commits_and_parents	2024-05-24 14:45:06 +02:00
daviddoji	7896b80c42	Add change 29 to instructions	2024-05-24 14:44:43 +02:00
daviddoji	b854b4fa18	Render graph	2024-05-22 16:52:44 +02:00
daviddoji	2362d69673	Add change 28 to instructions	2024-05-22 16:52:14 +02:00
daviddoji	d53322c256	Iterate commits and parents	2024-05-16 12:01:30 +02:00
daviddoji	7fbf6640f6	Iterate commits and parents	2024-05-16 11:54:52 +02:00
daviddoji	dad9077515	Add change 27 to instructions	2024-05-16 11:54:28 +02:00
daviddoji	db7d608010	Print refs for k (visualization tool)	2024-05-15 11:01:19 +02:00
daviddoji	c9d8b443ed	Add change 26 to instructions	2024-05-15 10:59:54 +02:00
david	41333f06bc	pass HEAD by default to argparse	2024-05-05 21:04:28 +02:00
david	fe292c02c9	Add change 25 instructions	2024-05-05 21:04:02 +02:00
david	de595261e6	Try different dirextories when searching for a ref	2024-04-23 17:36:41 +02:00
david	81bf86d41b	Add change 24 instructions	2024-04-23 17:36:02 +02:00
david	671fa4b6b1	Resolve name to oid in argparse	2024-04-20 21:38:04 +02:00
david	63dcbeb9e7	Add change 23 instructions	2024-04-20 21:37:31 +02:00
david	edae32dc86	Create the tag ref	2024-04-17 19:33:54 +02:00
david	e85766f671	Add change 22 instructions	2024-04-17 19:33:24 +02:00
david	1f947e6343	Generalize HEAD to refs	2024-04-12 17:19:14 +02:00
david	cb8e744794	Add change 21 instructions	2024-04-12 17:18:48 +02:00
daviddoji	6797bcfabe	Implement CLI command for tagging	2024-04-03 19:47:53 +02:00
daviddoji	95355befb4	Implement CLI command for tagging	2024-04-03 19:47:35 +02:00
daviddoji	b802e1eb9d	Read tree and move HEAD in checkout	2024-03-29 18:27:10 +01:00
daviddoji	817f38f49c	Add change 19 to instructions	2024-03-29 18:26:20 +01:00
daviddoji	d00a7817ab	Add oid parameter to log	2024-03-29 18:10:35 +01:00
daviddoji	8ac5264366	Add change 18 to instructions	2024-03-29 18:10:16 +01:00
daviddoji	cd91f18da6	Implement log	2024-03-20 19:43:58 +01:00
daviddoji	78044a877a	Add change 17 to instructions	2024-03-20 19:43:29 +01:00
daviddoji	b0d8cab498	Set parent to HEAD	2024-03-18 19:01:53 +01:00
daviddoji	450391089f	Add change 16 to instructions	2024-03-18 19:01:16 +01:00
daviddoji	1847cfbb17	Record hash of last commit	2024-03-13 19:38:35 +01:00
daviddoji	6a91c03f40	Add change 15 to instructions	2024-03-13 19:37:46 +01:00
daviddoji	4e13a27f79	Create commit	2024-03-11 19:29:15 +01:00
daviddoji	2c940abd1d	Add change 14 to instructions	2024-03-11 19:28:36 +01:00
daviddoji	c72370f930	Write-tree delete all before read	2024-03-07 19:44:28 +01:00
daviddoji	40a19615aa	Add change 13 instructions	2024-03-07 19:43:13 +01:00
daviddoji	db8c1379c2	Read-tree extract tree from object	2024-03-02 16:18:48 +01:00
daviddoji	6f5fe864a9	Add change 12 instructions	2024-03-02 16:18:19 +01:00
daviddoji	2f8545d48e	Write-tree write tree objects	2024-03-02 16:02:42 +01:00
daviddoji	5faf498917	Add change 11 instructions	2024-03-02 16:01:44 +01:00
daviddoji	4540b98a88	Write-tree hash files	2024-02-28 19:51:24 +01:00
daviddoji	af1928a360	Add change 10 instructions	2024-02-28 19:50:41 +01:00
daviddoji	46a20c8b60	Write-tree ignore .ugit files	2024-02-28 19:45:34 +01:00
daviddoji	fdfcfdbdad	Add change 09 instructions	2024-02-28 19:44:10 +01:00
daviddoji	73eb89d397	Write-tree for listing files	2024-02-26 18:59:58 +01:00
daviddoji	d666efcbd3	Add change 08 instructions	2024-02-26 18:58:52 +01:00
daviddoji	30ee2098ab	Add base module	2024-02-21 20:46:59 +01:00
daviddoji	103837cb73	Add change 07 instructions	2024-02-21 20:36:07 +01:00
daviddoji	2556bde16f	Add types to objects	2024-02-15 20:20:00 +01:00
daviddoji	36f6f88990	Add change 06 instructions	2024-02-15 20:18:45 +01:00