Compare commits

59 Commits

Author SHA1 Message Date
c484b41a89 Don't always derefence ref 2024-07-06 19:50:52 +02:00
fe5ed910a3 Add change 34 to instructions 2024-07-06 19:49:59 +02:00
9c53919802 Dereference refs when reading and writing 2024-06-29 18:48:21 +02:00
30ce8c84e4 Add change 33 to instructions 2024-06-29 18:47:23 +02:00
6841a97d18 Create RefValue container 2024-06-14 16:29:58 +02:00
556c16c081 Add change 32 to instructions 2024-06-14 16:29:25 +02:00
7a0f86e49b Implement symbolic refs idea 2024-06-09 21:20:53 +02:00
3770c81942 Implement symbolic refs idea 2024-06-09 21:20:38 +02:00
9f8fde3c60 Create new branch 2024-06-05 20:18:40 +02:00
772f631768 Add change 30 to instructions 2024-06-05 20:18:06 +02:00
7fe3e0f497 Use iter_commits_and_parents 2024-05-24 14:45:06 +02:00
7896b80c42 Add change 29 to instructions 2024-05-24 14:44:43 +02:00
b854b4fa18 Render graph 2024-05-22 16:52:44 +02:00
2362d69673 Add change 28 to instructions 2024-05-22 16:52:14 +02:00
d53322c256 Iterate commits and parents 2024-05-16 12:01:30 +02:00
7fbf6640f6 Iterate commits and parents 2024-05-16 11:54:52 +02:00
dad9077515 Add change 27 to instructions 2024-05-16 11:54:28 +02:00
db7d608010 Print refs for k (visualization tool) 2024-05-15 11:01:19 +02:00
c9d8b443ed Add change 26 to instructions 2024-05-15 10:59:54 +02:00
41333f06bc pass HEAD by default to argparse 2024-05-05 21:04:28 +02:00
fe292c02c9 Add change 25 instructions 2024-05-05 21:04:02 +02:00
de595261e6 Try different dirextories when searching for a ref 2024-04-23 17:36:41 +02:00
81bf86d41b Add change 24 instructions 2024-04-23 17:36:02 +02:00
671fa4b6b1 Resolve name to oid in argparse 2024-04-20 21:38:04 +02:00
63dcbeb9e7 Add change 23 instructions 2024-04-20 21:37:31 +02:00
edae32dc86 Create the tag ref 2024-04-17 19:33:54 +02:00
e85766f671 Add change 22 instructions 2024-04-17 19:33:24 +02:00
1f947e6343 Generalize HEAD to refs 2024-04-12 17:19:14 +02:00
cb8e744794 Add change 21 instructions 2024-04-12 17:18:48 +02:00
6797bcfabe Implement CLI command for tagging 2024-04-03 19:47:53 +02:00
95355befb4 Implement CLI command for tagging 2024-04-03 19:47:35 +02:00
b802e1eb9d Read tree and move HEAD in checkout 2024-03-29 18:27:10 +01:00
817f38f49c Add change 19 to instructions 2024-03-29 18:26:20 +01:00
d00a7817ab Add oid parameter to log 2024-03-29 18:10:35 +01:00
8ac5264366 Add change 18 to instructions 2024-03-29 18:10:16 +01:00
cd91f18da6 Implement log 2024-03-20 19:43:58 +01:00
78044a877a Add change 17 to instructions 2024-03-20 19:43:29 +01:00
b0d8cab498 Set parent to HEAD 2024-03-18 19:01:53 +01:00
450391089f Add change 16 to instructions 2024-03-18 19:01:16 +01:00
1847cfbb17 Record hash of last commit 2024-03-13 19:38:35 +01:00
6a91c03f40 Add change 15 to instructions 2024-03-13 19:37:46 +01:00
4e13a27f79 Create commit 2024-03-11 19:29:15 +01:00
2c940abd1d Add change 14 to instructions 2024-03-11 19:28:36 +01:00
c72370f930 Write-tree delete all before read 2024-03-07 19:44:28 +01:00
40a19615aa Add change 13 instructions 2024-03-07 19:43:13 +01:00
db8c1379c2 Read-tree extract tree from object 2024-03-02 16:18:48 +01:00
6f5fe864a9 Add change 12 instructions 2024-03-02 16:18:19 +01:00
2f8545d48e Write-tree write tree objects 2024-03-02 16:02:42 +01:00
5faf498917 Add change 11 instructions 2024-03-02 16:01:44 +01:00
4540b98a88 Write-tree hash files 2024-02-28 19:51:24 +01:00
af1928a360 Add change 10 instructions 2024-02-28 19:50:41 +01:00
46a20c8b60 Write-tree ignore .ugit files 2024-02-28 19:45:34 +01:00
fdfcfdbdad Add change 09 instructions 2024-02-28 19:44:10 +01:00
73eb89d397 Write-tree for listing files 2024-02-26 18:59:58 +01:00
d666efcbd3 Add change 08 instructions 2024-02-26 18:58:52 +01:00
30ee2098ab Add base module 2024-02-21 20:46:59 +01:00
103837cb73 Add change 07 instructions 2024-02-21 20:36:07 +01:00
2556bde16f Add types to objects 2024-02-15 20:20:00 +01:00
36f6f88990 Add change 06 instructions 2024-02-15 20:18:45 +01:00
33 changed files with 1005 additions and 14 deletions

17
how_to/Change_06.md Normal file

@@ -0,0 +1,17 @@
- data: Add types to objects
As we will soon see, there will be different logical types of objects that are
used in different contexts (even though, from the Object Database's point of
view, they are just all bytes). In order to lower the chance of using an object
in the wrong context we're going to add a type tag for each object.
The type is just a string that's going to be prepended to the start of the file,
followed by a null byte. When reading the file later we'll extract the type and
verify that it's indeed the expected type.
The default type is going to be `blob`, since by default an object is a
collection of bytes with no further semantic meaning.
We can also pass `expected=None` to `get_object()` if we don't want to verify
the type. This is useful for the `cat-file` CLI command which is a debug command
used for printing all objects.
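As a minimal sketch of the type tag described above (not the actual `data.py` code): the helper names `encode_object` and `decode_object` are made up for illustration, and SHA-1 is an assumption based on the 40-character OIDs shown elsewhere.

```python
import hashlib
from typing import Optional, Tuple


def encode_object(data: bytes, type_: str = "blob") -> Tuple[str, bytes]:
    """Prepend the type tag and a null byte, then hash the result (SHA-1 assumed)."""
    obj = type_.encode() + b"\x00" + data
    oid = hashlib.sha1(obj).hexdigest()
    return oid, obj


def decode_object(obj: bytes, expected: Optional[str] = "blob") -> bytes:
    """Split off the type tag and, unless expected is None, verify it."""
    type_, _, content = obj.partition(b"\x00")
    if expected is not None and type_.decode() != expected:
        raise ValueError(f"Expected {expected}, got {type_.decode()}")
    return content
```

Passing `expected=None` skips the check, which is what a debug command like `cat-file` would want.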

0
how_to/Change_07.md Normal file

26
how_to/Change_08.md Normal file

@@ -0,0 +1,26 @@
- write-tree: List files
The next command is `write-tree`. This command will take the current working
directory and store it to the object database. If `hash-object` was for storing
an individual file, then `write-tree` is for storing a whole directory.
Like `hash-object`, `write-tree` is going to give us an OID after it's done and
we'll be able to use the OID in order to retrieve the directory at a later time.
In Git's lingo a "tree" means a directory.
We'll get into the details in later changes; in this change we'll only prepare
the code around the feature:
+ Create a `write-tree` CLI command
+ Create a `write_tree()` function in the base module. Why in the base module and
not in the data module? Because `write_tree()` is not going to write to disk
directly but use the object database provided by data to store the directory.
Hence it belongs to the higher-level base module.
+ Add code to `write_tree()` to print a directory recursively. For now nothing
is written anywhere, but we just coded the boilerplate to recursively scan a
directory.
We continue in the next change.

8
how_to/Change_09.md Normal file

@@ -0,0 +1,8 @@
- write-tree: Ignore .ugit files
If we run `ugit write-tree`, we will see that it also prints the content of the
.ugit directory. This directory isn't part of the user's files, so let's ignore
it.
Actually, I created a separate `is_ignored()` function. This way if we have any
other files we want to ignore later we have one place to change.
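A possible shape for such a helper, as a sketch only (the real `is_ignored()` is not shown in this part of the diff, so the body below is an assumption):

```python
from pathlib import PurePath


def is_ignored(path: str) -> bool:
    """Return True for anything that lives inside a .ugit directory."""
    # Compare whole path components so that "my.ugit.txt" is not ignored.
    return ".ugit" in PurePath(path).parts
```

Checking path components rather than doing a substring search avoids accidentally ignoring user files whose names merely contain ".ugit".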

12
how_to/Change_10.md Normal file

@@ -0,0 +1,12 @@
- write-tree: Hash the files
Instead of only printing the file name, let's put all files in the object
database. For now we'll print their OID and their name.
Notice that instead of getting one OID to represent a directory we now get a
separate OID for each file, which isn't very useful. Plus, note that the names
of the files aren't stored in the object database, they are just printed and
then the information is discarded.
So at this stage `write-tree` isn't useful (it just saves a bunch of files as
blobs) but the next change will fix it.

62
how_to/Change_11.md Normal file

@@ -0,0 +1,62 @@
- write-tree: Write tree objects
Now comes the fun part, where we turn a collection of separate files into a
single object that represents a directory.
The idea is that we will create one additional object that collects all the data
necessary to store a complete directory. For example, if we have a directory
with two files:
```
$ ls
cats.txt dogs.txt
```
And we want to save the directory, we will first put the individual files into
the object database:
```
$ ugit hash-object cats.txt
91a7b14a584645c7b995100223e65f8a5a33b707
$ ugit hash-object dogs.txt
fa958e0dd2203e9ad56853a3f51e5945dad317a4
```
Then we will create a "tree" object that has the content of:
```
91a7b14a584645c7b995100223e65f8a5a33b707 cats.txt
fa958e0dd2203e9ad56853a3f51e5945dad317a4 dogs.txt
```
And we will put this tree object into the object database as well. Then the OID
of the tree object will actually represent the entire directory! Why? Because we
can first retrieve the tree object by its OID, then see all the files it
contains (their names and OIDs) and then read all the OIDs of the files to get
their actual content.
What if our directory contains other directories? We'll just create tree objects
for them as well and we'll allow one tree object to point to another:
```
$ ls
cats.txt dogs.txt other/
$ ls other/
shoes.jpg
```
The root tree object will look like this:
```
blob 91a7b14a584645c7b995100223e65f8a5a33b707 cats.txt
blob fa958e0dd2203e9ad56853a3f51e5945dad317a4 dogs.txt
tree 53891a3c27b17e0f8fd96c058f968d19e340428d other
```
Note that we added a type to each entry so that we know if it's a file or a
directory. The tree that represents the "other" directory (OID 53891a3c27b17e0f8fd96c058f968d19e340428d) looks like:
```
blob 0aa186b09fd81e8cf449ba10eee6aff9711cc1ac shoes.jpg
```
We can think about this structure as a tree you know from Computer Science, where
each entry's OID is a pointer to either another tree or to a file (leaf node).
Note that we actually save the tree objects with type "tree" in
`data.hash_object()` since we don't want the trees to be confused with regular
files.
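To make the entry format concrete, here is a small sketch that builds and parses the text of a tree object. The helper names `format_tree` and `parse_tree` are hypothetical; only the `type oid name` line layout comes from the description above.

```python
def format_tree(entries):
    """Turn (name, oid, type_) tuples into the text of a tree object."""
    return "".join(
        f"{type_} {oid} {name}\n" for name, oid, type_ in sorted(entries)
    )


def parse_tree(text):
    """Yield (type_, oid, name) for every line of a tree object."""
    for line in text.splitlines():
        # Split at most twice so that names containing spaces stay intact.
        type_, oid, name = line.split(" ", 2)
        yield type_, oid, name
```

Sorting the entries means the same directory content always produces the same text, and therefore the same OID.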

28
how_to/Change_12.md Normal file

@@ -0,0 +1,28 @@
- read-tree: Extract tree from object
This command will take an OID of a tree and extract it to the working directory.
Kind of the opposite of `write-tree`.
I divided the implementation into a few layers:
`_iter_tree_entries` is a generator that will take an OID of a tree, tokenize it
line-by-line and yield the raw string values.
`get_tree` uses `_iter_tree_entries` to recursively parse a tree into a
dictionary.
`read_tree` uses `get_tree` to get the file OIDs and writes them into the
working directory.
Now we can actually save versions of the working directory! It's nothing like
proper version control, but we can see that a super basic flow is possible:
+ Imagine you work on some code and you want to save a version.
+ You run `ugit write-tree`.
+ You remember that OID that was printed out (write it on a post-it note or
something :)).
+ Continue working and repeat steps 2 and 3 as necessary.
+ If you want to return to a previous version, use `ugit read-tree` to restore
it to the working directory.
Is it convenient to use? No. But it's just the beginning!

7
how_to/Change_13.md Normal file

@@ -0,0 +1,7 @@
- read-tree: Delete all existing stuff before reading
This is done so that we won't have any old files left around after a read-tree.
Before this change, if we save tree A which contains only `a.txt`, then we save
tree B which contains `a.txt` and `b.txt` and then we `read-tree` A, we will
have `b.txt` left over in the working directory.

31
how_to/Change_14.md Normal file

@@ -0,0 +1,31 @@
- commit: Create commit
So far we were able to save versions of a directory (with `write-tree`), but
without any additional context. In reality, when we save a snapshot we would
like to attach data such as:
+ Message describing it
+ When the snapshot was created
+ Who created the snapshot
+ ...
We will create a new type of object called a "commit" that will store all this
information. A commit will just be a text file stored in the object database
with the type of `'commit'`.
The first lines in the commit will be key-values, then an empty line will mark
the end of the key-values and then the commit message will follow. Like this:
```
tree 5e550586c91fce59e0006799e0d46b3948f05693
author Nikita Leshenko
time 2019-09-14T09:31:09+00:00

This is the commit message!
```
For now we'll just write the "tree" key and the commit message to the commit
object.
We will create a new `ugit commit` command that will accept a commit message,
snapshot the current directory using `ugit write-tree` and save the resulting
object.

10
how_to/Change_15.md Normal file

@@ -0,0 +1,10 @@
- commit: Record hash of last commit to HEAD
I would like to link new commits to older commits. Right now, if we make changes
in the working directory and make periodic commits, each commit will be a
standalone object, separate from all other commits. The motivation for linking
them together is so that we can look at the commits as a series of snapshots in
some order.
Before we can do it, let's record the OID of the last commit that we created.
We'll call the last commit the "HEAD" and just put the OID in .ugit/HEAD file.

23
how_to/Change_16.md Normal file

@@ -0,0 +1,23 @@
- commit: set parent to HEAD
When creating a new commit, we will use the HEAD to link the new commit to the
previous commit. We'll call the previous commit the "parent commit" and we will
save its OID in the "parent" key on the commit object.
For example, if HEAD is currently bd0de093f1a0f90f54913d694a11cccf450bd990 and we
create a new commit, the new commit will look like this in the object store:
```
tree 50bed982245cd21e2798f179e0b032904398485b
parent bd0de093f1a0f90f54913d694a11cccf450bd990

This is the commit message!
```
The first commit in the repository will obviously have no parent.
Now we can retrieve the entire list of commits just by referencing the last
commit! We can start from the HEAD, read the "parent" key on the HEAD commit and
discover the commit before HEAD. Then read the parent of that commit, and go
back on and on... This is basically a linked list implemented over the object
database.

12
how_to/Change_17.md Normal file

@@ -0,0 +1,12 @@
- log: Implement
`log` will walk the list of commits and print them.
We will start by implementing `get_commit()` that will parse a commit object by
OID.
Then in the CLI module we will start from the HEAD commit and walk its parents
until we reach a commit without a parent.
The result is that the entire commit history is printed to the screen once we
run `ugit log`.
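The parsing and walking steps could look roughly like this. It is a sketch, not the real `get_commit()`: the names `parse_commit` and `walk` are invented, and a plain dict stands in for the object database.

```python
from collections import namedtuple

Commit = namedtuple("Commit", ["tree", "parent", "message"])


def parse_commit(text):
    """Parse the key-value header lines, then treat the rest as the message."""
    lines = iter(text.splitlines())
    tree = parent = None
    for line in lines:
        if not line:
            break  # the empty line ends the key-values
        key, value = line.split(" ", 1)
        if key == "tree":
            tree = value
        elif key == "parent":
            parent = value
        else:
            raise ValueError(f"Unknown field {key}")
    message = "\n".join(lines)  # whatever is left after the empty line
    return Commit(tree=tree, parent=parent, message=message)


def walk(oid, commits):
    """Yield (oid, Commit) from oid back to the first commit.

    commits is a dict mapping OID -> commit text (a stand-in for the
    object database).
    """
    while oid:
        commit = parse_commit(commits[oid])
        yield oid, commit
        oid = commit.parent
```

The loop stops on its own at the first commit, because that commit has no "parent" key.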

5
how_to/Change_18.md Normal file

@@ -0,0 +1,5 @@
- log: Add oid parameter
Just a small cosmetic change: Instead of always printing the list of commits
from HEAD, add an optional parameter to specify an alternative commit OID to
start from. By default it will still be HEAD.

90
how_to/Change_19.md Normal file

@@ -0,0 +1,90 @@
- checkout: Read tree and move HEAD
When given a commit OID, `ugit checkout` will "checkout" that commit, meaning
that it will populate the working directory with the content of the commit and
move HEAD to point to it.
This is a small but important change and it greatly expands the power of ugit in
two ways.
First, it allows us to travel conveniently in history. If we've made a handful
of commits and we would like to revisit a previous commit, we can now "checkout"
that commit to the working directory, play with it (compile, run tests, read
code, whatever we want) and then check out the latest commit again to resume
working where we left off.
You might be wondering why `checkout` is needed when we could just use
`read-tree`, and the answer is that moving HEAD in addition to reading the tree
allows us to record which commit is checked out right now. If we would only use
`read-tree` and later forget which commit we are looking at, we will see a bunch
of files in the working directory and have no idea where they came from. On the
other hand, if we use `checkout`, the commit will be recorded in HEAD and we can
always know what we're looking at (by running `ugit log` for example and seeing the first entry).
The second way by which `checkout` expands the power of ugit is by allowing
multiple branches of history. Let me explain: So far we have set HEAD to point
to the latest commit that was created. It means that all our commits were
linear, each new commit was added on top of the previous. The `checkout`
command now allows us to move HEAD to any commit we wish. Then, new commits will
be created on top of the current HEAD commit, which isn't necessarily the last
created commit.
For example, imagine that we're working on some code. So far, we have created a
few commits, represented by a graph:
```
o-----o-----o-----o
^                 ^
first commit      HEAD
```
Then we wanted to code a new feature. We created a few commits while working on
the feature (new commits represented by @):
```
o-----o-----o-----o-----@-----@-----@
^                                   ^
first commit                        HEAD
```
Now we have an alternative idea for implementing that feature. We would like to
go back in time and try a different implementation, without throwing away the
current implementation. We can remember the current HEAD and run `ugit checkout`
to go back in time, by providing the OID of the commit before the new feature
was implemented (that OID can be discovered with `ugit log`).
```
o-----o-----o-----o-----@-----@-----@
^                 ^
first commit      HEAD
```
The working directory will effectively go back in time. We can start working on
an alternative implementation and create new commits. The new commits will be on
top of HEAD and look like this (represented by $):
```
o-----o-----o-----o-----@-----@-----@
^                  \
first commit        ----$-----$
                              ^
                              HEAD
```
See how the history now contains two "branches". We can actually switch back and
forth between them and work on them in parallel. Finally, we can checkout the
preferred implementation and work from it on future code. Assuming that we liked
the second branch, we'll just keep working from it, and future commits will look
like this:
```
o-----o-----o-----o-----@-----@-----@
^                  \
first commit        ----$-----$-----o-----o-----o-----o-----o
                                                            ^
                                                            HEAD
```
Pretty useful, right? We've just introduced a simple form of branching history.
Note that something pretty cool happened here: The implementation of checkout is
very simple (we just call `read_tree` and update HEAD) but the implications of
checkout are quite big - we can suddenly have a branching workflow which might
look complicated but it is actually a direct consequence of what we implemented
in previous changes. This is why I believe learning Git internals from the
bottom up is useful - we can see how simple concepts compose into complicated
functionality.

41
how_to/Change_20.md Normal file

@@ -0,0 +1,41 @@
- tag: Implement CLI command
Now that we have branching history we have some OIDs we need to keep track of.
Assume we have two branches (continuing from the example we had for `checkout`):
```
o-----o-----o-----o-----@-----@-----@
^                  \                ^
first commit        ----$-----$     6c9f80a187ba39b4...
                              ^
                              d8d43b0e3a21df0c...
```
If we want to switch back and forth between the two "branches" with `checkout`,
we need to remember both OIDs, which are quite long.
To make our lives easier, let's implement a command to attach a name to an OID.
Then we'll be able to refer to the OID by that name.
The end result will look like this:
```
$ # Make some changes
...
$ ugit commit
d8d43b0e3a21df0c845e185d08be8e4028787069
$ ugit tag my-cool-commit d8d43b0e3a21df0c845e185d08be8e4028787069
$ # Make more changes
...
$ ugit commit
e549f09bbd08a8a888110b07982952e17e8c9669
$ ugit checkout my-cool-commit
or
$ ugit checkout d8d43b0e3a21df0c845e185d08be8e4028787069
```
The last two commands are equivalent, because "my-cool-commit" is a tag that
points to d8d43b0e3a21df0c845e185d08be8e4028787069.
We will implement this in a few steps. The first step is to create a CLI
command that calls the relevant function in the base module. The base module does
nothing at this stage.

23
how_to/Change_21.md Normal file

@@ -0,0 +1,23 @@
- tag: Generalize HEAD to refs
As part of implementing `tag`, we'll generalize the way we handle HEAD. If you
think about it, HEAD and tags are similar. They are both ways for ugit to attach
a name to an OID. In case of HEAD, the name is hardcoded by ugit; in case of
tags, the name will be provided by the user. It makes sense to handle them
similarly in *data.py*.
In *data.py*, let's generalize the functions `set_HEAD` and `get_HEAD` into
`update_ref` and `get_ref`. "Ref" is short for reference, and that's the name
Git uses. The functions will now accept the name of the ref and write/read it as
a file under the *.ugit* directory. Logically, a ref is a named pointer to an object.
The important change is in *data.py*. The rest of the changes just rename some
functions:
```
- get_HEAD() -> get_ref('HEAD')
- set_HEAD(oid) -> update_ref('HEAD', oid)
```
Note that we didn't change any behaviour of ugit here, this is purely
refactoring.

28
how_to/Change_22.md Normal file

@@ -0,0 +1,28 @@
- tag: Create the tag ref
After we've implemented refs in the previous change, it's time to create a ref
when the user creates a tag.
`create_tag` now calls `update_ref` with the tag name to actually create the tag.
For namespacing purposes, we'll put all tags under *refs/tags/*. That is, if the
user creates *my-cool-commit* tag, we'll create *refs/tags/my-cool-commit* ref
to point to the desired OID.
Then we'll update *data.py* to handle this "namespaced" ref. Since we can't have
a / in the file name, we'll create directories for it. Now if a ref
*refs/tags/sometag* is created, it will be placed under *.ugit/refs/tags* in a
file named *sometag*.
To verify that this code works, you can run:
```
$ ugit tag test
```
And make sure that the tag points to HEAD:
```
$ cat .ugit/refs/tags/test
$ cat .ugit/HEAD
```
The last two commands should give the same output.
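For illustration, refs stored as files (including the "namespaced" ones) could be handled along these lines. This is a sketch under assumptions: the `root` parameter is hypothetical and stands in for the *.ugit* directory, which the real *data.py* presumably hardcodes.

```python
import os


def update_ref(root, ref, oid):
    """Write oid to the file that represents ref, creating directories as needed."""
    path = os.path.join(root, ref)
    # "refs/tags/sometag" needs the refs/tags directories to exist first.
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(oid)


def get_ref(root, ref):
    """Return the OID stored in ref, or None if the ref doesn't exist."""
    path = os.path.join(root, ref)
    if not os.path.isfile(path):
        return None
    with open(path) as f:
        return f.read().strip()
```

With this layout, `update_ref(root, "refs/tags/test", oid)` ends up writing a file named *test* under *refs/tags*, which is exactly what the `cat` commands above inspect.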

22
how_to/Change_23.md Normal file

@@ -0,0 +1,22 @@
- tag: Resolve name to oid in argparse
It's nice that we can create tags, but now let's actually make them usable from
the CLI.
In *base.py*, we'll create `get_oid` to resolve a "name" to an OID. A name can
either be a ref (in which case `get_oid` will return the OID that the ref points
to) or an OID (in which case `get_oid` will just return that same OID).
Next, we'll modify the argument parser in *cli.py* to call `get_oid` on all
arguments which are expected to be an OID. This way we can pass a ref there
instead of an OID.
At this point we can do something like:
```
$ ugit tag mytag d8d43b0e3a21df0c845e185d08be8e4028787069
$ ugit log refs/tags/mytag
# Will print log of commits starting at d8d43b0e...
$ ugit checkout refs/tags/mytag
# Will checkout commit d8d43b0e...
etc...
```

18
how_to/Change_24.md Normal file

@@ -0,0 +1,18 @@
- base: Try different directories when searching for a ref
In the previous change, you might have noticed that we need to spell out the
full name of a tag (like *refs/tags/mytag*). This isn't very convenient; we
would like to be able to use shorter names. For example, if we've created "mytag"
tag, we should be able to do `ugit log mytag` rather than having to specify
`ugit log refs/tags/mytag`.
We'll extend `get_oid` to search in different ref subdirectories when resolving
a name. We'll search in:
```
Root (.ugit): This way we can specify refs/tags/mytag
.ugit/refs: This way we can specify tags/mytag
.ugit/refs/tags: This way we can specify mytag
.ugit/refs/heads: This will be needed for a future change
```
If we find the requested name in any of the directories, return it. Otherwise
assume that the name is an OID.
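The lookup order can be sketched like this. The `get_ref` parameter is a hypothetical stand-in for the ref lookup in *data.py*, and the final sanity check on the OID format is my own addition rather than something this change describes.

```python
import string


def get_oid(name, get_ref):
    """Resolve name to an OID.

    get_ref is any callable that maps a ref name to an OID, or None if the
    ref doesn't exist (hypothetical stand-in for the data module).
    """
    refs_to_try = [
        f"{name}",             # e.g. refs/tags/mytag
        f"refs/{name}",        # e.g. tags/mytag
        f"refs/tags/{name}",   # e.g. mytag
        f"refs/heads/{name}",  # needed for a future change
    ]
    for ref in refs_to_try:
        oid = get_ref(ref)
        if oid:
            return oid
    # Not a ref: assume it's an OID (the format check is an extra safeguard).
    if len(name) == 40 and all(c in string.hexdigits for c in name):
        return name
    raise ValueError(f"Unknown name {name}")
```

For a quick experiment, a dict's `.get` method works as the `get_ref` callable.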

12
how_to/Change_25.md Normal file

@@ -0,0 +1,12 @@
- cli: pass HEAD by default in argparse
First, make "@" be an alias for HEAD. (Implemented in `get_oid`)
Second, do a little refactoring in *cli.py*. Some commands accept an optional
OID argument and if the argument isn't provided it defaults to HEAD. For example
`ugit log` can get an OID to start logging from, but by default it logs all
commits before HEAD.
Instead of having each command implement this logic, let's just make "@" (HEAD)
be the default value for those commands. The relevant commands at this stage
are `log` and `tag`. More will follow.

14
how_to/Change_26.md Normal file

@@ -0,0 +1,14 @@
- k: Print refs
Now that we have refs and a potentially branching commit history, it's a good
idea to create a visualization tool to see all the mess that we've created.
The visualization tool will draw all refs and all the commits pointed to by the refs.
Our command to run the tool will be called `ugit k`, similar to `gitk` (which is
a graphical visualization tool for Git).
We'll create a new `k` command in *cli.py*. We'll create `iter_refs` which is a
generator which will iterate on all available refs (it will return HEAD from the
ugit root directory and everything under *.ugit/refs*). As a first step, let's
just print all refs when running `k`.

21
how_to/Change_27.md Normal file

@@ -0,0 +1,21 @@
- k: Iterate commits and parents
In addition to printing the refs, we'll also print all OIDs that are reachable
from those refs. We'll create `iter_commits_and_parents`, which is a generator
that returns all commits that it can reach from a given set of OIDs.
Note that `iter_commits_and_parents` will return an OID once, even if it's
reachable from multiple refs. Here, for example:
```
o<----o<----o<----o<----@<----@<----@
^                  \                ^
first commit        -<--$<----$     refs/tags/tag1
                              ^
                              refs/tags/tag2
```
We can reach the first commit by following the parents of *tag1* or by following
the parents of *tag2*. Yet if we call `iter_commits_and_parents({tag1, tag2})`,
the first commit will be yielded only once. This property will be useful later.
(Note that nothing is visualized yet, we're preparing for that.)
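A minimal sketch of such a generator, assuming a hypothetical `get_parent` callable that returns the parent OID of a commit (or None for the first commit); the real function reads commits from the object database instead.

```python
def iter_commits_and_parents(oids, get_parent):
    """Yield every OID reachable from oids, each one exactly once."""
    oids = set(oids)
    visited = set()
    while oids:
        oid = oids.pop()
        if not oid or oid in visited:
            continue  # skip the missing parent of the first commit and repeats
        visited.add(oid)
        yield oid
        oids.add(get_parent(oid))
```

The `visited` set is what guarantees that a commit shared by two branches is only yielded once.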

18
how_to/Change_28.md Normal file

@@ -0,0 +1,18 @@
- k: Render graph
`k` is supposed to be a visualization tool, but so far we've just printed a
bunch of OIDs... Now comes the visualization part!
There's a convenient file format called "dot" that can describe a graph. This is
a textual format. We'll generate a graph of all commits and refs in dot format
and then visualize it using the "dot" utility that comes with Graphviz.
(If you're unfamiliar with dot or Graphviz please look it up online.)
The graph will contain a node for each commit, that points to the parent commit.
The graph will also contain a node for each ref, which points to the relevant
commit.
At this point, `ugit k` is fully functional and I encourage you to play with it.
Create a crazy branching history and a bunch of tags and see for yourself that
`ugit k` can draw all that visually.
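To give an idea of what the generated text looks like, here is a sketch that builds a dot graph from plain dicts. The function name `render_dot`, the dict inputs and the node styling are assumptions for illustration, not the actual `k` implementation.

```python
def render_dot(refs, parents):
    """Build a dot graph description.

    refs maps ref name -> OID, parents maps OID -> parent OID (or None).
    """
    lines = ["digraph commits {"]
    for refname, oid in refs.items():
        lines.append(f'"{refname}" [shape=note]')
        lines.append(f'"{refname}" -> "{oid}"')
    for oid, parent in parents.items():
        # Label commits with a shortened OID to keep the picture readable.
        lines.append(f'"{oid}" [shape=box style=filled label="{oid[:10]}"]')
        if parent:
            lines.append(f'"{oid}" -> "{parent}"')
    lines.append("}")
    return "\n".join(lines)
```

The resulting text can be fed to Graphviz, for example by piping it into `dot -Tpng` to get an image.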

9
how_to/Change_29.md Normal file

@@ -0,0 +1,9 @@
- log: Use `iter_commits_and_parents`
Refactoring ahead! Since we have `iter_commits_and_parents` from `k`, let's also
use this function in `log`. We'll need to adjust it a bit to use
`collections.deque` instead of a set so that the order of commits is deterministic.
This generalization might seem unneeded at this point, but it will be useful
later. (Note for the advanced folks: When we implement merge commits that have
multiple parents, this generic way to iterate will come in handy.)
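The adjusted generator might look like the sketch below. As before, `get_parent` is a hypothetical callable returning a commit's parent OID or None, standing in for reading the commit from the object database.

```python
from collections import deque


def iter_commits_and_parents(oids, get_parent):
    """Yield reachable OIDs once each, in a deterministic order."""
    oids = deque(oids)
    visited = set()
    while oids:
        oid = oids.popleft()
        if not oid or oid in visited:
            continue
        visited.add(oid)
        yield oid
        # Put the parent at the front so a linear history comes out
        # newest-first, which is the order `log` wants.
        oids.appendleft(get_parent(oid))
```

Unlike `set.pop()`, which returns an arbitrary element, `popleft()` always takes the next item in line, so the output order no longer depends on hashing.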

82
how_to/Change_30.md Normal file

@@ -0,0 +1,82 @@
- branch: Create new branch
Tags were an improvement since they freed us from the burden of remembering OIDs
directly. But they are still somewhat inconvenient, since they are static. Let
me illustrate:
```
o-----o-----o-----o-----o-----o-----o
             \                      ^
              ----o-----o           tag2,HEAD
                        ^
                        tag1
```
If we have the above situation, we can easily flip between *tag1* and *tag2* with
`checkout`. But what happens if we do the following?
- `ugit checkout tag2`
- Make some changes
- `ugit commit`
Now it looks like this:
```
o-----o-----o-----o-----o-----o-----o-----o
             \                      ^     ^
              ----o-----o           tag2  HEAD
                        ^
                        tag1
```
The upper branch has advanced, but *tag2* still points to the previous commit.
This is by design, since tags are supposed to just name a specific OID. So if we
want to remember the new HEAD position we need to create another tag.
But now let's create a ref that will "move forward" as the branch grows. Just
like we have `ugit tag`, we'll create `ugit branch` that will point a branch to
a specific OID. This time the ref will be created under *refs/heads*.
At this stage, `branch` doesn't look any different from tag (the only difference
is that the branch is created under *refs/heads* rather than *refs/tags*). But
the magic will happen once we try to `checkout` a branch.
So far, when we checkout anything, we update HEAD to point to the OID that we've
just checked out. But if we checkout a branch by name, we'll do something
different: we will update HEAD to point to the **name of the branch!** Assume
that we have a branch here:
```
o-----o-----o-----o-----o-----o-----o
             \                      ^
              ----o-----o           tag2,branch2
                        ^
                        tag1
```
Running `ugit checkout branch2` will create the following situation:
```
o-----o-----o-----o-----o-----o-----o
             \                      ^
              ----o-----o           tag2,branch2 <--- HEAD
                        ^
                        tag1
```
You see? HEAD points to *branch2* rather than the OID of the commit directly.
Now if we create another commit, ugit will update HEAD to point to the latest
commit (just like it does every time) but as a side effect it will also update
*branch2* to point to the latest commit.
```
o-----o-----o-----o-----o-----o-----o-----o
             \                      ^     ^
              ----o-----o           tag2  branch2 <--- HEAD
                        ^
                        tag1
```
This way, if we checkout a branch and create some commits on top of it, the ref
will always point to the latest commit.
But right now HEAD (or any ref for that matter) may only point to an OID. It
can't point to another ref, like I described above. So our next step would be
to implement this concept. To mirror Git's terminology, we will call a ref that
points to another ref a "symbolic ref". Please see the next change for an
implementation of symbolic refs.

5
how_to/Change_31.md Normal file

@@ -0,0 +1,5 @@
- data: Implement symbolic refs idea
If the file that represents a ref contains an OID, we'll assume that the ref
points to an OID. If the file contains the content `ref: <refname>`, we'll
assume that the ref points to `<refname>` and we will dereference it recursively.

8
how_to/Change_32.md Normal file

@@ -0,0 +1,8 @@
- data: Create RefValue container
To make working with symbolic refs easier, we will create a `RefValue` container
to represent the value of a ref. `RefValue` will have a `symbolic` property that
says whether it's a symbolic or a direct ref.
This change is just refactoring: we will wrap every OID that is written to or read
from a ref in a `RefValue`.

17
how_to/Change_33.md Normal file

@@ -0,0 +1,17 @@
- data: Dereference refs when reading and writing
Now we'll dereference symbolic refs not only when reading them but also when
writing them.
We'll implement a helper function called `_get_ref_internal` which will return
the path and the value of the last ref pointed to by a symbolic ref. In simple words:
- When given a non-symbolic ref, `_get_ref_internal` will return the ref name
and value.
- When given a symbolic ref, `_get_ref_internal` will dereference the ref
recursively, and then return the name of the last (non-symbolic) ref that points
to an OID, plus its value.
Now `update_ref` will use `_get_ref_internal` to know which ref it needs to update.
Additionally, we'll use `_get_ref_internal` in `get_ref`.
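Here is a sketch of how these three functions could fit together. To keep it self-contained, a plain dict named `files` (ref name -> raw file content) stands in for the files under *.ugit*, and `RefValue` is modeled as a namedtuple; both are assumptions, since *data.py* isn't shown here.

```python
from collections import namedtuple

RefValue = namedtuple("RefValue", ["symbolic", "value"])


def _get_ref_internal(ref, files):
    """Follow symbolic refs and return (name of the last ref, its RefValue)."""
    value = files.get(ref)
    if value is not None and value.startswith("ref:"):
        target = value.split(":", 1)[1].strip()
        return _get_ref_internal(target, files)
    return ref, RefValue(symbolic=False, value=value)


def update_ref(ref, value, files):
    """Write value to the last ref in the chain, not to the symbolic ref."""
    ref = _get_ref_internal(ref, files)[0]
    files[ref] = value.value


def get_ref(ref, files):
    return _get_ref_internal(ref, files)[1]
```

If HEAD contains `ref: refs/heads/branch2`, updating HEAD through this `update_ref` changes *refs/heads/branch2* and leaves the HEAD file itself untouched, which is the "branch moves forward" behaviour we were after.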

15
how_to/Change_34.md Normal file

@@ -0,0 +1,15 @@
- data: Don't always dereference refs (for `ugit k`)
Actually, it's not always desirable to dereference a ref all the way. Sometimes
we would like to know which ref a symbolic ref points to, rather than the final
OID. Or we would like to update a ref directly, rather than updating the last
ref in the chain.
One such use case is `ugit k`. When visualizing refs it would be nice to see
which ref points to which ref. We will see another use case soon.
To accommodate this, we will add a `deref` option to `get_ref`, `iter_refs` and
`update_ref`. If they are called with `deref=False`, they will work on the
raw value of a ref and not dereference any symbolic refs.
Then we will update `k` to use `deref=False`.

3
ruff.toml Normal file

@@ -0,0 +1,3 @@
[lint]
select = ["E", "F"]
ignore = ["F401"]

176
ugit/base.py Normal file

@@ -0,0 +1,176 @@
import itertools
import operator
import os
import string
from collections import deque, namedtuple
from pathlib import Path, PurePath
from . import data
def write_tree(directory="."):
    # Store `directory` recursively in the object database; return the tree OID.
    entries = []
    with os.scandir(directory) as it:
        for entry in it:
            full = f"{directory}/{entry.name}"
            if is_ignored(full):
                continue
            if entry.is_file(follow_symlinks=False):
                type_ = "blob"
                with open(full, "rb") as f:
                    oid = data.hash_object(f.read())
            elif entry.is_dir(follow_symlinks=False):
                type_ = "tree"
                oid = write_tree(full)
            else:
                # Skip symlinks and other special files.
                continue
            entries.append((entry.name, oid, type_))
    tree = "".join(f"{type_} {oid} {name}\n" for name, oid, type_ in sorted(entries))
    return data.hash_object(tree.encode(), "tree")


def _iter_tree_entries(oid):
    if not oid:
        return
    tree = data.get_object(oid, "tree")
    for entry in tree.decode().splitlines():
        type_, oid, name = entry.split(" ", 2)
        yield type_, oid, name


def get_tree(oid, base_path=""):
    result = {}
    for type_, oid, name in _iter_tree_entries(oid):
        assert "/" not in name
        assert name not in ("..", ".")
        path = base_path + name
        if type_ == "blob":
            result[path] = oid
        elif type_ == "tree":
            result.update(get_tree(oid, f"{path}/"))
        else:
            assert False, f"Unknown tree entry {type_}"
    return result


def _empty_current_directory():
    for root, dirnames, filenames in os.walk(".", topdown=False):
        for filename in filenames:
            # Normalize "./a/b" to "a/b" so is_ignored sees clean components
            path = os.path.relpath(f"{root}/{filename}")
            if is_ignored(path) or not os.path.isfile(path):
                continue
            os.remove(path)
        for dirname in dirnames:
            path = os.path.relpath(f"{root}/{dirname}")
            if is_ignored(path):
                continue
            try:
                os.rmdir(path)
            except (FileNotFoundError, OSError):
                # Deletion might fail if the directory contains ignored files,
                # so it's OK
                pass


def read_tree(tree_oid):
    _empty_current_directory()
    for path, oid in get_tree(tree_oid, base_path="./").items():
        # Create the file's parent directories before writing it
        Path(path).parent.mkdir(parents=True, exist_ok=True)
        with open(path, "wb") as f:
            f.write(data.get_object(oid))


def commit(message):
    commit = f"tree {write_tree()}\n"
    HEAD = data.get_ref("HEAD").value
    if HEAD:
        commit += f"parent {HEAD}\n"
    commit += "\n"
    commit += f"{message}\n"
    oid = data.hash_object(commit.encode(), "commit")
    data.update_ref("HEAD", data.RefValue(symbolic=False, value=oid))
    return oid


def create_tag(name, oid):
    data.update_ref(f"refs/tags/{name}", data.RefValue(symbolic=False, value=oid))


def checkout(oid):
    commit = get_commit(oid)
    read_tree(commit.tree)
    data.update_ref("HEAD", data.RefValue(symbolic=False, value=oid))


def create_branch(name, oid):
    data.update_ref(f"refs/heads/{name}", data.RefValue(symbolic=False, value=oid))


Commit = namedtuple("Commit", ["tree", "parent", "message"])


def get_commit(oid):
    parent = None
    commit = data.get_object(oid, "commit").decode()
    lines = iter(commit.splitlines())
    # Header lines run until the first empty line; the rest is the message
    for line in itertools.takewhile(operator.truth, lines):
        key, value = line.split(" ", 1)
        if key == "tree":
            tree = value
        elif key == "parent":
            parent = value
        else:
            assert False, f"Unknown field {key}"
    message = "\n".join(lines)
    return Commit(tree=tree, parent=parent, message=message)


def iter_commits_and_parents(oids):
    oids = deque(oids)
    visited = set()
    while oids:
        oid = oids.popleft()
        if not oid or oid in visited:
            continue
        visited.add(oid)
        yield oid
        commit = get_commit(oid)
        # Return parent next
        oids.appendleft(commit.parent)


def get_oid(name):
    if name == "@":
        name = "HEAD"
    # Name is ref
    refs_to_try = [
        f"{name}",
        f"refs/{name}",
        f"refs/tags/{name}",
        f"refs/heads/{name}",
    ]
    for ref in refs_to_try:
        if data.get_ref(ref, deref=False).value:
            return data.get_ref(ref).value
    # Name is SHA1
    is_hex = all(c in string.hexdigits for c in name)
    if len(name) == 40 and is_hex:
        return name
    assert False, f"Unknown name {name}"


def is_ignored(path):
    return ".ugit" in path.split("/")

View File

@@ -1,8 +1,11 @@
import argparse
import subprocess
import sys
import textwrap
from pathlib import Path
import argparse
import sys
from . import base
from . import data
@@ -17,17 +20,51 @@ def parse_args():
    commands = parser.add_subparsers(dest="command")
    commands.required = True
    oid = base.get_oid
    init_parser = commands.add_parser("init")
    init_parser.set_defaults(func=init)
    cat_file_parser = commands.add_parser("cat-file")
    cat_file_parser.set_defaults(func=cat_file)
    cat_file_parser.add_argument("object")
    hash_object_parser = commands.add_parser("hash-object")
    hash_object_parser.set_defaults(func=hash_object)
    hash_object_parser.add_argument("file")
    cat_file_parser = commands.add_parser("cat-file")
    cat_file_parser.set_defaults(func=cat_file)
    cat_file_parser.add_argument("object", type=oid)
    write_tree_parser = commands.add_parser("write-tree")
    write_tree_parser.set_defaults(func=write_tree)
    read_tree_parser = commands.add_parser("read-tree")
    read_tree_parser.set_defaults(func=read_tree)
    read_tree_parser.add_argument("tree", type=oid)
    commit_parser = commands.add_parser("commit")
    commit_parser.set_defaults(func=commit)
    commit_parser.add_argument("-m", "--message", required=True)
    log_parser = commands.add_parser("log")
    log_parser.set_defaults(func=log)
    log_parser.add_argument("oid", default="@", type=oid, nargs="?")
    checkout_parser = commands.add_parser("checkout")
    checkout_parser.set_defaults(func=checkout)
    checkout_parser.add_argument("oid", type=oid)
    tag_parser = commands.add_parser("tag")
    tag_parser.set_defaults(func=tag)
    tag_parser.add_argument("name")
    tag_parser.add_argument("oid", default="@", type=oid, nargs="?")
    branch_parser = commands.add_parser("branch")
    branch_parser.set_defaults(func=branch)
    branch_parser.add_argument("name")
    branch_parser.add_argument("start_point", default="@", type=oid, nargs="?")
    k_parser = commands.add_parser("k")
    k_parser.set_defaults(func=k)
    return parser.parse_args()
@@ -43,4 +80,63 @@ def hash_object(args):
def cat_file(args):
    sys.stdout.flush()
    sys.stdout.buffer.write(data.get_object(args.object))
    sys.stdout.buffer.write(data.get_object(args.object, expected=None))


def write_tree(args):
    print(base.write_tree())


def read_tree(args):
    base.read_tree(args.tree)


def commit(args):
    print(base.commit(args.message))


def log(args):
    for oid in base.iter_commits_and_parents({args.oid}):
        commit = base.get_commit(oid)
        print(f"commit {oid}\n")
        print(textwrap.indent(commit.message, " "))
        print("")


def checkout(args):
    base.checkout(args.oid)


def tag(args):
    base.create_tag(args.name, args.oid)


def branch(args):
    base.create_branch(args.name, args.start_point)
    print(f"Branch {args.name} created at {args.start_point[:10]}")


def k(args):
    # DOT identifiers containing "/" must be double-quoted; single quotes
    # are not a quoting character in the DOT language.
    dot = "digraph commits {\n"
    oids = set()
    for refname, ref in data.iter_refs(deref=False):
        dot += f'"{refname}" [shape=note]\n'
        dot += f'"{refname}" -> "{ref.value}"\n'
        if not ref.symbolic:
            oids.add(ref.value)
    for oid in base.iter_commits_and_parents(oids):
        commit = base.get_commit(oid)
        dot += f'"{oid}" [shape=box style=filled label="{oid[:10]}"]\n'
        if commit.parent:
            dot += f'"{oid}" -> "{commit.parent}"\n'
    dot += "}"
    print(dot)
    with subprocess.Popen(
        ["dot", "-Tgtk", "/dev/stdin"], stdin=subprocess.PIPE
    ) as proc:
        proc.communicate(dot.encode())

View File

@@ -1,6 +1,9 @@
from pathlib import Path
from pathlib import Path, PurePath
import hashlib
import os
from collections import namedtuple
GIT_DIR = ".ugit"
@@ -10,13 +13,62 @@ def init():
    Path(f"{GIT_DIR}/objects").mkdir()
def hash_object(data):
    oid = hashlib.sha1(data).hexdigest()
RefValue = namedtuple("RefValue", ["symbolic", "value"])


def update_ref(ref, value, deref=True):
    assert not value.symbolic
    ref = _get_ref_internal(ref, deref)[0]
    ref_path = f"{GIT_DIR}/{ref}"
    # Create the ref's parent directories, not a directory at the ref path
    Path(ref_path).parent.mkdir(parents=True, exist_ok=True)
    with open(ref_path, "w") as f:
        f.write(value.value)


def get_ref(ref, deref=True):
    return _get_ref_internal(ref, deref)[1]


def _get_ref_internal(ref, deref):
    ref_path = f"{GIT_DIR}/{ref}"
    value = None
    if Path(ref_path).is_file():
        with open(ref_path) as f:
            value = f.read().strip()
    symbolic = bool(value) and value.startswith("ref:")
    if symbolic:
        value = value.split(":", 1)[1].strip()
        if deref:
            return _get_ref_internal(value, deref=True)
    # With deref=False, a symbolic ref is returned as-is and flagged symbolic
    return ref, RefValue(symbolic=symbolic, value=value)


def iter_refs(deref=True):
    refs = ["HEAD"]
    for root, _, filenames in os.walk(f"{GIT_DIR}/refs"):
        root = os.path.relpath(root, GIT_DIR)
        refs.extend(f"{root}/{name}" for name in filenames)
    for refname in refs:
        yield refname, get_ref(refname, deref=deref)


def hash_object(data, type_="blob"):
    obj = type_.encode() + b"\x00" + data
    oid = hashlib.sha1(obj).hexdigest()
    with open(f"{GIT_DIR}/objects/{oid}", "wb") as out:
        out.write(data)
        out.write(obj)
    return oid
def get_object(oid):
def get_object(oid, expected="blob"):
    with open(f"{GIT_DIR}/objects/{oid}", "rb") as f:
        return f.read()
        obj = f.read()
    type_, _, content = obj.partition(b"\x00")
    type_ = type_.decode()
    if expected is not None:
        assert type_ == expected, f"Expected {expected}, got {type_}"
    return content