Compare commits
67 Commits
41554ea286...main
| Author | SHA1 | Date | |
|---|---|---|---|
| c484b41a89 | |||
| fe5ed910a3 | |||
| 9c53919802 | |||
| 30ce8c84e4 | |||
| 6841a97d18 | |||
| 556c16c081 | |||
| 7a0f86e49b | |||
| 3770c81942 | |||
| 9f8fde3c60 | |||
| 772f631768 | |||
| 7fe3e0f497 | |||
| 7896b80c42 | |||
| b854b4fa18 | |||
| 2362d69673 | |||
| d53322c256 | |||
| 7fbf6640f6 | |||
| dad9077515 | |||
| db7d608010 | |||
| c9d8b443ed | |||
| 41333f06bc | |||
| fe292c02c9 | |||
| de595261e6 | |||
| 81bf86d41b | |||
| 671fa4b6b1 | |||
| 63dcbeb9e7 | |||
| edae32dc86 | |||
| e85766f671 | |||
| 1f947e6343 | |||
| cb8e744794 | |||
| 6797bcfabe | |||
| 95355befb4 | |||
| b802e1eb9d | |||
| 817f38f49c | |||
| d00a7817ab | |||
| 8ac5264366 | |||
| cd91f18da6 | |||
| 78044a877a | |||
| b0d8cab498 | |||
| 450391089f | |||
| 1847cfbb17 | |||
| 6a91c03f40 | |||
| 4e13a27f79 | |||
| 2c940abd1d | |||
| c72370f930 | |||
| 40a19615aa | |||
| db8c1379c2 | |||
| 6f5fe864a9 | |||
| 2f8545d48e | |||
| 5faf498917 | |||
| 4540b98a88 | |||
| af1928a360 | |||
| 46a20c8b60 | |||
| fdfcfdbdad | |||
| 73eb89d397 | |||
| d666efcbd3 | |||
| 30ee2098ab | |||
| 103837cb73 | |||
| 2556bde16f | |||
| 36f6f88990 | |||
| a010615cf2 | |||
| 9634544d68 | |||
| 71abdf3454 | |||
| c647f99e5c | |||
| 1f7354666b | |||
| 55b2c17913 | |||
| 227d6ccd30 | |||
| 25febeecb1 |
7
how_to/Change_02.md
Normal file
@@ -0,0 +1,7 @@
- cli: Add argument parser

The real Git executable has multiple sub-commands, like 'git init', 'git commit',
etc. Let's use Python's built-in argument parser argparse to implement sub-commands.

You can see on the other side which changes were made. Now we can run 'ugit init'
and see "Hello, World!" printed out.
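
As a rough illustration of the sub-command setup described above, here is a minimal sketch using argparse sub-parsers. The function names `parse_args`, `main` and `init` are assumptions made for this example, not necessarily the names used in the actual change.

```python
import argparse


def init(args):
    # Placeholder handler, matching the "Hello, World!" stage described above.
    print("Hello, World!")


def parse_args(argv=None):
    parser = argparse.ArgumentParser(prog="ugit")

    # Each sub-command gets its own sub-parser and its own handler function.
    commands = parser.add_subparsers(dest="command")
    commands.required = True

    init_parser = commands.add_parser("init")
    init_parser.set_defaults(func=init)

    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    args.func(args)
```

Calling `main(["init"])` dispatches to `init`. Storing the handler with `set_defaults(func=...)` keeps the dispatch logic in one place as more sub-commands are added.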
18
how_to/Change_03.md
Normal file
@@ -0,0 +1,18 @@
- init: Create new .ugit directory

The 'ugit init' command creates a new empty repository.

Git stores all repository data locally, in a subdirectory called ".git", so upon
initialization we'll create one.

I named the directory ".ugit" rather than ".git" so that it doesn't clash with
Git, but the idea is the same.

To implement 'init', we could have just called os.makedirs from cli.py, but I
want to have some separation between different logical parts of the code:

+ cli.py - In charge of parsing and processing user input.
+ data.py - Manages the data in the .ugit directory. Here will be the code that
  actually touches files on disk.

This separation will be useful as the code gets larger.
63
how_to/Change_04.md
Normal file
@@ -0,0 +1,63 @@
- hash-object: Save object

Let's create the first non-trivial command. This command will take a file and
store it in our '.ugit' directory for later retrieval. In Git's lingo, this
feature is called "the object database". It allows us to store and retrieve
arbitrary blobs, which are called "objects". As far as the Object Database is
concerned, the content of the object doesn't have any meaning (just like a
filesystem doesn't care about the internal structure of a file).

Because this command needs the '.ugit' directory, it must be run from the same
directory where you did 'ugit init'.

Note that this is a very low-level Git building block and we're not talking yet
about versions or commits or any other things that you might have heard about,
we're just talking about an interface for storing some raw bytes.

So we can store an object, but how would we refer to it later? We could ask the
user to provide a name along with the object and retrieve the object later using
the name, but there is a nicer way: We can refer to the object using its hash.

If you haven't heard about hashes and hash functions, I suggest that you pause
and do some reading on it. In summary, a hash function can take a blob of
arbitrary length and produce a small "fingerprint" with a fixed length. Some
hash functions such as SHA-1 guarantee that different blobs are very very very
likely to produce different fingerprints (so likely, that Git assumes it's
guaranteed). Let's try some strings to see an example:

```
$ echo -n this is cool | sha1sum
60f51187e76a9de0ff3df31f051bde04da2da891

$ echo -n this is cooler | sha1sum
f3c953b792f9ab39d1be0bdab7ab5f8350593004
```

You can see that hashing the phrases "this is cool" and "this is cooler" gives
completely different hashes even though the difference between the phrases is
small.

We're going to use the hash as the name of the object (we'll call this name an
"OID" - object ID).

So the flow of the command hash-object is:

+ Get the path of the file to store.
+ Read the file.
+ Hash the content of the file using SHA-1.
+ Store the file under ".ugit/objects/{the SHA-1 hash}".

This type of storage is called content-addressable storage because the "address"
that we use to find a blob is based on the content of the blob itself. (In
contrast to name-addressable storage, such as a typical filesystem, where you
address a particular file by its name, regardless of its content).
Content-addressable storage has nice properties when synchronizing data between
different computers - if two repositories have an object with the same OID we
can be sure that they are the same object. Also since two different objects are
practically guaranteed to have different OIDs, we can't have naming clashes
between objects.

When real Git stores objects it does a few extra things, such as writing the
size of the object to the file as well, compressing the objects and dividing
them into 256 directories. The division into directories avoids having
directories with a huge number of files, which can hurt performance. We're not
going to do any of this in ugit, for simplicity.
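
The hash-object flow listed above can be sketched in a few lines. This is only an illustration under stated assumptions: the `git_dir` parameter is added here so the example can run against any directory, and the real change may structure the code differently.

```python
import hashlib
import os

GIT_DIR = ".ugit"  # repository directory name used throughout the text


def hash_object(data, git_dir=GIT_DIR):
    """Store raw bytes under <git_dir>/objects/<SHA-1> and return the OID."""
    # The OID is the SHA-1 hex digest of the content itself.
    oid = hashlib.sha1(data).hexdigest()

    objects_dir = os.path.join(git_dir, "objects")
    os.makedirs(objects_dir, exist_ok=True)

    # The object file is named after its own hash (content-addressable).
    with open(os.path.join(objects_dir, oid), "wb") as out:
        out.write(data)
    return oid
```

Storing the same bytes twice produces the same OID and simply overwrites the file with identical content, which is why naming clashes between different objects are not a practical concern.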
24
how_to/Change_05.md
Normal file
@@ -0,0 +1,24 @@
- cat-file: Print hashed objects

This command is the "opposite" of `hash-object`: it can print an object by its
OID. Its implementation just reads the file at ".ugit/objects/{OID}".

The names `hash-object` and `cat-file` aren't the clearest of names, but they
are the names that Git uses so we'll stick to them for consistency.

We can now try the full cycle:

```
$ cd /tmp/new
$ ugit init
Initialized empty ugit repository in /tmp/new/.ugit
$ echo some file > bla
$ ugit hash-object bla
0e08b5e8c10abc3e455b75286ba4a1fbd56e18a5
$ ugit cat-file 0e08b5e8c10abc3e455b75286ba4a1fbd56e18a5
some file
```

Note that the name of the file (bla) wasn't preserved as part of this process,
because, again, the object database is just about storing bytes for later
retrieval and it doesn't care which filename the bytes came from.
17
how_to/Change_06.md
Normal file
@@ -0,0 +1,17 @@
- data: Add types to objects

As we will soon see, there will be different logical types of objects that are
used in different contexts (even though, from the Object Database's point of
view, they are just all bytes). In order to lower the chance of using an object
in the wrong context we're going to add a type tag for each object.

The type is just a string that's going to be prepended to the start of the file,
followed by a null byte. When reading the file later we'll extract the type and
verify that it's indeed the expected type.

The default type is going to be `blob`, since by default an object is a
collection of bytes with no further semantic meaning.

We can also pass `expected=None` to `get_object()` if we don't want to verify
the type. This is useful for the `cat-file` CLI command which is a debug command
used for printing all objects.
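
To make the type tag concrete, here is a hedged sketch of how the tag could be written and checked. The names `hash_object`, `get_object` and `expected` come from the text; the `type_` and `git_dir` parameters and the choice of `ValueError` are assumptions for this example.

```python
import hashlib
import os


def hash_object(data, type_="blob", git_dir=".ugit"):
    """Prepend the type tag and a null byte, then store under the SHA-1."""
    obj = type_.encode() + b"\x00" + data
    oid = hashlib.sha1(obj).hexdigest()
    os.makedirs(os.path.join(git_dir, "objects"), exist_ok=True)
    with open(os.path.join(git_dir, "objects", oid), "wb") as out:
        out.write(obj)
    return oid


def get_object(oid, expected="blob", git_dir=".ugit"):
    """Return the payload; verify the type tag unless expected is None."""
    with open(os.path.join(git_dir, "objects", oid), "rb") as f:
        obj = f.read()

    # Everything before the first null byte is the type, the rest is content.
    type_, _, content = obj.partition(b"\x00")
    if expected is not None and type_.decode() != expected:
        raise ValueError(f"Expected {expected}, got {type_.decode()}")
    return content
```

Note that in this sketch the type tag is part of the hashed bytes, so the same content stored under two different types gets two different OIDs.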
0
how_to/Change_07.md
Normal file
26
how_to/Change_08.md
Normal file
@@ -0,0 +1,26 @@
- write-tree: List files

The next command is `write-tree`. This command will take the current working
directory and store it to the object database. If `hash-object` was for storing
an individual file, then `write-tree` is for storing a whole directory.

Like `hash-object`, `write-tree` is going to give us an OID after it's done and
we'll be able to use the OID in order to retrieve the directory at a later time.

In Git's lingo a "tree" means a directory.

We'll get into the details in later changes; in this change we'll only prepare
the code around the feature:

+ Create a `write-tree` CLI command

+ Create a `write_tree()` function in the base module. Why in the base module and
  not in the data module? Because `write_tree()` is not going to write to disk directly
  but use the object database provided by data to store the directory. Hence it
  belongs to the higher-level base module.

+ Add code to `write_tree()` to print a directory recursively. For now nothing
  is written anywhere, but we just coded the boilerplate to recursively scan a
  directory.

We continue in the next change.
8
how_to/Change_09.md
Normal file
@@ -0,0 +1,8 @@
- write-tree: Ignore .ugit files

If we run `ugit write-tree`, we will see that it also prints the content of the
.ugit directory. This directory isn't part of the user's files, so let's ignore
it.

Actually, I created a separate `is_ignored()` function. This way if we have any
other files we want to ignore later we have one place to change.
12
how_to/Change_10.md
Normal file
@@ -0,0 +1,12 @@
- write-tree: Hash the files

Instead of only printing the file name, let's put all files in the object
database. For now we'll print their OID and their name.

Notice that instead of getting one OID to represent a directory we now get a
separate OID for each file, which isn't very useful. Plus, note that the names
of the files aren't stored in the object database, they are just printed and
then the information is discarded.

So at this stage `write-tree` isn't useful (it just saves a bunch of files as
blobs) but the next change will fix it.
62
how_to/Change_11.md
Normal file
@@ -0,0 +1,62 @@
- write-tree: Write tree objects

Now comes the fun part, where we turn a collection of separate files into a
single object that represents a directory.


The idea is that we will create one additional object that collects all the data
necessary to store a complete directory. For example, if we have a directory
with two files:
```
$ ls
cats.txt dogs.txt
```

And we want to save the directory, we will first put the individual files into
the object database:
```
$ ugit hash-object cats.txt
91a7b14a584645c7b995100223e65f8a5a33b707
$ ugit hash-object dogs.txt
fa958e0dd2203e9ad56853a3f51e5945dad317a4
```

Then we will create a "tree" object that has the content of:
```
91a7b14a584645c7b995100223e65f8a5a33b707 cats.txt
fa958e0dd2203e9ad56853a3f51e5945dad317a4 dogs.txt
```

And we will put this tree object into the object database as well. Then the OID
of the tree object will actually represent the entire directory! Why? Because we
can first retrieve the tree object by its OID, then see all the files it
contains (their names and OIDs) and then read all the OIDs of the files to get
their actual content.

What if our directory contains other directories? We'll just create tree objects
for them as well and we'll allow one tree object to point to another:
```
$ ls
cats.txt dogs.txt other/
$ ls other/
shoes.jpg
```

The root tree object will look like this:
```
blob 91a7b14a584645c7b995100223e65f8a5a33b707 cats.txt
blob fa958e0dd2203e9ad56853a3f51e5945dad317a4 dogs.txt
tree 53891a3c27b17e0f8fd96c058f968d19e340428d other
```

Note that we added a type to each entry so that we know if it's a file or a
directory. The tree that represents the "other" directory (OID 53891a3c27b17e0f8fd96c058f968d19e340428d) looks like:
```
blob 0aa186b09fd81e8cf449ba10eee6aff9711cc1ac shoes.jpg
```
We can think of this structure as a tree in the Computer Science sense, where
each entry's OID is a pointer to either another tree or a file (leaf node).

Note that we actually save the tree objects with type "tree" in
`data.hash_object()` since we don't want the trees to be confused with regular
files.
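
The recursive construction of tree objects described above could be sketched roughly as follows. To keep the example self-contained, the object database is replaced by a plain dict (`store`), which is an assumption of this sketch; the real code goes through the data module. Sorting the entries by name is also an assumption, made so the same directory always yields the same OID.

```python
import hashlib
import os


def hash_object(store, data, type_="blob"):
    """Hypothetical in-memory object database: a dict mapping OID -> bytes."""
    obj = type_.encode() + b"\x00" + data
    oid = hashlib.sha1(obj).hexdigest()
    store[oid] = obj
    return oid


def is_ignored(path):
    # Skip the repository's own data directory.
    return ".ugit" in path.split(os.sep)


def write_tree(store, directory="."):
    """Store `directory` recursively and return the OID of its tree object."""
    entries = []
    with os.scandir(directory) as it:
        for entry in it:
            full = os.path.join(directory, entry.name)
            if is_ignored(full):
                continue
            if entry.is_file(follow_symlinks=False):
                with open(full, "rb") as f:
                    oid = hash_object(store, f.read())
                entries.append((entry.name, oid, "blob"))
            elif entry.is_dir(follow_symlinks=False):
                # A sub-directory becomes its own tree object.
                oid = write_tree(store, full)
                entries.append((entry.name, oid, "tree"))

    # One "<type> <oid> <name>" line per entry, sorted by name.
    tree = "".join(
        f"{type_} {oid} {name}\n" for name, oid, type_ in sorted(entries)
    )
    return hash_object(store, tree.encode(), "tree")
```

The OID returned for the root directory is the only value a caller needs to keep, since every file and sub-directory is reachable from it.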
28
how_to/Change_12.md
Normal file
@@ -0,0 +1,28 @@
- read-tree: Extract tree from object

This command will take an OID of a tree and extract it to the working directory.
Kind of the opposite of `write-tree`.

I divided the implementation into a few layers:

`_iter_tree_entries` is a generator that will take an OID of a tree, tokenize it
line-by-line and yield the raw string values.

`get_tree` uses `_iter_tree_entries` to recursively parse a tree into a
dictionary.

`read_tree` uses `get_tree` to get the file OIDs and writes them into the
working directory.

Now we can actually save versions of the working directory! It's nothing like
proper version control, but we can see that a super basic flow is possible:

+ Imagine you work on some code and you want to save a version.
+ You run `ugit write-tree`.
+ You remember the OID that was printed out (write it on a post-it note or
  something :)).
+ Continue working and repeat steps 2 and 3 as necessary.
+ If you want to return to a previous version, use `ugit read-tree` to restore
  it to the working directory.

Is it convenient to use? No. But it's just the beginning!
7
how_to/Change_13.md
Normal file
@@ -0,0 +1,7 @@
- read-tree: Delete all existing stuff before reading

This is done so that we won't have any old files left around after a read-tree.

Before this change, if we save tree A which contains only `a.txt`, then we save
tree B which contains `a.txt` and `b.txt` and then we `read-tree` A, we will
have `b.txt` left over in the working directory.
31
how_to/Change_14.md
Normal file
@@ -0,0 +1,31 @@
- commit: Create commit

So far we were able to save versions of a directory (with `write-tree`), but
without any additional context. In reality, when we save a snapshot we would
like to attach data such as:
+ Message describing it
+ When the snapshot was created
+ Who created the snapshot
+ ...

We will create a new type of object called a "commit" that will store all this
information. A commit will just be a text file stored in the object database
with the type of `'commit'`.

The first lines in the commit will be key-values, then an empty line will mark
the end of the key-values and then the commit message will follow. Like this:

```
tree 5e550586c91fce59e0006799e0d46b3948f05693
author Nikita Leshenko
time 2019-09-14T09:31:09+00:00

This is the commit message!
```

For now we'll just write the "tree" key and the commit message to the commit
object.

We will create a new `ugit commit` command that will accept a commit message,
snapshot the current directory using `ugit write-tree` and save the resulting
object.
10
how_to/Change_15.md
Normal file
@@ -0,0 +1,10 @@
- commit: Record hash of last commit to HEAD

I would like to link new commits to older commits. Right now, if we make changes
in the working directory and make periodic commits, each commit will be a
standalone object, separate from all other commits. The motivation for linking
them together is so that we can look at the commits as a series of snapshots in
some order.

Before we can do it, let's record the OID of the last commit that we created.
We'll call the last commit the "HEAD" and just put the OID in the .ugit/HEAD file.
23
how_to/Change_16.md
Normal file
@@ -0,0 +1,23 @@
- commit: set parent to HEAD

When creating a new commit, we will use the HEAD to link the new commit to the
previous commit. We'll call the previous commit the "parent commit" and we will
save its OID in the "parent" key on the commit object.

For example, if HEAD is currently bd0de093f1a0f90f54913d694a11cccf450bd990 and we
create a new commit, the new commit will look like this in the object store:

```
tree 50bed982245cd21e2798f179e0b032904398485b
parent bd0de093f1a0f90f54913d694a11cccf450bd990

This is the commit message!
```

The first commit in the repository will obviously have no parent.

Now we can retrieve the entire list of commits just by referencing the last
commit! We can start from the HEAD, read the "parent" key on the HEAD commit and
discover the commit before HEAD. Then read the parent of that commit, and go
back on and on... This is basically a linked list implemented over the object
database.
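
The commit layout shown above (key-value lines, an empty line, then the message) can be assembled with a small helper. `build_commit` is a hypothetical name used only for this sketch; in the tutorial the text would then be stored in the object database with the type `commit`.

```python
def build_commit(tree_oid, message, parent=None):
    """Return the text of a commit object for the given tree and message."""
    header = f"tree {tree_oid}\n"
    # The first commit in a repository has no parent, so the key is optional.
    if parent:
        header += f"parent {parent}\n"
    # An empty line separates the key-value section from the message.
    return f"{header}\n{message}\n"
```

Passing the current HEAD as `parent` is what links each new commit to the previous one.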
12
how_to/Change_17.md
Normal file
@@ -0,0 +1,12 @@
- log: Implement

`log` will walk the list of commits and print them.

We will start by implementing `get_commit()` that will parse a commit object by
OID.

Then in the CLI module we will start from the HEAD commit and walk its parents
until we reach a commit without a parent.

The result is that the entire commit history is printed to the screen once we
run `ugit log`.
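
A minimal sketch of the parsing and the parent walk described above follows. It assumes, for the sake of a runnable example, that commits are available as text in a dict (`store`) keyed by OID; the `Commit` tuple and the generator named `log` are illustrative choices rather than the tutorial's exact code.

```python
from collections import namedtuple

Commit = namedtuple("Commit", ["tree", "parent", "message"])


def get_commit(store, oid):
    """Parse the commit text stored under `oid` (store: dict of OID -> str)."""
    lines = iter(store[oid].splitlines())
    fields = {"tree": None, "parent": None}
    for line in lines:
        if not line:  # the empty line ends the key-value section
            break
        key, value = line.split(" ", 1)
        if key not in fields:
            raise ValueError(f"Unknown field {key}")
        fields[key] = value
    # Whatever remains after the empty line is the commit message.
    message = "\n".join(lines)
    return Commit(fields["tree"], fields["parent"], message)


def log(store, oid):
    """Yield (oid, commit) pairs from `oid` back to the first commit."""
    while oid:
        commit = get_commit(store, oid)
        yield oid, commit
        oid = commit.parent
```

The loop stops at the first commit because its `parent` field is `None`, which mirrors the linked-list view of history.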
5
how_to/Change_18.md
Normal file
@@ -0,0 +1,5 @@
- log: Add oid parameter

Just a small cosmetic change: Instead of always printing the list of commits
from HEAD, add an optional parameter to specify an alternative commit OID to
start from. By default it will still be HEAD.
90
how_to/Change_19.md
Normal file
@@ -0,0 +1,90 @@
- checkout: Read tree and move HEAD

When given a commit OID, `ugit checkout` will "checkout" that commit, meaning
that it will populate the working directory with the content of the commit and
move HEAD to point to it.

This is a small but important change and it greatly expands the power of ugit in
two ways.

First, it allows us to travel conveniently in history. If we've made a handful
of commits and we would like to revisit a previous commit, we can now "checkout"
that commit to the working directory, play with it (compile, run tests, read
code, whatever we want) and checkout the latest commit again to resume working
where we left off.

You might be wondering why `checkout` is needed when we could just use
`read-tree`, and the answer is that moving HEAD in addition to reading the tree
allows us to record which commit is checked out right now. If we only used
`read-tree` and later forgot which commit we were looking at, we would see a bunch
of files in the working directory and have no idea where they came from. On the
other hand, if we use `checkout`, the commit will be recorded in HEAD and we can
always know what we're looking at (by running `ugit log` for example and seeing the first entry).

The second way by which `checkout` expands the power of ugit is by allowing
multiple branches of history. Let me explain: So far we have set HEAD to point
to the latest commit that was created. It means that all our commits were
linear: each new commit was added on top of the previous one. The `checkout`
command now allows us to move HEAD to any commit we wish. Then, new commits will
be created on top of the current HEAD commit, which isn't necessarily the last
created commit.

For example, imagine that we're working on some code. So far, we have created a
few commits, represented by a graph:
```
o-----o-----o-----o
^                 ^
first commit      HEAD
```

Then we wanted to code a new feature. We created a few commits while working on
the feature (new commits represented by @):
```
o-----o-----o-----o-----@-----@-----@
^                                   ^
first commit                        HEAD
```

Now we have an alternative idea for implementing that feature. We would like to
go back in time and try a different implementation, without throwing away the
current implementation. We can remember the current HEAD and run `ugit checkout`
to go back in time, by providing the OID of the commit before the new feature
was implemented (that OID can be discovered with `ugit log`).
```
o-----o-----o-----o-----@-----@-----@
^                 ^
first commit      HEAD
```

The working directory will effectively go back in time. We can start working on
an alternative implementation and create new commits. The new commits will be on
top of HEAD and look like this (represented by $):
```
o-----o-----o-----o-----@-----@-----@
^                  \
first commit        ----$-----$
                              ^
                              HEAD
```

See how the history now contains two "branches". We can actually switch back and
forth between them and work on them in parallel. Finally, we can checkout the
preferred implementation and work from it on future code. Assuming that we liked
the second branch, we'll just keep working from it, and future commits will look
like this:
```
o-----o-----o-----o-----@-----@-----@
^                  \
first commit        ----$-----$-----o-----o-----o-----o-----o
                                                            ^
                                                            HEAD
```

Pretty useful, right? We've just introduced a simple form of branching history.
Note that something pretty cool happened here: The implementation of checkout is
very simple (we just call `read_tree` and update HEAD) but the implications of
checkout are quite big - we can suddenly have a branching workflow which might
look complicated but it is actually a direct consequence of what we implemented
in previous changes. This is why I believe learning Git internals from the
bottom up is useful - we can see how simple concepts compose into complicated
functionality.
41
how_to/Change_20.md
Normal file
@@ -0,0 +1,41 @@
- tag: Implement CLI command

Now that we have branching history we have some OIDs we need to keep track of.
Assume we have two branches (continuing from the example we had for `checkout`):
```
o-----o-----o-----o-----@-----@-----@
^                  \                ^
first commit        ----$-----$     6c9f80a187ba39b4...
                              ^
                              d8d43b0e3a21df0c...
```

If we want to switch back and forth between the two "branches" with `checkout`,
we need to remember both OIDs, which are quite long.

To make our lives easier, let's implement a command to attach a name to an OID.
Then we'll be able to refer to the OID by that name.

The end result will look like this:
```
$ # Make some changes
...
$ ugit commit
d8d43b0e3a21df0c845e185d08be8e4028787069
$ ugit tag my-cool-commit d8d43b0e3a21df0c845e185d08be8e4028787069
$ # Make more changes
...
$ ugit commit
e549f09bbd08a8a888110b07982952e17e8c9669

$ ugit checkout my-cool-commit
or
$ ugit checkout d8d43b0e3a21df0c845e185d08be8e4028787069
```

The last two commands are equivalent, because "my-cool-commit" is a tag that
points to d8d43b0e3a21df0c845e185d08be8e4028787069.

We will implement this in a few steps. The first step is to create a CLI
command that calls the relevant function in the base module. The base module
function does nothing at this stage.
23
how_to/Change_21.md
Normal file
@@ -0,0 +1,23 @@
- tag: Generalize HEAD to refs

As part of implementing `tag`, we'll generalize the way we handle HEAD. If you
think about it, HEAD and tags are similar. They are both ways for ugit to attach
a name to an OID. In case of HEAD, the name is hardcoded by ugit; in case of
tags, the name will be provided by the user. It makes sense to handle them
similarly in *data.py*.

In *data.py*, let's generalize the functions `set_HEAD` and `get_HEAD` into
`update_ref` and `get_ref`. "Ref" is short for reference, and that's the name
Git uses. The functions will now accept the name of the ref and write/read it as
a file under the *.ugit* directory. Logically, a ref is a named pointer to an object.

The important change is in *data.py*. The rest of the changes just rename some
functions:

```
- get_HEAD() -> get_ref('HEAD')
- set_HEAD(oid) -> update_ref('HEAD', oid)
```

Note that we didn't change any behaviour of ugit here, this is purely
refactoring.
28
how_to/Change_22.md
Normal file
@@ -0,0 +1,28 @@
- tag: Create the tag ref

After we've implemented refs in the previous change, it's time to create a ref
when the user creates a tag.

`create_tag` now calls `update_ref` with the tag name to actually create the tag.

For namespacing purposes, we'll put all tags under *refs/tags/*. That is, if the
user creates *my-cool-commit* tag, we'll create *refs/tags/my-cool-commit* ref
to point to the desired OID.

Then we'll update *data.py* to handle this "namespaced" ref. Since we can't have
a / in the file name, we'll create directories for it. Now if a ref
*refs/tags/sometag* is created, it will be placed under *.ugit/refs/tags* in a
file named *sometag*.

To verify that this code works, you can run:
```
$ ugit tag test
```

And make sure that the tag points to HEAD:
```
$ cat .ugit/refs/tags/test
$ cat .ugit/HEAD
```

The last two commands should give the same output.
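
Here is a hedged sketch of how namespaced refs could be stored as files, following the description above. The names `update_ref` and `get_ref` come from the text; the `git_dir` parameter and the decision to return `None` for a missing ref are assumptions of this example.

```python
import os


def update_ref(ref, oid, git_dir=".ugit"):
    """Write `oid` to the file <git_dir>/<ref>, creating parent directories."""
    ref_path = os.path.join(git_dir, ref)
    # "refs/tags/sometag" needs the directory <git_dir>/refs/tags to exist.
    os.makedirs(os.path.dirname(ref_path), exist_ok=True)
    with open(ref_path, "w") as f:
        f.write(oid)


def get_ref(ref, git_dir=".ugit"):
    """Return the OID stored in `ref`, or None if the ref doesn't exist."""
    ref_path = os.path.join(git_dir, ref)
    if os.path.isfile(ref_path):
        with open(ref_path) as f:
            return f.read().strip()
    return None
```

With these two functions, creating a tag amounts to calling `update_ref` with a name of the form `refs/tags/<tag name>`.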
22
how_to/Change_23.md
Normal file
@@ -0,0 +1,22 @@
- tag: Resolve name to oid in argparse

It's nice that we can create tags, but now let's actually make them usable from
the CLI.

In *base.py*, we'll create `get_oid` to resolve a "name" to an OID. A name can
either be a ref (in which case `get_oid` will return the OID that the ref points
to) or an OID (in which case `get_oid` will just return that same OID).

Next, we'll modify the argument parser in *cli.py* to call `get_oid` on all
arguments which are expected to be an OID. This way we can pass a ref there
instead of an OID.

At this point we can do something like:
```
$ ugit tag mytag d8d43b0e3a21df0c845e185d08be8e4028787069
$ ugit log refs/tags/mytag
# Will print log of commits starting at d8d43b0e...
$ ugit checkout refs/tags/mytag
# Will checkout commit d8d43b0e...
etc...
```
18
how_to/Change_24.md
Normal file
@@ -0,0 +1,18 @@
- base: Try different directories when searching for a ref

In the previous change, you might have noticed that we need to spell out the
full name of a tag (like *refs/tags/mytag*). This isn't very convenient; we
would like to use shorter names. For example, if we've created "mytag"
tag, we should be able to do `ugit log mytag` rather than having to specify
`ugit log refs/tags/mytag`.

We'll extend `get_oid` to search in different ref subdirectories when resolving
a name. We'll search in:
```
Root (.ugit): This way we can specify refs/tags/mytag
.ugit/refs: This way we can specify tags/mytag
.ugit/refs/tags: This way we can specify mytag
.ugit/refs/heads: This will be needed for a future change
```
If we find the requested name in any of the directories, return it. Otherwise
assume that the name is an OID.
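
The lookup order above can be sketched as follows. To keep the example self-contained, refs are represented by a plain dict (`refs`) instead of files under *.ugit*, which is an assumption of this sketch. The final check that the name looks like a 40-character hex string is also an addition for illustration; the text itself only says to assume the name is an OID.

```python
import string


def get_oid(name, refs):
    """Resolve `name` to an OID. `refs` is a dict of ref name -> OID."""
    candidates = [
        name,                  # e.g. refs/tags/mytag
        f"refs/{name}",        # e.g. tags/mytag
        f"refs/tags/{name}",   # e.g. mytag
        f"refs/heads/{name}",  # reserved for a future change
    ]
    for ref in candidates:
        if ref in refs:
            return refs[ref]

    # Not a ref: accept the name only if it looks like a SHA-1 hex digest.
    if len(name) == 40 and all(c in string.hexdigits for c in name):
        return name
    raise ValueError(f"Unknown name {name}")
```

Because the candidates are tried in order, a ref found directly under the root takes precedence over one with the same short name under *refs/tags*.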
12
how_to/Change_25.md
Normal file
@@ -0,0 +1,12 @@
- cli: pass HEAD by default in argparse

First, make "@" be an alias for HEAD. (Implemented in `get_oid`)

Second, do a little refactoring in *cli.py*. Some commands accept an optional
OID argument and if the argument isn't provided it defaults to HEAD. For example
`git log` can get an OID to start logging from, but by default it logs all
commits before HEAD.

Instead of having each command implement this logic, let's just make "@" (HEAD)
be the default value for those commands. The relevant commands at this stage
are `log` and `tag`. More will follow.
14
how_to/Change_26.md
Normal file
@@ -0,0 +1,14 @@
- k: Print refs

Now that we have refs and a potentially branching commit history, it's a good
idea to create a visualization tool to see all the mess that we've created.

The visualization tool will draw all refs and all the commits pointed to by the refs.

Our command to run the tool will be called `ugit k`, similar to `gitk` (which is
a graphical visualization tool for Git).

We'll create a new `k` command in *cli.py*. We'll create `iter_refs` which is a
generator which will iterate over all available refs (it will return HEAD from the
ugit root directory and everything under *.ugit/refs*). As a first step, let's
just print all refs when running `k`.
21
how_to/Change_27.md
Normal file
@@ -0,0 +1,21 @@
- k: Iterate commits and parents

In addition to printing the refs, we'll also print all OIDs that are reachable
from those refs. We'll create `iter_commits_and_parents`, which is a generator
that yields all commits that it can reach from a given set of OIDs.

Note that `iter_commits_and_parents` will yield each OID only once, even if it's
reachable from multiple refs. Here, for example:
```
o<----o<----o<----o<----@<----@<----@
^            \                      ^
first commit  -<--$<----$           refs/tags/tag1
                        ^
                        refs/tags/tag2
```

We can reach the first commit by following the parents of *tag1* or by following
the parents of *tag2*. Yet if we call `iter_commits_and_parents({tag1, tag2})`,
the first commit will be yielded only once. This property will be useful later.

(Note that nothing is visualized yet; we're just preparing for that.)
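
The "yield each OID once" property can be tried out without a repository. The sketch below replaces the object store with a hypothetical in-memory `PARENTS` table (commit name to parent name); the traversal itself follows the set-based approach described above, but it is an illustration rather than the code in *base.py*.

```python
# Hypothetical history: each commit maps to its parent (None for the first commit).
# "tag1_commit" and "tag2_commit" sit on two branches that share "c3".
PARENTS = {
    "first": None,
    "c2": "first",
    "c3": "c2",
    "tag1_commit": "c3",
    "tag2_commit": "c3",
}


def iter_commits_and_parents(oids, parents=PARENTS):
    oids = set(oids)
    visited = set()
    while oids:
        oid = oids.pop()
        # Skip the missing parent of the first commit and anything seen before
        if not oid or oid in visited:
            continue
        visited.add(oid)
        yield oid
        oids.add(parents[oid])


reached = list(iter_commits_and_parents({"tag1_commit", "tag2_commit"}))
print(sorted(reached))
```

Because a set is used, the order of the yielded commits is arbitrary; only the absence of duplicates is guaranteed.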
18
how_to/Change_28.md
Normal file
@@ -0,0 +1,18 @@
- k: Render graph

`k` is supposed to be a visualization tool, but so far we've just printed a
bunch of OIDs... Now comes the visualization part!

There's a convenient file format called "dot" that can describe a graph. This is
a textual format. We'll generate a graph of all commits and refs in dot format
and then visualize it using the "dot" utility that comes with Graphviz.

(If you're unfamiliar with dot or Graphviz, please look them up online.)

The graph will contain a node for each commit, which points to its parent commit.
The graph will also contain a node for each ref, which points to the relevant
commit.

At this point, `ugit k` is fully functional and I encourage you to play with it.
Create a crazy branching history and a bunch of tags and see for yourself that
`ugit k` can draw all that visually.
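
To get a feel for the dot text itself, here is a small sketch that builds it from plain dictionaries. `render_dot` and its two arguments are hypothetical names chosen for the example; the real `k` command reads refs and commits from the repository instead.

```python
def render_dot(refs, parents):
    """Build a dot description from {refname: oid} and {oid: parent_oid}."""
    lines = ["digraph commits {"]
    for refname, oid in refs.items():
        # Refs are drawn as notes that point at their commit
        lines.append(f'"{refname}" [shape=note]')
        lines.append(f'"{refname}" -> "{oid}"')
    for oid, parent in parents.items():
        # Commits are boxes labelled with an abbreviated OID
        lines.append(f'"{oid}" [shape=box style=filled label="{oid[:10]}"]')
        if parent:
            lines.append(f'"{oid}" -> "{parent}"')
    lines.append("}")
    return "\n".join(lines)


dot = render_dot(
    refs={"refs/tags/tag1": "b" * 40},
    parents={"b" * 40: "a" * 40, "a" * 40: None},
)
print(dot)
```

Node names are wrapped in double quotes because ref names contain slashes, which are not allowed in unquoted dot identifiers. Assuming Graphviz is installed, the printed text can be saved to a file and rendered with, for example, `dot -Tpng`.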
9
how_to/Change_29.md
Normal file
@@ -0,0 +1,9 @@
- log: Use `iter_commits_and_parents`

Refactoring ahead! Since we have `iter_commits_and_parents` from `k`, let's also
use this function in `log`. We'll need to adjust it a bit to use
`collections.deque` instead of a set so that the order of commits is deterministic.

This generalization might seem unneeded at this point, but it will be useful
later. (Note for the advanced folks: When we implement merge commits that have
multiple parents, this generic way to iterate will come in handy.)
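
The effect of switching to a deque can be seen on a toy history. As before, `PARENTS` is a hypothetical in-memory stand-in for the object store, so this is a sketch of the idea rather than the exact code in *base.py*.

```python
from collections import deque

# Hypothetical linear history: c3 is the newest commit, c1 the first one
PARENTS = {"c3": "c2", "c2": "c1", "c1": None}


def iter_commits_and_parents(oids, parents=PARENTS):
    oids = deque(oids)
    visited = set()
    while oids:
        oid = oids.popleft()
        if not oid or oid in visited:
            continue
        visited.add(oid)
        yield oid
        # Put the parent at the front of the queue so it is yielded next
        oids.appendleft(parents[oid])


history = list(iter_commits_and_parents(["c3"]))
print(history)
```

Pushing the parent to the front of the deque means a commit is always followed by its parent, which is the order `log` needs.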
82
how_to/Change_30.md
Normal file
@@ -0,0 +1,82 @@
- branch: Create new branch

Tags were an improvement since they freed us from the burden of remembering OIDs
directly. But they are still somewhat inconvenient, since they are static. Let
me illustrate:
```
o-----o-----o-----o-----o-----o-----o
             \                      ^
              ----o-----o           tag2,HEAD
                        ^
                        tag1
```

If we have the above situation, we can easily flip between *tag1* and *tag2* with
`checkout`. But what happens if we do

- ugit checkout tag2
- Make some changes
- ugit commit?

Now it looks like this:
```
o-----o-----o-----o-----o-----o-----o-----o
             \                      ^     ^
              ----o-----o           tag2  HEAD
                        ^
                        tag1
```

The upper branch has advanced, but *tag2* still points to the previous commit.
This is by design, since tags are supposed to just name a specific OID. So if we
want to remember the new HEAD position we need to create another tag.

But now let's create a ref that will "move forward" as the branch grows. Just
like we have `ugit tag`, we'll create `ugit branch` that will point a branch to
a specific OID. This time the ref will be created under *refs/heads*.

At this stage, `branch` doesn't look any different from `tag` (the only difference
is that the branch is created under *refs/heads* rather than *refs/tags*). But
the magic will happen once we try to `checkout` a branch.

So far, when we check out anything, we update HEAD to point to the OID that we've
just checked out. But if we check out a branch by name, we'll do something
different: we will update HEAD to point to the **name of the branch!** Assume
that we have a branch here:
```
o-----o-----o-----o-----o-----o-----o
             \                      ^
              ----o-----o           tag2,branch2
                        ^
                        tag1
```

Running `ugit checkout branch2` will create the following situation:
```
o-----o-----o-----o-----o-----o-----o
             \                      ^
              ----o-----o           tag2,branch2 <--- HEAD
                        ^
                        tag1
```

You see? HEAD points to *branch2* rather than the OID of the commit directly.
Now if we create another commit, ugit will update HEAD as it does every time,
but because HEAD points to *branch2*, the update lands on *branch2*, which now
points to the latest commit.
```
o-----o-----o-----o-----o-----o-----o-----o
             \                      ^     ^
              ----o-----o           tag2  branch2 <--- HEAD
                        ^
                        tag1
```

This way, if we check out a branch and create some commits on top of it, the ref
will always point to the latest commit.

But right now HEAD (or any ref for that matter) may only point to an OID. It
can't point to another ref, like I described above. So our next step will be
to implement this concept. To mirror Git's terminology, we will call a ref that
points to another ref a "symbolic ref". Please see the next change for an
implementation of symbolic refs.
5
how_to/Change_31.md
Normal file
@@ -0,0 +1,5 @@
- data: Implement symbolic refs idea

If the file that represents a ref contains an OID, we'll assume that the ref
points to an OID. If the file contains the content `ref: <refname>`, we'll
assume that the ref points to `<refname>` and we will dereference it recursively.
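
The recursive dereferencing can be sketched with a dictionary in place of the ref files. `REF_FILES` and `resolve_ref` are hypothetical names used only for this illustration; the real code reads the files under *.ugit*.

```python
# Hypothetical in-memory "ref files": ref name -> file content
REF_FILES = {
    "HEAD": "ref: refs/heads/branch2",
    "refs/heads/branch2": "c" * 40,
}


def resolve_ref(ref, files=REF_FILES):
    value = files.get(ref)
    if value is not None and value.startswith("ref:"):
        # Symbolic ref: strip the "ref:" prefix and follow the target
        target = value.split(":", 1)[1].strip()
        return resolve_ref(target, files)
    # Direct ref (or missing ref): the content is the OID itself
    return value


print(resolve_ref("HEAD"))
```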
8
how_to/Change_32.md
Normal file
@@ -0,0 +1,8 @@
- data: Create RefValue container

To make working with symbolic refs easier, we will create a `RefValue` container
to represent the value of a ref. `RefValue` will have a property `symbolic` that
will say whether it's a symbolic or a direct ref.

This change is just refactoring: we will wrap every OID that is written to or read
from a ref in a `RefValue`.
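
A minimal sketch of the container follows. The `RefValue` namedtuple matches the one in *data.py*; `parse_ref_content` is a hypothetical helper added here only to show how raw file content could be wrapped.

```python
from collections import namedtuple

RefValue = namedtuple("RefValue", ["symbolic", "value"])


def parse_ref_content(content):
    # Hypothetical helper: turn raw ref file content into a RefValue
    if content.startswith("ref:"):
        return RefValue(symbolic=True, value=content.split(":", 1)[1].strip())
    return RefValue(symbolic=False, value=content.strip())


direct = parse_ref_content("d" * 40 + "\n")
symbolic = parse_ref_content("ref: refs/heads/master")
print(direct.symbolic, symbolic.symbolic, symbolic.value)
```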
17
how_to/Change_33.md
Normal file
@@ -0,0 +1,17 @@
- data: Dereference refs when reading and writing

Now we'll dereference symbolic refs not only when reading them but also when
writing them.

We'll implement a helper function called `_get_ref_internal` which will return
the path and the value of the last ref pointed to by a symbolic ref. In simple words:

- When given a non-symbolic ref, `_get_ref_internal` will return the ref name
  and value.
- When given a symbolic ref, `_get_ref_internal` will dereference the ref
  recursively, and then return the name of the last (non-symbolic) ref that points
  to an OID, plus its value.

Now `update_ref` will use `_get_ref_internal` to know which ref it needs to update.

Additionally, we'll use `_get_ref_internal` in `get_ref`.
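
The two cases above, and their effect on writing, can be sketched with an in-memory table. `REF_FILES` is a hypothetical stand-in for the ref files, and the functions are simplified versions that work on plain strings instead of `RefValue`.

```python
# Hypothetical in-memory "ref files": HEAD is symbolic, branch2 holds an OID
REF_FILES = {
    "HEAD": "ref: refs/heads/branch2",
    "refs/heads/branch2": "a" * 40,
}


def get_ref_internal(ref, files):
    # Follow symbolic refs until a ref that holds an OID (or nothing) is found
    value = files.get(ref)
    if value and value.startswith("ref:"):
        return get_ref_internal(value.split(":", 1)[1].strip(), files)
    return ref, value


def update_ref(ref, oid, files):
    # Write to the last ref in the chain, not to the symbolic ref itself
    target, _ = get_ref_internal(ref, files)
    files[target] = oid


update_ref("HEAD", "b" * 40, REF_FILES)
print(REF_FILES)
```

After the update HEAD still contains `ref: refs/heads/branch2`, while the branch has moved to the new OID. This is what lets a branch follow new commits.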
15
how_to/Change_34.md
Normal file
@@ -0,0 +1,15 @@
- data: Don't always dereference refs (for `ugit k`)

Actually, it's not always desirable to dereference a ref all the way. Sometimes
we would like to know which ref a symbolic ref points to, rather than the final
OID. Or we would like to update a ref directly, rather than updating the last
ref in the chain.

One such use case is `ugit k`. When visualizing refs it would be nice to see
which ref points to which ref. We will see another use case soon.

To accommodate this, we will add a `deref` option to `get_ref`, `iter_refs` and
`update_ref`. If they are called with `deref=False`, they will work on the
raw value of a ref and not dereference any symbolic refs.

Then we will update `k` to use `deref=False`.
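
The difference between the two modes is easiest to see side by side. This sketch again uses a hypothetical `REF_FILES` dictionary instead of real files, and a simplified `get_ref` that takes the table as a parameter.

```python
from collections import namedtuple

RefValue = namedtuple("RefValue", ["symbolic", "value"])

# Hypothetical in-memory "ref files"
REF_FILES = {
    "HEAD": "ref: refs/heads/branch2",
    "refs/heads/branch2": "a" * 40,
}


def get_ref(ref, deref=True, files=REF_FILES):
    value = files.get(ref)
    symbolic = bool(value) and value.startswith("ref:")
    if symbolic:
        value = value.split(":", 1)[1].strip()
        if deref:
            # Keep following the chain until an OID is reached
            return get_ref(value, deref=True, files=files)
    # With deref=False a symbolic ref is reported as-is
    return RefValue(symbolic=symbolic, value=value)


print(get_ref("HEAD"))
print(get_ref("HEAD", deref=False))
```

With `deref=False`, `k` can draw an edge from HEAD to the branch it points to rather than straight to the commit.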
176
ugit/base.py
Normal file
@@ -0,0 +1,176 @@
import itertools
import operator
import os
import string

from collections import deque, namedtuple
from pathlib import Path

from . import data


def write_tree(directory="."):
    entries = []
    with os.scandir(directory) as it:
        for entry in it:
            full = f"{directory}/{entry.name}"
            if is_ignored(full):
                continue
            if entry.is_file(follow_symlinks=False):
                type_ = "blob"
                with open(full, "rb") as f:
                    oid = data.hash_object(f.read())
            elif entry.is_dir(follow_symlinks=False):
                type_ = "tree"
                oid = write_tree(full)
            entries.append((entry.name, oid, type_))

    tree = "".join(f"{type_} {oid} {name}\n" for name, oid, type_ in sorted(entries))

    return data.hash_object(tree.encode(), "tree")


def _iter_tree_entries(oid):
    if not oid:
        return
    tree = data.get_object(oid, "tree")
    for entry in tree.decode().splitlines():
        type_, oid, name = entry.split(" ", 2)
        yield type_, oid, name


def get_tree(oid, base_path=""):
    result = {}
    for type_, oid, name in _iter_tree_entries(oid):
        assert "/" not in name
        assert name not in ("..", ".")
        path = base_path + name
        if type_ == "blob":
            result[path] = oid
        elif type_ == "tree":
            result.update(get_tree(oid, f"{path}/"))
        else:
            assert False, f"Unknown tree entry {type_}"
    return result


def _empty_current_directory():
    for root, dirnames, filenames in os.walk(".", topdown=False):
        for filename in filenames:
            path = os.path.relpath(f"{root}/{filename}")
            if is_ignored(path) or not Path(path).is_file():
                continue
            Path(path).unlink()
        for dirname in dirnames:
            path = os.path.relpath(f"{root}/{dirname}")
            if is_ignored(path):
                continue
            try:
                Path(path).rmdir()
            except OSError:
                # Deletion might fail if the directory contains ignored files,
                # so it's OK
                pass


def read_tree(tree_oid):
    _empty_current_directory()
    for path, oid in get_tree(tree_oid, base_path="./").items():
        Path(path).parent.mkdir(parents=True, exist_ok=True)
        with open(path, "wb") as f:
            f.write(data.get_object(oid))


def commit(message):
    commit = f"tree {write_tree()}\n"

    HEAD = data.get_ref("HEAD").value
    if HEAD:
        commit += f"parent {HEAD}\n"

    commit += "\n"
    commit += f"{message}\n"

    oid = data.hash_object(commit.encode(), "commit")

    data.update_ref("HEAD", data.RefValue(symbolic=False, value=oid))

    return oid


def create_tag(name, oid):
    data.update_ref(f"refs/tags/{name}", data.RefValue(symbolic=False, value=oid))


def checkout(oid):
    commit = get_commit(oid)
    read_tree(commit.tree)
    data.update_ref("HEAD", data.RefValue(symbolic=False, value=oid))


def create_branch(name, oid):
    data.update_ref(f"refs/heads/{name}", data.RefValue(symbolic=False, value=oid))


Commit = namedtuple("Commit", ["tree", "parent", "message"])


def get_commit(oid):
    parent = None

    commit = data.get_object(oid, "commit").decode()
    lines = iter(commit.splitlines())
    for line in itertools.takewhile(operator.truth, lines):
        key, value = line.split(" ", 1)
        if key == "tree":
            tree = value
        elif key == "parent":
            parent = value
        else:
            assert False, f"Unknown field {key}"

    message = "\n".join(lines)
    return Commit(tree=tree, parent=parent, message=message)


def iter_commits_and_parents(oids):
    oids = deque(oids)
    visited = set()

    while oids:
        oid = oids.popleft()
        if not oid or oid in visited:
            continue
        visited.add(oid)
        yield oid

        commit = get_commit(oid)
        # Return parent next
        oids.appendleft(commit.parent)


def get_oid(name):
    if name == "@":
        name = "HEAD"

    # Name is ref
    refs_to_try = [
        f"{name}",
        f"refs/{name}",
        f"refs/tags/{name}",
        f"refs/heads/{name}",
    ]
    for ref in refs_to_try:
        if data.get_ref(ref, deref=False).value:
            return data.get_ref(ref).value

    # Name is SHA1
    is_hex = all(c in string.hexdigits for c in name)
    if len(name) == 40 and is_hex:
        return name

    assert False, f"Unknown name {name}"


def is_ignored(path):
    return ".ugit" in path.split("/")
142
ugit/cli.py
@@ -1,2 +1,142 @@
import argparse
import subprocess
import sys
import textwrap

from pathlib import Path

from . import base
from . import data


def main():
    args = parse_args()
    args.func(args)


def parse_args():
    parser = argparse.ArgumentParser()

    commands = parser.add_subparsers(dest="command")
    commands.required = True

    oid = base.get_oid

    init_parser = commands.add_parser("init")
    init_parser.set_defaults(func=init)

    hash_object_parser = commands.add_parser("hash-object")
    hash_object_parser.set_defaults(func=hash_object)
    hash_object_parser.add_argument("file")

    cat_file_parser = commands.add_parser("cat-file")
    cat_file_parser.set_defaults(func=cat_file)
    cat_file_parser.add_argument("object", type=oid)

    write_tree_parser = commands.add_parser("write-tree")
    write_tree_parser.set_defaults(func=write_tree)

    read_tree_parser = commands.add_parser("read-tree")
    read_tree_parser.set_defaults(func=read_tree)
    read_tree_parser.add_argument("tree", type=oid)

    commit_parser = commands.add_parser("commit")
    commit_parser.set_defaults(func=commit)
    commit_parser.add_argument("-m", "--message", required=True)

    log_parser = commands.add_parser("log")
    log_parser.set_defaults(func=log)
    log_parser.add_argument("oid", default="@", type=oid, nargs="?")

    checkout_parser = commands.add_parser("checkout")
    checkout_parser.set_defaults(func=checkout)
    checkout_parser.add_argument("oid", type=oid)

    tag_parser = commands.add_parser("tag")
    tag_parser.set_defaults(func=tag)
    tag_parser.add_argument("name")
    tag_parser.add_argument("oid", default="@", type=oid, nargs="?")

    branch_parser = commands.add_parser("branch")
    branch_parser.set_defaults(func=branch)
    branch_parser.add_argument("name")
    branch_parser.add_argument("start_point", default="@", type=oid, nargs="?")

    k_parser = commands.add_parser("k")
    k_parser.set_defaults(func=k)

    return parser.parse_args()


def init(args):
    data.init()
    print(f"Initialized empty ugit repository in {Path.cwd()}/{data.GIT_DIR}")


def hash_object(args):
    with open(args.file, "rb") as f:
        print(data.hash_object(f.read()))


def cat_file(args):
    sys.stdout.flush()
    sys.stdout.buffer.write(data.get_object(args.object, expected=None))


def write_tree(args):
    print(base.write_tree())


def read_tree(args):
    base.read_tree(args.tree)


def commit(args):
    print(base.commit(args.message))


def log(args):
    for oid in base.iter_commits_and_parents({args.oid}):
        commit = base.get_commit(oid)

        print(f"commit {oid}\n")
        print(textwrap.indent(commit.message, "    "))
        print("")


def checkout(args):
    base.checkout(args.oid)


def tag(args):
    base.create_tag(args.name, args.oid)


def branch(args):
    base.create_branch(args.name, args.start_point)
    print(f"Branch {args.name} created at {args.start_point[:10]}")


def k(args):
    dot = "digraph commits {\n"

    oids = set()
    for refname, ref in data.iter_refs(deref=False):
        dot += f'"{refname}" [shape=note]\n'  # dot IDs must be double-quoted
        dot += f'"{refname}" -> "{ref.value}"\n'
        if not ref.symbolic:
            oids.add(ref.value)

    for oid in base.iter_commits_and_parents(oids):
        commit = base.get_commit(oid)
        dot += f'"{oid}" [shape=box style=filled label="{oid[:10]}"]\n'
        if commit.parent:
            dot += f'"{oid}" -> "{commit.parent}"\n'

    dot += "}"
    print(dot)

    with subprocess.Popen(
        ["dot", "-Tgtk", "/dev/stdin"], stdin=subprocess.PIPE
    ) as proc:
        proc.communicate(dot.encode())
74
ugit/data.py
Normal file
@@ -0,0 +1,74 @@
from pathlib import Path

import hashlib
import os

from collections import namedtuple

GIT_DIR = ".ugit"


def init():
    Path(GIT_DIR).mkdir()
    Path(f"{GIT_DIR}/objects").mkdir()


RefValue = namedtuple("RefValue", ["symbolic", "value"])


def update_ref(ref, value, deref=True):
    assert not value.symbolic
    ref = _get_ref_internal(ref, deref)[0]
    ref_path = f"{GIT_DIR}/{ref}"
    Path(ref_path).parent.mkdir(parents=True, exist_ok=True)
    with open(ref_path, "w") as f:
        f.write(value.value)


def get_ref(ref, deref=True):
    return _get_ref_internal(ref, deref)[1]


def _get_ref_internal(ref, deref):
    ref_path = f"{GIT_DIR}/{ref}"
    value = None
    if Path(ref_path).is_file():
        with open(ref_path) as f:
            value = f.read().strip()

    symbolic = bool(value) and value.startswith("ref:")
    if symbolic:
        value = value.split(":", 1)[1].strip()
        if deref:
            return _get_ref_internal(value, deref=True)

    return ref, RefValue(symbolic=symbolic, value=value)


def iter_refs(deref=True):
    refs = ["HEAD"]
    for root, _, filenames in os.walk(f"{GIT_DIR}/refs"):
        root = os.path.relpath(root, GIT_DIR)
        refs.extend(f"{root}/{name}" for name in filenames)

    for refname in refs:
        yield refname, get_ref(refname, deref=deref)


def hash_object(data, type_="blob"):
    obj = type_.encode() + b"\x00" + data
    oid = hashlib.sha1(obj).hexdigest()
    with open(f"{GIT_DIR}/objects/{oid}", "wb") as out:
        out.write(obj)
    return oid


def get_object(oid, expected="blob"):
    with open(f"{GIT_DIR}/objects/{oid}", "rb") as f:
        obj = f.read()

    type_, _, content = obj.partition(b"\x00")
    type_ = type_.decode()

    if expected is not None:
        assert type_ == expected, f"Expected {expected}, got {type_}"
    return content