Add change 04 instructions
parent 1f7354666b
commit c647f99e5c
how_to/Change_04.md (new file, 63 lines)
@@ -0,0 +1,63 @@

- hash-object: Save object

Let's create the first non-trivial command. This command will take a file and
store it in our '.ugit' directory for later retrieval. In Git's lingo, this
feature is called "the object database". It allows us to store and retrieve
arbitrary blobs, which are called "objects". As far as the object database is
concerned, the content of the object doesn't have any meaning (just like a
filesystem doesn't care about the internal structure of a file).

Because this command needs the '.ugit' directory, it must be run from the same
directory where you ran 'ugit init'.

Note that this is a very low-level Git building block. We're not yet talking
about versions, commits, or any of the other things that you might have heard
about; we're just talking about an interface for storing some raw bytes.

So we can store an object, but how would we refer to it later? We could ask the
user to provide a name along with the object and retrieve the object later using
the name, but there is a nicer way: we can refer to the object using its hash.

If you haven't heard about hashes and hash functions, I suggest that you pause
and do some reading on them. In summary, a hash function takes a blob of
arbitrary length and produces a small "fingerprint" with a fixed length. With
hash functions such as SHA-1, different blobs are overwhelmingly likely to
produce different fingerprints (so likely that Git assumes it's guaranteed).
Let's try some strings to see an example:

```
$ echo -n this is cool | sha1sum
60f51187e76a9de0ff3df31f051bde04da2da891

$ echo -n this is cooler | sha1sum
f3c953b792f9ab39d1be0bdab7ab5f8350593004
```
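
If you'd rather try this from Python, here is a minimal sketch using the
standard 'hashlib' module. The helper name 'sha1_hex' is made up for this
example. It hashes the same two phrases and shows that inputs of different
lengths still produce fingerprints of the same length (40 hex characters):

```python
import hashlib


def sha1_hex(text):
    """Return the SHA-1 fingerprint of `text` as a 40-character hex string."""
    # Hash functions work on bytes, so encode the string first.
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


for phrase in ("this is cool", "this is cooler"):
    fingerprint = sha1_hex(phrase)
    # The fingerprint length stays fixed no matter how long the input is.
    print(f"{phrase!r}: {fingerprint} ({len(fingerprint)} hex characters)")
```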

You can see that hashing the phrases "this is cool" and "this is cooler" gives
completely different hashes even though the difference between the phrases is
small.

We're going to use the hash as the name of the object (we'll call this name an
"OID", short for object ID).

So the flow of the command hash-object is:

+ Get the path of the file to store.
+ Read the file.
+ Hash the content of the file using SHA-1.
+ Store the file under ".ugit/objects/{the SHA-1 hash}".
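
The steps above could be sketched in Python roughly like this. Treat it as one
possible shape and not as the actual ugit code: the names 'GIT_DIR' and
'hash_object' are assumptions made for this example, and creating the
'objects' directory on the fly is also my assumption (it may already exist
after 'ugit init').

```python
import hashlib
import os

GIT_DIR = ".ugit"  # assumed name of the directory created by 'ugit init'


def hash_object(path):
    """Store the file at `path` in the object database and return its OID."""
    # Read the file as raw bytes; the object database doesn't interpret them.
    with open(path, "rb") as f:
        data = f.read()

    # The OID is the SHA-1 hash of the content.
    oid = hashlib.sha1(data).hexdigest()

    # Assumption: make sure the objects directory exists before writing.
    objects_dir = os.path.join(GIT_DIR, "objects")
    os.makedirs(objects_dir, exist_ok=True)

    # Store the content under .ugit/objects/{the SHA-1 hash}.
    with open(os.path.join(objects_dir, oid), "wb") as out:
        out.write(data)
    return oid
```

Calling 'hash_object("some_file.txt")' on a hypothetical file would return its
OID, and the same bytes could later be read back from '.ugit/objects/' under
that name.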

This type of storage is called content-addressable storage because the "address"
that we use to find a blob is based on the content of the blob itself (in
contrast to name-addressable storage, such as a typical filesystem, where you
address a particular file by its name, regardless of its content).
Content-addressable storage has nice properties when synchronizing data between
different computers - if two repositories have an object with the same OID, we
can be sure that they are the same object. Also, since two different objects are
practically guaranteed to have different OIDs, we can't have naming clashes
between objects.
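
To make these properties concrete, here is a small sketch that uses an
in-memory dictionary as a stand-in for '.ugit/objects'. The names 'store',
'put' and 'get' are hypothetical and exist only for this illustration. It shows
that identical content always lands at the same address, that different content
lands elsewhere, and that the address can double as an integrity check:

```python
import hashlib

store = {}  # hypothetical in-memory stand-in for the .ugit/objects directory


def put(data):
    """Save `data` under its own SHA-1 hash and return that hash (the OID)."""
    oid = hashlib.sha1(data).hexdigest()
    store[oid] = data  # storing identical content again changes nothing
    return oid


def get(oid):
    """Return the object named `oid`, checking that it still matches its name."""
    data = store[oid]
    if hashlib.sha1(data).hexdigest() != oid:
        raise ValueError(f"object {oid} does not match its OID")
    return data


a = put(b"same bytes")
b = put(b"same bytes")   # same content, so same OID
c = put(b"other bytes")  # different content, so a different OID
print(a == b, a == c, len(store))  # prints: True False 2
```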

When real Git stores objects it does a few extra things, such as writing the
size of the object to the file as well, compressing it, and dividing the
objects into 256 directories. The directories are there to avoid having a
single directory with a huge number of files, which can hurt performance. We're
not going to do any of this in ugit, for simplicity.
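
For the curious, here is a rough sketch of what those extra steps look like for
a file ("blob") in real Git, as far as I understand its loose object format.
The function name 'encode_blob' is made up for this example and is not part of
Git or ugit. Git prepends a small header with the object type and size, hashes
the header together with the content, compresses the result with zlib, and uses
the first two hex characters of the OID as the directory name (16 * 16 = 256
possible directories):

```python
import hashlib
import zlib


def encode_blob(data):
    """Return (oid, relative path, compressed bytes) for `data` as a Git blob."""
    # Header: object type, a space, the size in bytes, then a NUL byte.
    payload = b"blob %d\0" % len(data) + data

    # Real Git hashes header + content, so its OIDs differ from a plain SHA-1
    # of the file content.
    oid = hashlib.sha1(payload).hexdigest()

    # The first two hex characters pick one of 256 directories.
    path = f"objects/{oid[:2]}/{oid[2:]}"

    return oid, path, zlib.compress(payload)


oid, path, compressed = encode_blob(b"")
print(oid)   # OID of the empty blob
print(path)  # where Git would keep it inside the .git directory
```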