- hash-object: Save object

Let's create the first non-trivial command. This command will take a file and store it in our '.ugit' directory for later retrieval. In Git's lingo, this feature is called "the object database". It allows us to store and retrieve arbitrary blobs, which are called "objects". As far as the object database is concerned, the content of an object doesn't have any meaning (just like a filesystem doesn't care about the internal structure of a file).

Because this command needs the '.ugit' directory, it must be run from the same directory where you ran 'ugit init'.

Note that this is a very low-level Git building block. We're not talking yet about versions or commits or any of the other things you might have heard about; we're just talking about an interface for storing some raw bytes.

So we can store an object, but how would we refer to it later? We could ask the user to provide a name along with the object and retrieve the object later using that name, but there is a nicer way: we can refer to the object using its hash.

If you haven't heard about hashes and hash functions, I suggest that you pause and do some reading on them. In summary, a hash function takes a blob of arbitrary length and produces a small "fingerprint" of fixed length. With a cryptographic hash function such as SHA-1, different blobs are overwhelmingly likely to produce different fingerprints (so likely that Git treats it as guaranteed). Let's try some strings to see an example:

```
$ echo -n this is cool | sha1sum
60f51187e76a9de0ff3df31f051bde04da2da891
$ echo -n this is cooler | sha1sum
f3c953b792f9ab39d1be0bdab7ab5f8350593004
```

You can see that hashing the phrases "this is cool" and "this is cooler" gives completely different hashes even though the difference between the phrases is small.

We're going to use the hash as the name of the object (we'll call this name an "OID" - object ID). So the flow of the command hash-object is:

+ Get the path of the file to store.
+ Read the file.
+ Hash the content of the file using SHA-1.
+ Store the file under ".ugit/objects/{the SHA-1 hash}".

This type of storage is called content-addressable storage because the "address" that we use to find a blob is based on the content of the blob itself. (In contrast to name-addressable storage, such as a typical filesystem, where you address a particular file by its name, regardless of its content.)

Content-addressable storage has nice properties when synchronizing data between different computers: if two repositories have an object with the same OID, we can be sure that it is the same object. Also, since two different objects are practically guaranteed to have different OIDs, we can't have naming clashes between objects.

When real Git stores objects it does a few extra things: it writes the size of the object to the file as well, compresses the object, and divides the objects into 256 directories. The directory split avoids having directories with a huge number of files, which can hurt performance. We're not going to do any of this in ugit, for simplicity.
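
The hash-object flow described above can be sketched in Python roughly as follows. This is a minimal sketch, not necessarily how ugit itself is written: the function name `hash_object`, the `GIT_DIR` constant and the `git_dir` parameter are assumptions made for illustration, and the sketch creates the 'objects' directory itself in case 'ugit init' has not already done so.

```python
import hashlib
import os

GIT_DIR = ".ugit"  # assumed name of the directory created by 'ugit init'


def hash_object(path, git_dir=GIT_DIR):
    """Store the file at `path` in the object database and return its OID."""
    # Read the file as raw bytes; the object database does not interpret them.
    with open(path, "rb") as f:
        data = f.read()

    # The OID is the SHA-1 hash of the content, written as 40 hex characters.
    oid = hashlib.sha1(data).hexdigest()

    # Store the content under {git_dir}/objects/{oid}.
    objects_dir = os.path.join(git_dir, "objects")
    os.makedirs(objects_dir, exist_ok=True)
    with open(os.path.join(objects_dir, oid), "wb") as out:
        out.write(data)

    return oid
```

Because the OID depends only on the content, storing the same bytes twice returns the same OID and simply rewrites the same file with identical content.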