I was wondering how should I store a hash In the Fossil SCM, SHA1 hashes are stored as text of length 40.
CREATE TABLE blob(
rid INTEGER PRIMARY KEY,
rcvid INTEGER,
size INTEGER,
uuid TEXT UNIQUE NOT NULL,
content BLOB,
CHECK( length(uuid)==40 AND rid>0 )
);
sqlite> select * from blob;
1|1|169|6fc9d28454d4d070ca863bbbdbf9835f3505d585|
2|2|687|f59c73c1dbdea48cd2330d5a309445d756fc6901|
3|2|221|84ddeef14a657366246e6d9dcb11e2b3669cd896|
4|3|695|0311113ca8c18fb3e83c9e35e0e49e373c089f08|
5|3|224|5c577d268419caea733544ba5c81932beead3bf7|
To a layman like myself it seems inefficient each character needs 8 bits, and gives 4 (0-f). I also I found that the MySQL docs to agree with me
The size penalty for storing the hex string in a CHAR column is at least two times, up to eight times if the value is stored in a column that uses the utf8 character set (where each character uses 4 bytes). Storing the string also results in slower comparisons because of the larger values and the need to take character set collation rules into account.
Is it that this column is not used as a key, and thus its size isn't such a big deal? No sir! From src/content.c@content_put:475
we can see
db_prepare(&s1, "SELECT rid, size FROM blob WHERE uuid=%B", &hash);
The fossil developer is smarter than me, so the hash is probably somehow stored in a compact binary form, however I don't understand how exactly is that happening.
The OP is right, it's inefficient. However it helps debugging the software, and it takes relatively small amount of space, so it's a compromise between developer convenience and efficiency.