Why does the Fossil SCM use TEXT to store hashes?

698 views Asked by At

I was wondering how should I store a hash In the Fossil SCM, SHA1 hashes are stored as text of length 40.

CREATE TABLE blob(
  rid INTEGER PRIMARY KEY,
  rcvid INTEGER,
  size INTEGER,
  uuid TEXT UNIQUE NOT NULL,
  content BLOB,
  CHECK( length(uuid)==40 AND rid>0 )
);
sqlite> select * from blob;
1|1|169|6fc9d28454d4d070ca863bbbdbf9835f3505d585|
2|2|687|f59c73c1dbdea48cd2330d5a309445d756fc6901|
3|2|221|84ddeef14a657366246e6d9dcb11e2b3669cd896|
4|3|695|0311113ca8c18fb3e83c9e35e0e49e373c089f08|
5|3|224|5c577d268419caea733544ba5c81932beead3bf7|

To a layman like myself it seems inefficient each character needs 8 bits, and gives 4 (0-f). I also I found that the MySQL docs to agree with me

The size penalty for storing the hex string in a CHAR column is at least two times, up to eight times if the value is stored in a column that uses the utf8 character set (where each character uses 4 bytes). Storing the string also results in slower comparisons because of the larger values and the need to take character set collation rules into account.

Is it that this column is not used as a key, and thus its size isn't such a big deal? No sir! From src/content.c@content_put:475 we can see

db_prepare(&s1, "SELECT rid, size FROM blob WHERE uuid=%B", &hash);

The fossil developer is smarter than me, so the hash is probably somehow stored in a compact binary form, however I don't understand how exactly is that happening.

2

There are 2 answers

0
Elazar Leibovich On BEST ANSWER

The OP is right, it's inefficient. However it helps debugging the software, and it takes relatively small amount of space, so it's a compromise between developer convenience and efficiency.

3
Benoit On

Fossil does not rely on a MySQL database at all, but on an SQLite database. An SQLite database has weak typing.