My platform:
- Server Name: localhost/3050
- Server Version: WI-V3.0.7.33374 Firebird 3.0
- Server Implementation: Firebird/Windows/AMD/Intel/x64
- Service Version: 2
I have been told that record identifiers should be of integer type, because that gives the best select and join performance. So, for database apps in the past, I used a BIGINT as the primary key. This posses problems when migrating data from a production system (let's not go into that nightmare).
I then realized I needed to record primary key identifiers issued in non-sequential order, so I used GUIDs: CHAR(36) CHARACTER SET ASCII populated on insert as = UUID_TO_CHAR(GEN_UUID()). I did speed tests before I built my app (we're talking many years ago) using GUIDs as primary keys and the speed test showed GUIDs to be as fast as integers, so I though I'd be OK. The performance of the app was OK at first, but as the app matured, a couple years later it got slower and slower, and had to be re-written.
Now, I'm building a different project and I read somewhere that the solution to my problem (non-sequentially issued unique identifiers + top performance) was to use octets, populated on insert as GEN_UUID(). I built a test table with 5,000,000 records, and only 3 fields. MY_BIGINT BIGINT, MY_GUID CHAR(36) CHARACTER SET ASCII, MY_UUID CHAR(16) CHARACTER SET OCTETS. I put a primary key on MY_BIGINT and a unique on MY_GUID and MY_UUID. I find a value of each from one record somewhere near the end of the table and issue a SELECT ... WHERE on each of the fields, looking for one record.
My results show there is no difference between any of the field types when selecting. This goes against what I have read, and what I have personally experienced in the past. Does anyone have any experience using these different field types as primary keys and can you tell me for certain which field types provide the best performance for selects and joins? Should I use VARCHAR(16) OCTET or GUID CHAR(36) CHARACTER SET ASCII? I cannot use BIGINT or INTEGER.
gstat -u sysdba -p masterkey -a -t TBL1 "D:\app_vlt\db\fdb\SPEED_TEST.FDB"
gstat results of table:
Database "D:\APP_VLT\DB\FDB\SPEED_TEST.FDB"
Gstat execution time Sun Feb 11 07:17:39 2024
Database header page information:
Flags 0
Generation 821
System Change Number 0
Page size 16384
ODS version 12.0
Oldest transaction 710
Oldest active 711
Oldest snapshot 711
Next transaction 711
Sequence number 0
Next attachment ID 75
Implementation HW=AMD/Intel/x64 little-endian OS=Windows CC=MSVC
Shadow count 0
Page buffers 256
Next header page 0
Database dialect 3
Creation date Feb 10, 2024 10:56:30
Attributes force write
Variable header data:
Sweep interval: 20000
*END*
Database file sequence:
File D:\APP_VLT\DB\FDB\SPEED_TEST.FDB is the only file
Analyzing database pages ...
TBL1 (128)
Primary pointer page: 167, Index root page: 168
Pointer pages: 21, data page slots: 67208
Data pages: 67208, average fill: 77%
Primary pages: 67208, secondary pages: 0, swept pages: 0
Empty pages: 6, full pages: 67201
Fill distribution:
0 - 19% = 6
20 - 39% = 1
40 - 59% = 0
60 - 79% = 67201
80 - 99% = 0
Index PK_TBL1_UUID (0)
Root page: 5241, depth: 3, leaf buckets: 17496, nodes: 10000000
Average node length: 19.75, total dup: 0, max dup: 0
Average key length: 16.76, compression ratio: 0.95
Average prefix length: 2.24, average data length: 13.76
Clustering factor: 9999855, ratio: 1.00
Fill distribution:
0 - 19% = 40
20 - 39% = 0
40 - 59% = 5246
60 - 79% = 7738
80 - 99% = 4472
Index UNQ_MY_BIGINT (1)
Root page: 19342, depth: 3, leaf buckets: 6930, nodes: 10000000
Average node length: 11.22, total dup: 0, max dup: 0
Average key length: 8.22, compression ratio: 1.09
Average prefix length: 3.78, average data length: 5.22
Clustering factor: 67202, ratio: 0.01
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 0
60 - 79% = 1
80 - 99% = 6929
Index UNQ_MY_GUID (2)
Root page: 1614, depth: 3, leaf buckets: 32999, nodes: 10000000
Average node length: 36.85, total dup: 0, max dup: 0
Average key length: 33.86, compression ratio: 1.06
Average prefix length: 5.14, average data length: 30.86
Clustering factor: 9999857, ratio: 1.00
Fill distribution:
0 - 19% = 144
20 - 39% = 1
40 - 59% = 10726
60 - 79% = 13874
80 - 99% = 8254
Gstat completion time Sun Feb 11 07:17:42 2024
Here are the SQLs that I ran to get the results described. They give the exact same response times, same level of PLAN (same number of steps)
/* Each select is run in it's own connection to the database */
SELECT
MY_BIGINT, MY_GUID, UUID_TO_CHAR(MY_UUID) AS "MY_UUID"
FROM
TBL1
WHERE
MY_BIGINT = 12999998
--
SELECT
MY_BIGINT, MY_GUID, UUID_TO_CHAR(MY_UUID) AS "MY_UUID"
FROM
TBL1
WHERE
MY_GUID = '38DD0E53-A80D-4E0A-976E-7AA6C80C11A8'
--
SELECT
MY_BIGINT, MY_GUID, UUID_TO_CHAR(MY_UUID) AS "MY_UUID"
FROM
TBL1
WHERE
MY_UUID = CHAR_TO_UUID('6B81D7EC-BACE-4C3D-963F-CF251ED0202C')
---
The nightmare I refer to is when trying to merge 3 production databases into 1, every record in the target DBs and feeder DBs is linked by sequentially issued integer identifiers, being issued every second, you can't shut down any of it, so you end up with a "moving hole" problem, which I describe in my book, available on Amazon, which I'm probably not allowed to mention here.