In the beginning I used VARCHAR, at least until I found out that it supposedly doesn't support Unicode, for which reason I then switched to NVARCHAR. However, I encountered the need to insert invalid Unicode data into the database (stemming from the fact that on Linux paths are arbitrary bytes and therefore can contain messed up Unicode data).
So now I have to switch the data type again, but I'm stumped on the possibilities. After looking at SQLite docs; the following is "only a small subset of the datatype names that SQLite will accept":
- CHARACTER
- VARCHAR
- VARYING CHARACTER
- NCHAR
- NATIVE CHARACTER
- NVARCHAR
- TEXT
- CLOB
I'm aware of the possibility to store file paths with BLOB, but it doesn't feel right to store data that is text most of the times as BLOB.
Is there a preferred/idiomatic SQL data type to use for this use-case: text that's Unicode compliant most of the time?
If possible, that data type should transparently convert to and from Python's str data type with surrogateescapes (surrogateescapes are Python's way to still be able to represent faulty file paths with str).
I couldn't find help for this use-case by googling, so I will appreciate any help or pointers here!