How to Merge Multiple Database files in SQLite?


I have multiple database files in different locations, all with exactly the same structure. I understand that ATTACH can be used to connect multiple files to one database connection; however, this treats them as separate databases. I want to do something like:

SELECT uid, name FROM ALL_DATABASES.Users;

Also,

SELECT uid, name FROM DB1.Users UNION SELECT uid, name FROM DB2.Users ;

is NOT a valid answer because I have an arbitrary number of database files that I need to merge. Lastly, the database files must stay separate. Does anyone know how to accomplish this?

EDIT: An answer gave me an idea: would it be possible to create a view that combines all the different tables? Is it possible to query for all attached database files and the names they are 'mounted' under, and then use that inside the view query to create the 'master table'?


There are 2 answers

4
Blrfl On

Because SQLite imposes a limit on the number of databases that can be attached at one time (10 by default, and at most 125 via the SQLITE_MAX_ATTACHED compile-time option), there is no way to do what you want in a single query.

If the number can be guaranteed to be within SQLite's limit (which violates the definition of "arbitrary"), there's nothing that prevents you from generating a query with the right set of UNIONs at the time you need to execute it.
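As a rough sketch of that idea, the following Python snippet attaches each file, reads the attached names back with PRAGMA database_list, and builds the UNION query at run time. The function name query_all_users and the db0, db1, ... aliases are made up for illustration, and it assumes every file has a Users table with uid and name columns, as in the question. It uses UNION ALL, which keeps every row; swap in UNION if duplicates should be collapsed.

```python
import sqlite3

def query_all_users(paths):
    """Attach each file, then run one generated UNION ALL query over them.

    Hypothetical helper: raises sqlite3.OperationalError if len(paths)
    exceeds the attach limit of the SQLite build in use.
    """
    conn = sqlite3.connect(":memory:")
    try:
        for i, path in enumerate(paths):
            # The alias (db0, db1, ...) is generated here, never user input.
            conn.execute(f"ATTACH DATABASE ? AS db{i}", (path,))
        # PRAGMA database_list returns (seq, name, file) per attached database.
        names = [row[1] for row in conn.execute("PRAGMA database_list")
                 if row[1].startswith("db")]
        if not names:
            return []
        sql = " UNION ALL ".join(
            f"SELECT uid, name FROM {name}.Users" for name in names)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()
```

The source files are only read, so they stay separate on disk.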

To support a truly arbitrary number of database files, your only real option is to create a table in an unrelated database and repeatedly INSERT rows from each candidate:

ATTACH DATABASE '/path/to/candidate/database' AS candidate;
INSERT INTO some_table (uid, name) SELECT uid, name FROM candidate.Users;
DETACH DATABASE candidate;
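
Driving those three statements from a loop might look like the sketch below. The function name merge_users is hypothetical, the target table is created with untyped columns purely for the example, and the source table is assumed to be called Users as in the question. Only one candidate is attached at a time, so the attach limit never comes into play.

```python
import sqlite3

def merge_users(target_path, candidate_paths):
    """Copy Users rows from each candidate file into some_table, one file at a time."""
    # isolation_level=None means autocommit, so DETACH never runs
    # inside an open transaction (SQLite refuses that).
    conn = sqlite3.connect(target_path, isolation_level=None)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS some_table (uid, name)")
        for path in candidate_paths:
            conn.execute("ATTACH DATABASE ? AS candidate", (path,))
            try:
                conn.execute(
                    "INSERT INTO some_table (uid, name) "
                    "SELECT uid, name FROM candidate.Users")
            finally:
                # Always release the alias so the next file can reuse it.
                conn.execute("DETACH DATABASE candidate")
        return conn.execute("SELECT COUNT(*) FROM some_table").fetchone()[0]
    finally:
        conn.close()
```

The candidate files are never written to, so they remain separate; the merged copy lives only in the target database.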
0
Free advice giver On

Some cleverness in the schema would take care of this.

You will generally have 2 types of tables: reference tables, and dynamic tables. Reference tables have the same content across all databases, for example country codes, department codes, etc.

Dynamic data is data that will be unique to each DB, for example time series, sales statistics, etc.

The reference data should be maintained in a master DB, and replicated to the dynamic databases after changes.

The dynamic tables should all have a DB_ID column that is part of a compound primary key; for example, your time series might use (db_id, measurement_id, time_stamp). You could also use a hash of DB_ID to generate primary keys, using the same PK generator for all tables in a DB. When rows from different DBs are merged, their keys will then be unique.
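
A minimal sketch of such a compound key, using Python's sqlite3 module. The table name measurements, the value column, and the sample rows are invented for illustration; only the three key columns come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE measurements (
        db_id          INTEGER NOT NULL,  -- which source database the row came from
        measurement_id INTEGER NOT NULL,
        time_stamp     TEXT    NOT NULL,
        value          REAL,
        PRIMARY KEY (db_id, measurement_id, time_stamp)
    )
""")

# Two source databases that both use measurement_id 1 at the same time stamp.
from_db1 = [(1, 1, "2024-01-01T00:00:00", 20.5)]
from_db2 = [(2, 1, "2024-01-01T00:00:00", 18.0)]

# Both rows are accepted because db_id makes the compound key distinct.
conn.executemany("INSERT INTO measurements VALUES (?, ?, ?, ?)",
                 from_db1 + from_db2)
conn.commit()
count = conn.execute("SELECT COUNT(*) FROM measurements").fetchone()[0]
```

Without db_id in the key, the second insert would fail with a uniqueness error, which is the collision this schema is meant to avoid.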

So you will have 3 types of databases:

  • Reference master -> replicated to all others

  • individual dynamic -> replicated to full dynamic

  • full dynamic -> replicated from reference master and all individual dynamic.

It is then up to you how to do this replication: pseudo-real-time, or brute force (truncate and rebuild the full dynamic DB every day or as needed).