Currently I am setting up Master/Master Replication with bucardo between 5 Nodes on different locations (should provide location transparency). The database holds ~500 Tables which should be replicated. I grouped them into smaller replication herds of 50 Tables at maximum based on their dependency on each other. All tables have primary keys defined and the sequencers on each node are set up to provide system wide unique identities (based on residue class)
To get an initial database on each node, I made a --data-only custom format pg_dump into a File and restored this on each node via pg_restore. Bucardo sync is setup with the bucardo_latest strategy to resolve conflicts. Now when I start syncing bucardo is deleting all datasets in the origin database first and inserting it again from one of the restored nodes, because all restored datasets have a "later timestamp" (the point in time when I called pg_restore). This ultimately prohibits the inital startup as bucardo needs very much time and also fails, as there are lots of datasets to solve and timeouts often too short.
I also have 'last_modified' timestamps on each table which are managed by UPDATE triggers, but as I understand it, pg_dump inserts data via COPY, and therefore these triggers don't get fired.
- Which timestamp does bucardo use to find out who is
bucardo_latest? - Do I have to call
pg_dumpwith something likeset SESSION_REPLICATION_ROLE = 'replica';?
I just want bucardo to keep track of every new change, not executing pseudo changes because of the restore.
EDIT: pg_restore has definitely fired several triggers at restore time...as said I keep track on user and last modification date in each table, and those values are set to the user and timestamp when the restore was done. I am aware, that I can set SESSION_REPLICATION_ROLE for a plain text format restore via psql. Is this also possible for pg_restore somehow?
The common approach is make the dump/restore process before configure the replication.
So an option will be:
bucardoschema in each databasebucardo removefor each object (most of them allow useall, likebucardo remove table allonetimecopy=0. It's the default but I feel safer making it explicit.Which timestamp does bucardo use to find out who is bucardo_latest?
bucardohandles its own timestamp value. Each table should have a trigger named likebucardo.delta_myschema_mytablethat makes and insert in a table named likebucardo.delta_myschema_mytable. This table has a columntxntime timestamp with time zone not null default now()and this is the timestamp used.Do I have to call pg_dump with something like set SESSION_REPLICATION_ROLE = 'replica';?
AFAIK, if
bucardotriggers are already set in the tables, the option--disable-triggersofpg_restoreshould do the trick.You can also check these articles about working with large databases and the use of session_replication_role