I have a PITR configuration with PostgreSQL 9.6, consisting of a master server, an intermediate server, and two slave servers (hot standby, but switched manually), arranged like this:
    Master
      |
      I1
     /  \
   S1    S2
A disk write failure caused the master server to crash. A colleague on the development team corrected the error and then restarted the master database (instead of promoting the intermediate server, which is the established procedure). Because of this, there is one corrupt partial WAL segment and one whole WAL segment missing from the sequence.
Now, no transactions are missing, but slave 1 and the intermediate server both complain about the missing WAL (ERROR: requested WAL segment [...] has already been removed) even though they are still updating. S2 complains as well (the same error, but preceded by FATAL: could not receive data from WAL stream:), and it is not updating.
Since the transactions that were in progress when the master server went down have -already- been executed, I do not care about the missing WAL segments. So my actual questions are:
1) How do I get rid of the nagging about the missing WAL? I already tried pg_resetxlog -l (next valid WAL file) -f (after which the server no longer complains, but does not update either) and pg_basebackup which, not surprisingly on second thought, returns to the situation described above.
2) Why is one of the slaves updating (unexpected) while the other one is not (expected)? I thought at first that perhaps the updating slave was connecting directly to the master, but it is not; I have checked the configuration files and they are identical on both slaves.
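In case it helps with identifying the segment named in the error and the "next valid WAL file" from question 1, here is a minimal sketch of how I understand WAL segment file names relate to LSNs. It assumes the default 16 MB segment size and the naming scheme used since 9.3 (timeline, log number, segment number, 8 hex digits each). The function names are my own, not part of PostgreSQL, and I have not checked the result against the server's own file-name function for LSNs that fall exactly on a segment boundary.

```python
# Assumption: default WAL segment size of 16 MB (not changed at build time).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024
# Number of segments per 4 GB "log" unit: 256 with the default size.
SEGMENTS_PER_LOG = 0x100000000 // WAL_SEGMENT_SIZE


def parse_lsn(lsn: str) -> int:
    """Convert an LSN written as 'X/Y' (hex) into a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def format_segment(timeline: int, segno: int) -> str:
    """Build the 24-character segment file name for a segment number."""
    return "%08X%08X%08X" % (
        timeline,
        segno // SEGMENTS_PER_LOG,
        segno % SEGMENTS_PER_LOG,
    )


def wal_file_name(timeline: int, lsn: str) -> str:
    """Name of the segment that contains the byte at the given LSN."""
    return format_segment(timeline, parse_lsn(lsn) // WAL_SEGMENT_SIZE)


def next_wal_file_name(name: str) -> str:
    """Name of the segment that follows `name` on the same timeline."""
    timeline = int(name[0:8], 16)
    log = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    return format_segment(timeline, log * SEGMENTS_PER_LOG + seg + 1)


if __name__ == "__main__":
    # Hypothetical values, only to illustrate the mapping.
    print(wal_file_name(1, "2/A1000000"))
    print(next_wal_file_name("0000000100000000000000FF"))
```

With this, the segment after the missing one can be worked out from the name in the error message and compared with the files actually present in pg_xlog or in the archive.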
Thanks for your attention