Proper standby status update in streaming replication protocol

This question is about the streaming replication protocol, which is extremely simple. It was designed for physical replication and is capable of:

  • Sending server status: Primary keepalive message
  • Receiving replica status: Standby status update
  • Sending WAL data: XLogData
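As a rough illustration of how small these three messages are, here is a minimal Python sketch that parses the two server-to-client messages and builds the client-to-server one. The field layouts follow the PostgreSQL protocol documentation as I understand it (all integers big-endian, timestamps in microseconds since 2000-01-01); the function and class names are made up for this example and do not come from any library.

```python
import struct
from dataclasses import dataclass


@dataclass
class Keepalive:
    wal_end: int           # position reported by the server
    send_time: int         # microseconds since 2000-01-01
    reply_requested: bool  # server wants a status update right away


@dataclass
class XLogData:
    wal_start: int   # starting position of the data in this message
    wal_end: int     # position reported by the server
    send_time: int   # microseconds since 2000-01-01
    payload: bytes   # raw WAL, or plugin output for logical decoding


def parse_copy_data(buf: bytes):
    """Parse the body of a CopyData message received from the server."""
    tag = buf[:1]
    if tag == b"k":  # Primary keepalive message: 1 + 8 + 8 + 1 bytes
        wal_end, send_time, reply = struct.unpack_from("!QqB", buf, 1)
        return Keepalive(wal_end, send_time, bool(reply))
    if tag == b"w":  # XLogData: 1 + 8 + 8 + 8 bytes of header, then data
        wal_start, wal_end, send_time = struct.unpack_from("!QQq", buf, 1)
        return XLogData(wal_start, wal_end, send_time, buf[25:])
    raise ValueError(f"unexpected message type {tag!r}")


def build_standby_status_update(written, flushed, applied, client_time,
                                reply_requested=False):
    """Build the body of a Standby status update (type byte 'r')."""
    return struct.pack("!cQQQqB", b"r", written, flushed, applied,
                       client_time, int(reply_requested))
```

The result of `build_standby_status_update` would still have to be wrapped in a CopyData message by whatever client library is in use.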

There is also logical decoding, which uses the same XLogData frames but, instead of raw WAL, sends data decoded from WAL by an output plugin such as pglogical.

Streaming replication expects me to send a Standby status update so that the server can free resources and remove old WAL. According to the docs, the position to report there is:

The location of the last WAL byte + 1 ...

pglogical returns its own LSN positions using its own messages within XLogData frames, but those are not usable.

Logical decoding sends nothing when data is written to a different database. Yet the slot position still needs to be updated, otherwise the slot would be lost. So the only way is to send back the LSN positions from the Primary keepalive message, which according to the docs contains:

The current end of WAL on the server.

This is confusing. What if the slot is at position 100 and the server is already at 200?

So, after experimenting and checking the source of pg_recvlogical, I understood that the position in the Primary keepalive message is not really "The current end of WAL on the server". In reality it increases gradually from the slot position up to the current server LSN (pg_current_wal_lsn()), with XLogData frames in between (if any). It seems that messages are received sequentially, ordered by LSN.
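To make that observation concrete, here is a small Python sketch of the bookkeeping a client could do: it checks that positions never go backwards and decides which position is safe to report back. It is only a model under assumptions. The class and method names are invented, the rule "a keepalive position is reportable only when no received data is still unprocessed" is my reading of what pg_recvlogical does, and the numbers are the toy values 100 and 200 from above rather than real LSNs.

```python
class PositionTracker:
    """Tracks positions seen on the stream and what may be reported back."""

    def __init__(self, slot_lsn: int):
        self.received = slot_lsn     # highest position seen in any message
        self.confirmable = slot_lsn  # position that is safe to report
        self.pending = 0             # XLogData messages not yet processed
        self.out_of_order = 0        # how often a position went backwards

    def _observe(self, lsn: int) -> None:
        if lsn < self.received:
            self.out_of_order += 1
        else:
            self.received = lsn

    def on_keepalive(self, wal_end: int) -> None:
        self._observe(wal_end)
        # A keepalive carries no data, so its position can be reported
        # as long as nothing received earlier is still waiting.
        if self.pending == 0:
            self.confirmable = max(self.confirmable, wal_end)

    def on_xlogdata(self, wal_start: int) -> None:
        self._observe(wal_start)
        self.pending += 1

    def mark_processed(self, wal_start: int) -> None:
        # Call once the data of that message is durably handled.
        self.pending -= 1
        self.confirmable = max(self.confirmable, wal_start)


tracker = PositionTracker(slot_lsn=100)
tracker.on_keepalive(120)    # idle stretch: position moves forward
tracker.on_xlogdata(160)     # some decoded data arrives
tracker.on_keepalive(200)    # server has sent everything up to 200
held_back = tracker.confirmable
tracker.mark_processed(160)  # client finished handling the data
tracker.on_keepalive(200)
caught_up = tracker.confirmable
```

In this run `held_back` stays at 120 while the data at 160 is unprocessed, and `caught_up` reaches 200 only after it has been handled.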

Now, the questions:

Q1) Is this documented somewhere?

Q2) Does it make sense? Did I misunderstand anything?

Q3) Are streaming messages always sorted by LSN?

Q4) Is it okay to commit positions taken from the Primary keepalive message?

Best answer, by ubombi:

According to this comment in the PostgreSQL source (walsender.c):

writePtr is the location up to which the WAL is sent. It is essentially
the same as sentPtr but in some cases, we need to send keep alive before
sentPtr is updated like when skipping empty transactions.

That value is then written into the keepalive message:

pq_sendint64(&output_message, XLogRecPtrIsInvalid(writePtr) ? sentPtr : writePtr);

This means that "The current end of WAL on the server" is actually the location up to which the WAL has been sent.
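The selection made by that C expression can be restated in a few lines of Python, purely as an illustration. The function name is invented here; the only fact borrowed from PostgreSQL is that an invalid XLogRecPtr is represented as 0.

```python
INVALID_XLOG_REC_PTR = 0  # InvalidXLogRecPtr in PostgreSQL is 0


def keepalive_position(write_ptr: int, sent_ptr: int) -> int:
    """Mirror of: XLogRecPtrIsInvalid(writePtr) ? sentPtr : writePtr."""
    if write_ptr == INVALID_XLOG_REC_PTR:
        return sent_ptr   # normal case: report what has been sent
    return write_ptr      # explicit position, e.g. after skipped empty transactions


normal_case = keepalive_position(INVALID_XLOG_REC_PTR, 150)
skipped_case = keepalive_position(180, 150)
```

Either way the reported value is a "sent up to here" position for this connection, not the server-wide end of WAL.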

Also, a comment in walreceiver.c says:

'walEnd' and 'sendTime' are the end-of-WAL and timestamp of the latest message, reported by primary.