Slony only replicates when initially starting


So I followed the Slony tutorial and was able to replicate my databases, but I've noticed it only works when first starting up. If I leave the replication running, any new data never makes it to the slave. The only way I have found to remedy this is to uninstall the cluster/nodes and set them up again, and then again replication only happens at startup.

I followed this tutorial here

My steps currently are:

  1. Start postgres on both Master and Slave

  2. Uninstall the cluster/nodes with this script (I have another one for the slave node that uses that node's host):

    #!/bin/sh
    slonik <<_EOF_
    cluster name = $CLUSTERNAME;
    
    node 1 admin conninfo = 'dbname=$TEST_DB host=$MASTERHOST user=test';
    
    uninstall node ( id = 1 );
    _EOF_
    
  3. Set up the cluster

    #!/bin/sh
    slonik <<_EOF_
    cluster name = $CLUSTERNAME;
    node 1 admin conninfo = 'dbname=$TEST_DB host=$MASTERHOST user=test';
    node 2 admin conninfo = 'dbname=$TEST_DB host=$SLAVEHOST user=test';
    init cluster (id=1, comment = 'Master Node');
    
    create set (id=1, origin=1, comment='All test tables');
    set add table (set id=1, origin=1, id=1, fully qualified name = 'test.amqp_status', comment='amqp status');
    set add table (set id=1, origin=1, id=2, fully qualified name = 'test.corba_status', comment='corba status');
    set add table (set id=1, origin=1, id=3, fully qualified name = 'test.icmp_status', comment='ping status');
    set add table (set id=1, origin=1, id=4, fully qualified name = 'test.test_status', comment='teststatus');
    set add table (set id=1, origin=1, id=5, fully qualified name = 'test.ntp_status', comment='ntp status');
    set add table (set id=1, origin=1, id=6, fully qualified name = 'test.snmp_status', comment='snmp status');
    set add table (set id=1, origin=1, id=7, fully qualified name = 'test.subsystem_service_status', comment='subsystem_service status');
    set add table (set id=1, origin=1, id=8, fully qualified name = 'test.subsystem_status', comment='subsystem status');
    set add table (set id=1, origin=1, id=9, fully qualified name = 'test.switch_device_file', comment='switch_device_file');
    set add table (set id=1, origin=1, id=10, fully qualified name = 'test.host_status', comment='host status');
    
    store node (id=2, comment = 'Slave Node', event node=1);
    store path (server = 1, client = 2, conninfo='dbname=$TEST_DB host=$MASTERHOST user=test');
    store path (server = 2, client = 1, conninfo='dbname=$TEST_DB host=$SLAVEHOST user=test');
    _EOF_
    
  4. Run slon on each node with this command:

    slon $CLUSTERNAME "dbname=$TEST_DB user=test host=$MASTERHOST"
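    # Assumption: the slave has the same environment variables defined; each
    # slon points at its own node's database, so on the slave this would be:
    slon $CLUSTERNAME "dbname=$TEST_DB user=test host=$SLAVEHOST"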
    
  5. Run the subscribe script on the master (I've tried forward = no and forward = yes, with no difference):

    #!/bin/sh
    slonik <<_EOF_
    cluster name = $CLUSTERNAME;
    
    node 1 admin conninfo = 'dbname=$TEST_DB host=$MASTERHOST user=test';
    node 2 admin conninfo = 'dbname=$TEST_DB host=$SLAVEHOST user=test';
    
    subscribe set (id = 1, provider = 1, receiver = 2, forward = yes);
    _EOF_
    

Once this last script is run, my tables are copied over to the slave within a second, and I can see the SYNC events in the slon output on each host. But even though I see these SYNC messages, I no longer see the tables being updated.

I have manually logged in to PostgreSQL and inserted into the tables. I have also tried inserting via the psql command and from Java. Nothing seems to be picked up by Slony past the initial copy.
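
Roughly, the check I'm doing looks like this (a sketch using one of the replicated tables; after inserting on the master I compare what the slave sees):

    # Make a change on the master, then compare what each side reports, e.g.:
    psql -h $MASTERHOST -U test -d $TEST_DB -c "SELECT count(*) FROM test.amqp_status;"
    psql -h $SLAVEHOST -U test -d $TEST_DB -c "SELECT count(*) FROM test.amqp_status;"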

As for Postgres settings, I have replication set to 'replica', but I have not changed much else since the Slony documentation didn't suggest anything.

I figure I am missing something basic. Any help is appreciated, thanks.

1 Answer

Christopher Browne (accepted answer):

A couple minor details...

  • I would suggest leaving the "replication" mode alone; Slony manages that itself, and mucking around with it is liable to cause things to break more confusingly.

  • FYI, having forwarding on or off is pretty irrelevant if you only have 2 nodes; that won't be important.

My first thought was that perhaps you were only running a slon process against the subscriber node, and not against the origin; that would cause the perceived phenomenon. The slon running against the origin doesn't do any replication work, but it does mark SYNC events, so that subscribers know that they have data to pull. No slon against the master means subscribers think they have no work to do.
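
As a quick sanity check (a sketch, assuming the environment variables from the question's scripts and Slony's default naming of the cluster schema as "_" plus the cluster name), confirm a slon is running on each host and ask the origin how far behind the subscriber is via the sl_status view:

    # On each host, make sure a slon daemon is actually running:
    ps aux | grep '[s]lon'

    # On the origin, check subscriber lag as Slony sees it:
    psql -h $MASTERHOST -U test -d $TEST_DB \
        -c "SELECT st_received, st_lag_num_events, st_lag_time FROM _$CLUSTERNAME.sl_status;"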

A next thought is to see if changes are being successfully captured on the origin. Update some data on the master, and, on the master, look in tables [SlonySchemaName].sl_log_1 and .sl_log_2. The data changes should be captured there; if they're not, then we'd be able to start looking into why not.
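
For example, something along these lines on the master (assuming Slony's default schema naming of "_" plus the cluster name):

    # After updating some data on the master, see whether the change was captured
    # (Slony alternates between the two log tables):
    psql -h $MASTERHOST -U test -d $TEST_DB -c "SELECT count(*) FROM _$CLUSTERNAME.sl_log_1;"
    psql -h $MASTERHOST -U test -d $TEST_DB -c "SELECT count(*) FROM _$CLUSTERNAME.sl_log_2;"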

A further thought... Turn the debugging level up a bit. Info level (log_level = 0) should be enough normally, but when something confusing is happening, head to log_level = 1, which is DEBUG1.

On the origin, all you'll see, for the most part, is that SYNCs get generated fairly frequently when it is busy, and infrequently when it is not.

The action takes place on the subscriber; at DEBUG1 its logs give a fair bit more indication of what replication work is going on.
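
Applied to the question's step 4, that could look like this (a sketch; the -d flag is slon's command-line way of raising the log level, corresponding to log_level in a slon config file):

    # Same invocation as step 4, run on the subscriber, with the log level raised to DEBUG1:
    slon -d 1 $CLUSTERNAME "dbname=$TEST_DB user=test host=$SLAVEHOST"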

The documentation about Log Analysis should be fairly helpful; see http://www.slony.info/documentation/2.2/loganalysis.html