Cassandra global snapshot


I am running a cluster with 3 nodes (EC2 instances) and a replication factor of 2. I execute a script from the first node that runs nodetool snapshot on all the nodes using the pssh (parallel-ssh) utility, but the snapshot data for each node gets stored on that node itself. Is there a way to get the snapshot data from all nodes onto the node where I ran the script, so that my script can copy the data to S3 from a single place?
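For context, a minimal sketch of the workflow described above. The host file, keyspace name, snapshot tag, data directory, and bucket path are all placeholders. Note that instead of pulling everything to one node, each node can upload its own snapshot directly to S3 under a per-host prefix:

```bash
# Take a tagged snapshot on every node in parallel (hosts.txt lists the nodes)
pssh -h hosts.txt "nodetool snapshot -t mybackup my_keyspace"

# Have each node upload its own snapshot to S3 under its hostname,
# rather than centralizing the data first. Snapshots live under
# <data_dir>/<keyspace>/<table>/snapshots/<tag>/
pssh -h hosts.txt -t 0 \
  'aws s3 sync /var/lib/cassandra/data/my_keyspace \
     s3://my-bucket/$(hostname)/my_keyspace \
     --exclude "*" --include "*/snapshots/mybackup/*"'
```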

Also, suppose I have a 5-node cluster and snapshots from each node. Now I want to restore this data to a 10-node cluster and to a 2-node cluster, each with a different replication factor. Is the process below correct for the restore?

  1. copy the snapshot data from all 5 nodes and merge all the files into a single folder.

  2. run the sstableloader command, passing all the target IP addresses (10 or 2 of them) and the single folder location (see the sketch after this list). Will this properly redistribute the data from 5 nodes across 10 or 2 nodes after the restore?
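A hedged sketch of step 2, with placeholder paths and IPs. Two caveats worth noting: sstableloader expects the source directory to end in `<keyspace>/<table>`, and SSTable files from different nodes can have colliding generation numbers, so they may need renaming when merged into one folder:

```bash
# sstableloader streams sstables to whichever nodes own each token range in
# the *target* cluster, so the data is redistributed automatically whether
# the target has 2 or 10 nodes, regardless of the source cluster's size.
sstableloader -d 10.0.0.1,10.0.0.2,10.0.0.3 \
  /backups/merged/my_keyspace/my_table
```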


1 Answer

Alex Ott

I strongly suggest using the Medusa tool (doc) for backup and restore of your Cassandra cluster(s): it can back up data directly to cloud storage, and you can restore data to clusters even with different topologies.
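As a rough illustration, assuming Medusa is installed and configured against an S3 bucket (the backup name and seed host below are placeholders, and exact flags can vary between Medusa versions):

```bash
# Back up the local node; run on every node (or use medusa backup-cluster)
medusa backup --backup-name=snapshot-2021-01-01

# List the backups stored in the configured bucket
medusa list-backups

# Restore an entire cluster from a backup; the seed target is any
# reachable node of the cluster being restored
medusa restore-cluster --backup-name=snapshot-2021-01-01 \
  --seed-target=10.0.0.1
```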