Google Cloud Bigtable compression


I'm looking into how Bigtable compresses my data.

I've loaded 1.5 GB into one table: about 500k rows with one column each, where each cell holds about 3 KB on average. In later tests, more columns will be added to these rows, containing similar data of similar size.

The data in each cell is currently a JSON-serialized array of dictionaries (10 elements on average), like this:

[{
    "field1": "100.10",
    "field2": "EUR",
    "field3": "10000",
    "field4": "0",
    "field5": "1",
    "field6": "1",
    "field7": "0",
    "field8": "100",
    "field9": "110.20",
    "field10": "100-char field",
    "dateField1": "1970-01-01",
    "dateField2": "1970-01-01",
    "dateTimeField": "1970-01-01T10:10:10Z"
},{
    "field1": "200.20",
    "field2": "EUR",
    "field3": "10001",
    "field4": "0",
    "field5": "1",
    "field6": "0",
    "field7": "0",
    "field8": "100",
    "field9": "220.30",
    "field10": "100-char field",
    "dateField1": "1970-01-01",
    "dateField2": "1970-01-01",
    "dateTimeField": "1970-01-01T20:20:20Z"
}, ...]

The Bigtable console shows that the cluster holds 1.2 GB, so Bigtable compressed the 1.5 GB I inserted to roughly 80% of its original size. Gzipping a typical cell value, however, gives me a compression ratio of about 20% (the gzipped string is about a fifth of the original size).
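To reproduce the gzip comparison, here is a minimal Python sketch. It builds a hypothetical cell shaped like the sample above (`make_cell` and its field values are made up for illustration, and the 100-char field is filled with one repeated character, which compresses far better than real text would) and reports compressed size divided by original size, the same convention as the 80% and 20% figures.

```python
import gzip
import json


def make_cell(n_records: int = 10) -> bytes:
    """Build a hypothetical cell value shaped like the sample:
    a JSON array of n_records dicts that all share the same keys."""
    records = []
    for i in range(n_records):
        records.append({
            "field1": f"{100 + i}.10",
            "field2": "EUR",
            "field3": str(10000 + i),
            "field4": "0",
            "field5": "1",
            "field6": str(i % 2),
            "field7": "0",
            "field8": "100",
            "field9": f"{110 + i}.20",
            "field10": "x" * 100,  # stand-in for the 100-char field
            "dateField1": "1970-01-01",
            "dateField2": "1970-01-01",
            "dateTimeField": f"1970-01-01T10:10:{i:02d}Z",
        })
    return json.dumps(records).encode("utf-8")


def gzip_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(gzip.compress(data)) / len(data)


cell = make_cell()
print(f"original: {len(cell)} bytes")
print(f"gzip ratio: {gzip_ratio(cell):.0%}")
```

Running the same measurement on a handful of real cell values instead of the synthetic one gives a fairer baseline to compare against the console figure.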

Bigtable's compression therefore seems weak to me, given that the data I'm inserting contains a lot of repetitive values (e.g. the dictionary keys). I understand that Bigtable trades off compression for speed, but I had hoped it would do better on my data.

Is a compression ratio of 80% OK for data like the above, or should lower values be expected? Are there any techniques to improve compression, apart from remodeling the data I'm uploading?
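If you want smaller values regardless of how the server compresses them, one option that leaves the data model unchanged is to compress each value in the client before writing it. The sketch below is an assumption about how an application might do that, using only Python's standard `json` and `zlib` modules; `encode_cell` and `decode_cell` are hypothetical helper names, the sample records use a subset of the fields, and the actual Bigtable write and read calls are not shown.

```python
import json
import zlib


def encode_cell(records: list) -> bytes:
    """Serialize to compact JSON (no spaces after separators), then
    zlib-compress. The returned bytes would be stored as the cell value."""
    raw = json.dumps(records, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw, level=6)


def decode_cell(value: bytes) -> list:
    """Inverse of encode_cell: decompress, then parse the JSON."""
    return json.loads(zlib.decompress(value).decode("utf-8"))


# Hypothetical sample: 10 records sharing the same keys, as in the question.
records = [
    {"field1": f"{100 + i}.10", "field2": "EUR", "field3": str(10000 + i),
     "dateField1": "1970-01-01", "dateTimeField": "1970-01-01T10:10:10Z"}
    for i in range(10)
]

stored = encode_cell(records)
print(f"stored value: {len(stored)} bytes")
assert decode_cell(stored) == records  # lossless round trip
```

The trade-off is that the stored bytes become opaque to Bigtable: any server-side filter that matches on cell values would see only compressed bytes, and every reader has to decompress.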

Thanks!

Best answer, by Max:

Lower values are definitely expected. We found and fixed a bug related to how Cloud Bigtable uses compression, and the fix is now live in production.

For data like the example you posted, you should now see much better compression and lower disk usage!