Is Excel Power Pivot “double compression” a recognised effect?


I have a large SQL query whose results are held in Power Pivot; every month I download the whole thing again with the latest month included. Each month adds around 1.3m rows. At one point the table had 161m rows (January 2014 to August 2023) and the file was 4 GB. However, the next month’s download produced a memory overload error, so I re-downloaded the data for January 2015 to September 2023 instead (cutting off the first year). This significantly reduced the file to 2.8 GB, even though the table still holds 148m rows. By comparison, an earlier version of the file covered January 2014 to May 2022 with only 137m rows, yet was 3.37 GB.

148m rows at 2.8 GB versus 137m rows at 3.37 GB? How can that be? I’ve checked everything: there’s no difference in the data model or the calculated columns, and nothing special about the later months; every month contains the same kind of data, so the compression is working with essentially the same cardinality, data types and so on. I also checked in DAX Studio, and the memory used by each column shows no surprises; it’s much the same across the board, with no particular column spiking in storage.
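(For anyone who wants to repeat that check, a DMV query along the lines below, run from DAX Studio against the workbook’s model, lists the size and compression type of every column segment; ‘YourFactTable’ is just a placeholder for the real table name, and DAX Studio’s View Metrics feature reports much the same figures.)

    -- Rough sketch of a VertiPaq DMV query (run in DAX Studio's query pane).
    -- Each row is one segment of one column; USED_SIZE is the bytes it occupies
    -- and COMPRESSION_TYPE is the encoding VertiPaq chose for that segment.
    SELECT
        DIMENSION_NAME,
        COLUMN_ID,
        SEGMENT_NUMBER,
        RECORDS_COUNT,
        USED_SIZE,
        COMPRESSION_TYPE
    FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS
    WHERE DIMENSION_NAME = 'YourFactTable'  -- placeholder table name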

Also, when I added the next month of data, the size jumped from 2.8 GB to 3.4 GB just to go from 148m to 149m rows, so whatever this effect was, it seemed to undo itself at the next refresh.

I need to understand this effect because I have to keep the file as small as possible for user performance. The only explanation I can think of is that when I cut off the first year of data, the compression routines somehow “recognised” the remaining data and compressed it more effectively the second time round. Is this a recognised phenomenon in VertiPaq compression? Are there any techniques, articles or books you know of on the subject? Any help greatly appreciated, thanks.


1 Answer

davidebacci

The Definitive Guide to DAX has a couple of chapters dedicated to VertiPaq.

https://www.amazon.co.uk/Definitive-Guide-DAX-intelligence-Microsoft/dp/1509306978

Compression is all down to cardinality, and the engine uses heuristics to decide which compression algorithm to use for each column. I'm 99.9% sure that each refresh involves a full compression routine and that it does not reuse metadata or data from the previous refresh.
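As a rough illustration of that per-column heuristic (a sketch only; the encoding codes are as I understand them, with 1 meaning hash/dictionary encoding and 2 meaning value encoding), you can ask the model which encoding it picked for each column from DAX Studio:

    -- Sketch: which encoding VertiPaq chose per column, plus dictionary size.
    SELECT
        DIMENSION_NAME,
        COLUMN_ID,
        COLUMN_ENCODING,   -- 1 = hash/dictionary, 2 = value (as I understand the codes)
        DICTIONARY_SIZE
    FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS
    WHERE COLUMN_TYPE = 'BASIC_DATA'  -- skip internal row-number columns

If the two versions of your workbook show different encodings or dictionary sizes for the same columns, that points to the heuristics making different choices on each full refresh rather than any reuse of previously compressed data.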