I am currently experimenting with the ZstdNet library for compressing small text using a pre-generated dictionary. While the compression works correctly in most cases, I encounter an exception with the message "Src size is incorrect" under certain circumstances.
I have created a minimal test to reproduce the issue:
var text = "bla bla, bla bla bla";
var bytes = Encoding.UTF8.GetBytes(text);
var dic = ZstdNet.DictBuilder.TrainFromBuffer(text.Split(' ').Select(Encoding.UTF8.GetBytes));
Any insights or suggestions?
Check first if this is related to the warning included in the comment of that method:
(This is from the C library facebook/zstd, but the same idea applies to the wrapper library skbkontur/ZstdNet.)
From that comment, the warning is:
That method expects a collection of byte arrays as training samples for dictionary creation, but the way you are generating these samples may not be suitable for all scenarios, especially with the minimal and repetitive text example you provided.
So make sure the input text provides enough unique samples for dictionary training. A larger and more varied dataset might be necessary.
If your use case involves small or very specific text samples, try manually creating a larger and more diverse set of samples for the dictionary training process.
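As a rough illustration of that advice, here is a minimal sketch that trains on a few thousand distinct samples instead of a handful of repeated words, and catches the exception if training still fails. The word list, the sample count, and the message layout are made up for the example; in practice you would load real messages that resemble what you intend to compress. It assumes the ZstdNet package is referenced.

```csharp
using System;
using System.Linq;
using System.Text;
using ZstdNet;

class DictionaryTrainingSketch
{
    static void Main()
    {
        // Hypothetical vocabulary; replace with real data from your application.
        string[] words = { "order", "shipped", "pending", "customer", "invoice",
                           "refund", "address", "payment", "delivered", "cancelled" };
        var random = new Random(42);

        // Build a few thousand distinct samples rather than a handful of identical ones.
        var samples = Enumerable.Range(0, 2000)
            .Select(i => Encoding.UTF8.GetBytes(
                $"id={i}; " + string.Join(" ",
                    Enumerable.Range(0, 8).Select(_ => words[random.Next(words.Length)]))))
            .ToList();

        byte[] dict;
        try
        {
            dict = DictBuilder.TrainFromBuffer(samples);
        }
        catch (ZstdException ex)
        {
            // Training can still fail if the samples are too few or too uniform.
            Console.WriteLine("Dictionary training failed: " + ex.Message);
            return;
        }

        var input = Encoding.UTF8.GetBytes("id=7; order shipped customer address");

        // Compress and decompress with the same dictionary to check the round trip.
        using (var compressionOptions = new CompressionOptions(dict))
        using (var compressor = new Compressor(compressionOptions))
        using (var decompressionOptions = new DecompressionOptions(dict))
        using (var decompressor = new Decompressor(decompressionOptions))
        {
            var compressed = compressor.Wrap(input);
            var restored = decompressor.Unwrap(compressed);
            Console.WriteLine(input.SequenceEqual(restored)
                ? "Round trip succeeded"
                : "Round trip mismatch");
        }
    }
}
```

If training succeeds with a set like this but fails with your original input, that points to the sample set (too few, too short, or too repetitive) rather than to the compression call itself.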