I have this scenario in a Java Spring app where I have to export a large amount of data, categorized into a directory structure, as a zip file. Since the data is huge, I'm fetching it in batches (let's say 500 results at a time), arranging the results into categories and subcategories, and writing them into a zip file. Because fetching and arranging the data takes a considerable amount of time, I cannot keep my ZipOutputStream open for that long, so I'm creating a new zip for every batch and then appending its data to a master zip file.
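For context, this is roughly how each batch zip gets written. It is only a minimal sketch (the `Result` record, `writeBatchZip`, and the field names are placeholders, not my real code), but it shows that the Catg/SubCatg hierarchy is encoded in the entry names:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class BatchZipWriter {

    // Placeholder for one fetched result together with its category path.
    public record Result(String category, String subCategory, String fileName, byte[] content) {}

    // Writes one batch into its own zip. The category/subcategory hierarchy is
    // encoded directly in each entry name, e.g. "Catg A/SubCatg A/FILE 1".
    public static void writeBatchZip(Path zipPath, List<Result> batch) throws IOException {
        try (OutputStream out = Files.newOutputStream(zipPath);
             ZipOutputStream zos = new ZipOutputStream(out)) {
            for (Result r : batch) {
                String entryName = r.category() + "/" + r.subCategory() + "/" + r.fileName();
                zos.putNextEntry(new ZipEntry(entryName));
                zos.write(r.content());
                zos.closeEntry();
            }
        }
    }
}
```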
After the 1st fetch, let's say I produce this zip file:
batch_1.zip
|_Catg A
  |_SubCatg A
    |_FILE 1
    |_FILE 2
  |_SubCatg B
    |_FILE 3
|_Catg B
  |_SubCatg C
    |_FILE 498
  |_SubCatg D
    |_FILE 499
    |_FILE 500
Similarly, I create a zip for the next 500 results:
batch_2.zip
|_Catg A
  |_SubCatg A
    |_FILE 501
|_Catg B
  |_SubCatg D
    |_FILE 999
|_Catg C
  |_SubCatg E
    |_FILE 555
Now I have to merge these two files. Both zip files may contain directories for the same categories and subcategories. To check for existing folders, I would need to iterate over all the entries in both zip files, which is again too expensive: there can be over 5000 results with 500+ categories and 2500+ subcategories.
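This is roughly the merge step as I currently picture it, copying every entry of a batch zip into the master ZipOutputStream. Again just a sketch with placeholder names (`appendBatch`), not a finished implementation:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class ZipMerger {

    // Copies every entry of one batch zip into the (already open) master ZipOutputStream.
    // File entries carry their full "Catg/SubCatg/FILE" path in the entry name;
    // explicit directory entries are skipped to avoid duplicate-name collisions.
    public static void appendBatch(Path batchZip, ZipOutputStream masterOut) throws IOException {
        try (ZipFile zip = new ZipFile(batchZip.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue;
                }
                masterOut.putNextEntry(new ZipEntry(entry.getName()));
                try (InputStream in = zip.getInputStream(entry)) {
                    in.transferTo(masterOut);
                }
                masterOut.closeEntry();
            }
        }
    }
}
```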
Is there any other approach for handling this scenario?