Using Oak version 1.42 (with Java 17), I created a SegmentNodeStore backed by a FileBlobStore, like this:
FileBlobStore fileBlobStore = new FileBlobStore("…");
SegmentGCOptions gcOptions = SegmentGCOptions.defaultGCOptions().setEstimationDisabled(true);
FileStore fileStore = FileStoreBuilder
        .fileStoreBuilder(new File("…"))
        .withBlobStore(fileBlobStore)
        .withGCOptions(gcOptions)
        .build();
SegmentNodeStore nodeStore = SegmentNodeStoreBuilders.builder(fileStore).build();
Repository repository = new Jcr(new Oak(nodeStore)).createRepository();
Then, I create a blob and later I delete it.
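By "delete" I mean a plain JCR node removal, roughly like this (a sketch; the node name is just a placeholder):
Node fileNode = session.getRootNode().getNode("myFile.bin"); // placeholder name
fileNode.remove();
session.save();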
The file associated with the blob is still on the file system and, as far as I understand, a garbage collection run is needed in order to actually remove it.
What's the proper way to run the Garbage Collector?
From reading the documentation - and taking into account that the whole thing is NOT run within an OSGi container - it seems that you should call MarkSweepGarbageCollector#collectGarbage(false), but it's not clear to me what the constructor arguments are supposed to be (or whether I can get a reference to an existing instance from some other data structure). For instance:
MarkSweepGarbageCollector garbageCollector = new MarkSweepGarbageCollector(
        new SegmentBlobReferenceRetriever(fileStore),
        fileBlobStore,
        executor,
        "some path…",
        10,    // I have no idea what this parameter is or what it should be set to
        1,     // are these millis? Or millis from epoch? No idea…
        null); // no idea what this is - but it's allowed to be null
This way the blob is NOT deleted - which is hardly surprising, given that it's unclear to me what exactly the parameters are, and what the purpose/meaning of some of the phases is (e.g. the blobs get "marked", but I wasn't able to find out what this "mark" means, even by looking at the code…).
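For what it's worth, here is my current reading of those parameters, pieced together from the javadoc and the source - the comments below are my assumptions, not authoritative documentation, and the scratch path is a placeholder:
Executor executor = Executors.newSingleThreadExecutor();
MarkSweepGarbageCollector garbageCollector = new MarkSweepGarbageCollector(
        new SegmentBlobReferenceRetriever(fileStore), // supplies the blob ids still referenced by the segment store (the "mark" input)
        fileBlobStore,   // the GarbageCollectableBlobStore to sweep
        executor,        // executor on which the collector runs its work
        "/tmp/blob-gc",  // root: scratch directory for the temporary files written during the mark phase (placeholder)
        2048,            // batchCount: how many blob ids are processed per batch (my guess at a sane value)
        0L,              // maxLastModifiedInterval: millis a blob must have been unmodified for to qualify for deletion (my understanding)
        null);           // repositoryId: only relevant when the DataStore is shared between repositories, so null here
garbageCollector.collectGarbage(false); // false = mark and sweep; true = mark only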
I have also tried calling fileStore.fullGC(); before collectGarbage (thinking that maybe the blob wasn't deleted because of some "stale" data), but with no luck.
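In other words, the sequence I tried looks roughly like this (a sketch of my attempt; the comments reflect my understanding of what each call does):
fileStore.fullGC();                     // full revision GC on the segment store (my understanding: full compaction plus cleanup)
garbageCollector.collectGarbage(false); // then mark and sweep on the blob store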
Maybe the blob is not deleted because it's referenced in the version history?
If so, what's the proper way to get rid of the blob - first delete the version history (how?) and then run the garbage collection?
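(For the record, the only way I can think of to prune a version history is something along these lines - a sketch, and it only applies if the node is mix:versionable in the first place; the path is a placeholder:)
VersionManager versionManager = session.getWorkspace().getVersionManager();
VersionHistory history = versionManager.getVersionHistory("/path/to/file"); // placeholder path
for (VersionIterator it = history.getAllVersions(); it.hasNext(); ) {
    Version version = it.nextVersion();
    if (!"jcr:rootVersion".equals(version.getName())) {
        history.removeVersion(version.getName()); // removes the version and, with it, its references to the blob
    }
}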
I did some more investigation, and I also created a mini-program that replicates the issue:
- you can find the program here on GitHub
- by following the program flow step by step, IMO the issue is that, in the "mark" phase (i.e. when the GC looks for the blobs that are still "in use"), the reference to the deleted blob is still there - it is still present in the BinaryReferencesIndex; therefore it gets marked and, being both marked and available, it is not even considered a "candidate" for sweeping (see the sketch right after this list).
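This is the kind of check I used to see what the mark phase sees - a sketch that simply dumps every blob reference the SegmentBlobReferenceRetriever reports; the deleted blob's id still shows up in the output:
Set<String> referenced = new HashSet<>();
new SegmentBlobReferenceRetriever(fileStore)
        .collectReferences((blobId, nodeId) -> referenced.add(blobId));
referenced.forEach(System.out::println); // the deleted blob's id is still listed here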
I think this might have something to do with the way I add the blob. The code follows; please refer to the GitHub project above for the full context:
Node rootFolder = session.getRootNode();
Node fileNode = rootFolder.addNode(temporaryFile.getName(), "nt:file");
fileNode.addMixin("mix:referenceable");
Node fileContentNode = fileNode.addNode("jcr:content", "nt:resource");
fileContentNode.setProperty("jcr:data", ""); // placeholder value, replaced below
session.save();
Blob blob = nodeStore.createBlob(FileUtils.openInputStream(temporaryFile));
NodeBuilder rootBuilder = nodeStore.getRoot().builder();
// getNodeBuilder is a helper from the mini-program that navigates rootBuilder to fileContentNode's path
NodeBuilder fileContentNodeBuilder = getNodeBuilder(fileContentNode, rootBuilder);
fileContentNodeBuilder.setProperty("jcr:data", blob);
nodeStore.merge(rootBuilder, EmptyHook.INSTANCE, CommitInfo.EMPTY);
session.save();
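For comparison, the conventional pure-JCR way to attach the binary would be something like this (a sketch; if the mixed NodeStore/session approach above is the culprit, this variant should behave differently):
Binary binary = session.getValueFactory().createBinary(FileUtils.openInputStream(temporaryFile));
fileContentNode.setProperty("jcr:data", binary);
session.save();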
At the present time, my best guess is that this is a bug. The bug has been filed here.
Oak developers commented on the above issue, and it turns out that this is not a bug; however, for the GC to be effective, a "compact" action needs to be run first, this way: