We are facing some issues using GraphDB 9.1 Free Edition: our disk usage is increasing very quickly. I used the hardware sizing reference at https://graphdb.ontotext.com/documentation/10.0/requirements.html#hardware-sizing along with the following query to count triples:
SELECT (COUNT(?s) AS ?triples) WHERE { ?s ?p ?o }
Below are the sizes of each folder in our GraphDB installation:
March 7:
83G ./data
4.9G ./logs
30M ./work
4.0K ./conf
88G .
Triple count: 192,567,172
March 14:
109G ./data
5.0G ./logs
30M ./work
4.0K ./conf
114G .
Triple count: 199,287,593
In 7 days, the number of triples increased by about 3% while the disk usage increased by almost 30%. Could anyone help us understand this behavior?
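To make the comparison concrete, here is a small sketch that recomputes the growth figures from the two measurements above. It assumes that "G" in the `du` output means GiB (2**30 bytes); the function names are just illustrative.

```python
# Figures copied from the two measurements above.
march_7 = {"data_gb": 83, "total_gb": 88, "triples": 192_567_172}
march_14 = {"data_gb": 109, "total_gb": 114, "triples": 199_287_593}


def pct_growth(before, after):
    """Relative growth from `before` to `after`, in percent."""
    return (after - before) / before * 100


def bytes_per_triple(snapshot):
    """Average size of ./data per counted triple (assumes 1G = 2**30 bytes)."""
    return snapshot["data_gb"] * 2**30 / snapshot["triples"]


triple_growth = pct_growth(march_7["triples"], march_14["triples"])
disk_growth = pct_growth(march_7["total_gb"], march_14["total_gb"])

print(f"triples grew by {triple_growth:.1f}%")
print(f"disk usage grew by {disk_growth:.1f}%")
print(f"bytes per triple, March 7:  {bytes_per_triple(march_7):.0f}")
print(f"bytes per triple, March 14: {bytes_per_triple(march_14):.0f}")
```

The average space per counted triple in ./data went up noticeably between the two dates, which is the part we cannot explain.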
We double checked that we are not ingesting extra triples in our applications and everything seems to be working fine.
This may be caused by the ruleset and related settings on your repository, which can cause additional triples to be inserted.
By default, if you open your repository in the Workbench, you will see RDFS-Plus selected as the ruleset.
RDFS-Plus is the default setting for a repository. It causes inferred triples to be added to the repository whenever triples are inserted, and the triples added by the ruleset can in turn cause further triples to be added. This happens at insert time, and the inferred triples are actually stored in the triple store rather than computed at query time.
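To illustrate how inference at insert time can multiply triples, here is a toy sketch in Python. It is not GraphDB's implementation, and it applies only two RDFS-style rules (subclass transitivity and type inheritance) rather than the full RDFS-Plus ruleset; the `ex:` names are made up for the example.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def materialize(explicit):
    """Apply two RDFS-style rules repeatedly until no new triples appear."""
    store = set(explicit)
    while True:
        new = set()
        for (s, p, o) in store:
            for (s2, p2, o2) in store:
                # Both rules need a second triple of the form: o subClassOf o2
                if o != s2 or p2 != SUBCLASS:
                    continue
                if p == SUBCLASS:
                    # A subClassOf B, B subClassOf C => A subClassOf C
                    new.add((s, SUBCLASS, o2))
                elif p == TYPE:
                    # x type A, A subClassOf B => x type B
                    new.add((s, TYPE, o2))
        new -= store
        if not new:
            return store
        store |= new


# Hypothetical example data: three explicitly inserted triples.
explicit = {
    ("ex:Dog", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
    ("ex:rex", TYPE, "ex:Dog"),
}
closure = materialize(explicit)
inferred = closure - explicit
print(len(explicit), "explicit,", len(inferred), "inferred")
for triple in sorted(inferred):
    print(triple)
```

Note that the second pass produces a triple that only becomes derivable from triples inferred in the first pass, which is the "triples added by the ruleset can cause further triples" effect.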
If you want no inference, you should have "no inference" selected.
It will be difficult to find triples that have been added by the ruleset. You can find the rules in configs/rules/builtin_RdfsRules-optimized.pie.
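Even if picking out individual inferred triples is hard, you may be able to count them. The sketch below only builds the count queries and request URLs; it assumes GraphDB's special `http://www.ontotext.com/explicit` and `http://www.ontotext.com/implicit` graphs behave in your 9.1 installation as described in the GraphDB documentation, and the host and repository id (`localhost:7200`, `my-repo`) are placeholders you would need to replace.

```python
from urllib.parse import urlencode

# Pseudo-graphs described in the GraphDB docs for stated vs. inferred
# statements (assumption: available in your version; please verify).
EXPLICIT_GRAPH = "http://www.ontotext.com/explicit"
IMPLICIT_GRAPH = "http://www.ontotext.com/implicit"


def count_query(graph_iri):
    """SPARQL query counting the statements visible through one graph."""
    return (
        "SELECT (COUNT(*) AS ?triples) "
        f"FROM <{graph_iri}> "
        "WHERE { ?s ?p ?o }"
    )


def request_url(base_url, repo_id, graph_iri):
    """URL for a GET request against the repository's SPARQL endpoint."""
    params = urlencode({"query": count_query(graph_iri)})
    return f"{base_url}/repositories/{repo_id}?{params}"


# Hypothetical host and repository id; replace with your own.
for name, graph in (("explicit", EXPLICIT_GRAPH), ("implicit", IMPLICIT_GRAPH)):
    print(name, request_url("http://localhost:7200", "my-repo", graph))
```

Comparing the two counts on both dates would show whether the growth comes from explicit or inferred statements.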