Exponential disk usage in Ontotext GraphDB Platform


We are facing some issues with GraphDB 9.1 Free Edition: our disk usage is increasing at an exponential rate. I used the hardware-sizing reference at https://graphdb.ontotext.com/documentation/10.0/requirements.html#hardware-sizing along with the following query to count triples:

SELECT (COUNT(?s) AS ?triples) WHERE { ?s ?p ?o }
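To see where the growth is concentrated, a per-graph variant of the same count can also be run (plain SPARQL sketch; note that triples in the default graph are not matched by GRAPH ?g):

SELECT ?g (COUNT(*) AS ?triples)
WHERE { GRAPH ?g { ?s ?p ?o } }
GROUP BY ?g
ORDER BY DESC(?triples)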

Below you can see the size of each folder of our GraphDB installation, together with the triple count, on two dates:

March 7

83G   ./data
4.9G  ./logs
30M   ./work
4.0K  ./conf
88G   .

Triple count: 192,567,172

March 14

109G  ./data
5.0G  ./logs
30M   ./work
4.0K  ./conf
114G  .

Triple count: 199,287,593

In 7 days the number of triples grew by roughly 3.5% (from 192,567,172 to 199,287,593), while disk usage grew by almost 30% (from 88 GB to 114 GB). Could anyone help us understand this behavior?

We have double-checked that our applications are not ingesting extra triples, and everything seems to be working fine.


There are 2 answers

Henriette Harmse

It may be due to the ruleset and related settings on your repository, which can cause additional triples to be inserted.

If you open your repository in the Workbench, you will see its Ruleset setting.

RDFS-Plus is the default ruleset on a repository. It causes inferred triples to be added to your repository whenever triples are inserted, and the triples added by the ruleset can in turn cause further triples to be added. This happens at insert time, and the inferred statements are actually stored in the triple store rather than computed at query time.

If you want no inference, select the "No inference" ruleset instead.

It will be difficult to find triples that have been added by the ruleset. You can find the rules in configs/rules/builtin_RdfsRules-optimized.pie.
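One way to gauge how much the ruleset contributes is to compare the number of explicit and inferred statements. As far as I remember, GraphDB exposes the pseudo-graphs http://www.ontotext.com/explicit and http://www.ontotext.com/implicit for this purpose (please verify against the documentation for your version). A sketch, to be run as two separate queries:

# Explicit statements only
SELECT (COUNT(*) AS ?explicit)
FROM <http://www.ontotext.com/explicit>
WHERE { ?s ?p ?o }

# Inferred statements only
SELECT (COUNT(*) AS ?implicit)
FROM <http://www.ontotext.com/implicit>
WHERE { ?s ?p ?o }

If the implicit count is large and growing much faster than the explicit one, the ruleset is the likely cause of the disk growth.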

Sava Savov

If you are using inference, GraphDB uses forward-chaining reasoning, meaning that all inferred statements are materialized on load.

You can read more about this at the following link: https://graphdb.ontotext.com/documentation/10.2/architecture-components.html?highlight=forward%20chaining#reasoner-trree-engine

Admittedly, an untypically long chain of entities related via a transitive property is a corner case, but it gives you an idea of how a schema change can result in a lot of inference (see the sketch below). Another example would be defining two properties as inverses of one another, or adding a new superclass that ends up having plenty of instances after inference.
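As an illustration only (hypothetical ex: namespace, and assuming the repository ruleset covers owl:TransitiveProperty, as RDFS-Plus and OWL-Horst do), a single schema statement can multiply what gets materialized:

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX ex:  <http://example.org/>

INSERT DATA {
  # one schema statement ...
  ex:partOf a owl:TransitiveProperty .
  # ... and a chain of entities
  ex:a1 ex:partOf ex:a2 .
  ex:a2 ex:partOf ex:a3 .
  ex:a3 ex:partOf ex:a4 .
}

With forward chaining the store also materializes ex:a1 ex:partOf ex:a3, ex:a2 ex:partOf ex:a4 and ex:a1 ex:partOf ex:a4, so a chain of n links is closed to roughly n*(n-1)/2 statements; a long chain loaded after such a schema change can inflate the data folder considerably.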

Did you have changes like that? Or did you load a new version of a third-party dataset that might contain such a change?