Ontologies, OWL, Sparql: Modelling that "something is not there" and performance considerations

658 views Asked by At

we want to model that "something is not there" as opposed to missing information, e.g. an explicit statement that "a patient did not get chemotherapy" or that "a patient does not have dyspnea" is different from missing information about whether a patient has dyspnea.

We thought about several approaches, e.g.

  • Using a negation class: "No_Dyspnea". But that seems semantically problematic, since what type would that class be? It cannot be a descendant of the "Dyspnea" class.
  • Using "not there" object properties, e.g. "denies" or "does_not_have" and then an individual of the Dyspnea root class as the object of that triple.
  • Using blank nodes that describe that the individual belongs to the group of things that do not have dyspnea. E.g.:

    dat:PatientW2 a [ rdf:type owl:Class;
                  owl:complementOf [
                    rdf:type owl:Restriction ;
                        owl:onProperty roo:has_finding;
                        owl:someValuesFrom nci:Dyspnea;
                  ]
                ] .
    

We feel like the 3rd option is the most "ontologically correct" way of expressing this. However, when playing around with it we encountered severe performance problems in simple scenarios.

We are using Sesame with an OWLIM-Lite store and imported the NCI thesaurus (280MB, about 80,000 concepts) and another very small ontology into the store and added two individuals, one having that complementOf/restriction class.

The following query took forever to execute and I terminated it after 15 minutes:

select *
where {
  ?s a [ rdf:type owl:Class;
                      owl:complementOf [
                        rdf:type owl:Restriction ;
                        owl:onProperty roo:has_finding;
                        owl:someValuesFrom nci:Dyspnea;
                  ]
                ] .
} Limit 100

Does anybody know why? I would assume that this approach creates a lot of blank nodes and the query engine has to go through the entire NCI thesaurus and compare all blank nodes with this one?

If I put this triple in a separate graph and only query that graph, the query returns the result instantaneously.

To sum things up. The two basic questions are:

  • Is the third approach really the best for modelling "something is not there"
  • Is this going to affect query performance?

EDIT 1

We discussed the proposed options. It actually helped us in clarifying what we are really trying to achieve:

  • We want to be able to state that "Patient has Dyspnea" or "Patient does not have Dyspnea" at a particular point in time.

  • In the future there may/will be more information about that patient, e.g. that he/she now has dyspnea.

  • We want to be able to write Sparql queries that ask for "all patients that have dyspnea" and "all patients that do not have dyspnea".

  • We want to keep the Sparql as simple and intuitive as possible. E.g. only use one property "has_finding" rather than having to know about two properties (one for "has_exclusion"). Or having to know about some complex blank node construct.

We played around with options:

  • Negative Property Assertions: This sounded like the best solution to this problem since we are stating that one individual is not related to another individual on that property. The issues are that we have to create an individual of Dyspnea for the sake of having something as owl:targetIndividual. And we cannot find a way of querying the negative assertion easily other then going through the whole owl:sourceIndividual and owl:targetIndividual chain. Which makes the Sparql quite lengthy and puts a burden on the person writing the query to know about it.
  • Blank node with complementOf: We would be stating something with this that we do not want to state. This would state that "Patient1 can never have a finding of dyspnea". Whereas we want to state the "Patient1 does not have a dyspnea finding now (or at date X)". So we should not use this approach.

  • Using an Exclusion/Inclusion Types (Option 1 and 2): After a closer look a Jeen's suggestion we believe that using general :Exclusion and :Inclusion classes along with only one property has_finding and giving the dyspnea individual the inclusion/exclusion type is the easiest to understand, query and provides enough reasoning abilities. Example:

    :Patient1 a :Patient . :Dyspnea1 a :Dyspnea . :Dyspnea1 a :Exclusion. :Patient1 ex:has_finding :Dyspnea1 .

That way, the person writing the Sparql query only has to know that:

  • There is one property has_finding, which represents the intentions properly. Since "No dyspnea" is technically a finding as well.
  • But just querying using has_finding will not give sufficient information about whether the person actually has it or not. The query also needs to contain a triple about whether the dyspnea individual is a :Exclusion (or inclusion depending on the goal of the query).
  • While this puts some additional burden on the query writer, it is less than negative property assertions and easier to understand.

We would really appreciate some feedback on these conclusions!

2

There are 2 answers

6
Jeen Broekstra On BEST ANSWER

With respect to the modeling question, I'd like to offer a fourth alternative, which is, in fact, a mix of your options 1 and 2: introduce a separate class (hierarchy) for these 'excluded/missing' symptoms, diseases or treatments, and have the specific exclusions as instances:

 :Exclusion a owl:Class .
 :ExcludedSymptom rdfs:subClassOf :Exclusion .
 :ExcludedTreatment rdfs:subClassOf :Exclusion .

 :excludedDyspnea a :ExcludedSymptom .
 :excludedChemo a :ExcludedTreatment .

 :Patient a owl:Class ;
          owl:equivalentClass [ a owl:Restriction ;
                                owl:onProperty :excluded ;
                                owl:allValuesFrom :Exclusion ] .

 // john is a patient without Dyspnea 
 :john a :Patient ;
       :excluded :excludedDyspnea .

Optionally, you can link the exclusion instances semantically with the treatment/symptom/diseases:

  :excludedDyspnea :ofSymptom :Dyspnea . 

In my view, this is just as "ontologically correct" (this kind of thing is quite subjective to be honest) as your other options, and possibly a lot easier to maintain, query, and indeed reason with.

As for your second question: while I can't speak for the behavior of the particular reasoner you're using, in general any construction involving complementOf is computationally very heavy, but perhaps more importantly, it probably does not capture what you intend.

OWL has an open world assumption, which (in broad terms) means that we cannot decide a certain fact is untrue simply because that fact is currently unknown. Your complementOf construction will logically be an empty class, because for any individual X, even if we currently do not know that X has been diagnosed with Dyspnea, there is a possibility that in the future that fact will become known, and therefore X will not be in the complement class.

EDIT

In response to your edit, with the proposal using a single :hasFinding property, I think that generally looks good, though I would perhaps modify it slightly:

   :patient1 a :Patient;
             :hasFinding :dyspneaFinding1 .

   :dyspneaFinding1 a :Finding ;
                    :of :Dyspnea ;
                    :conclusion false .

You have now separated the 'finding' as a concept a bit more cleanly from the symptom/treatment that it is a finding of. Also, whether or not the finding is positive or negative is explicitly modeled (rather than implied by the presence/absense of an 'excluded' property or a 'Exclusion' type).

(As an aside: since we link an individual with a class here via a non-typing relation (... :of :Dyspnea) we must rely on OWL 2 punning to make this valid in OWL DL)

To query for a patient with a finding (whether positive or negative) about Dyspnea:

 SELECT ?x 
 WHERE {
    ?x a :Patient; 
       :hasFinding [ :of :Dyspnea ] .
 }

And to query for patients with confirmed absense of Dyspnea:

 SELECT ?x 
 WHERE {
    ?x a :Patient; 
       :hasFinding [ :of :Dyspnea ;
                     :conclusion false ] .
 }
4
Joshua Taylor On

If your diseases are represented as individuals, then you can use negative object property assertions to literally say, e.g.,

         ¬hasFinding(john,Dyspnea)

        NegativeObjectPropertyAssertion(hasFinding john Dyspnea)

Of course, if you have lots of things that aren't the case, then this might get a bit involved. It's probably the most semantically correct, though. It also means that your query could match directly against the data in the ontology, which might make for quicker results. (Of course, you'd still have the issues of trying to infer when the negative object property holds.)

This doesn't work if diseases are represented as classes, though. If diseases are represented by classes, then you can use class expressions, similar to what you propose. E.g.,

        (∀ hasFinding.¬Dyspnea)(john)

        ClassAssertion(ObjectAllValuesFrom(hasFinding ObjectComplementOf(Dyspnea)) john)

This is similar to your third option, but I wonder if it might perform better. It seems like a slightly more direct way of saying what you're trying to say (i.e., if someone has a disease, it's not one of these diseases).

I do agree with Jeen's answer, though; there's a lot of subjectivity here, and a great deal of getting it "right" is actually just a matter of finding something that's reasonable to work with, performs well enough for you, and that seems not entirely unnatural.