How to give a certain pattern priority in spaCy's Matcher


I would like to give a certain match rule priority in spaCy's Matcher. For example, in the sentence "There is no apple or is there an apple?", I would like "no apple" to take priority. So if "no apple" occurs even once, the matcher should return no string_id. Right now I use a single pattern list to check both "no apple" and "apple". Here is a reproducible example:

import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)

pattern = [
    [{"LOWER": {"NOT_IN": ["no"]}}, {"LOWER": "apple"}],
    [{"LOWER": "apple"}]
]

matcher.add("apple", pattern)

doc = nlp("There is no apple or is there an apple?")
matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]  
    span = doc[start:end] 
    print(match_id, string_id, start, end, span.text)

Output:

8566208034543834098 apple 3 4 apple
8566208034543834098 apple 7 9 an apple
8566208034543834098 apple 8 9 apple

Now it matches "apple" multiple times because of the second entry in the pattern list. One option is to create a separate pattern specifically for "no", like this:

import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
 
pattern = [
    [{"LOWER": "apple"}],
]
no_pattern = [
    [{"LOWER": "no"}, {"LOWER": "apple"}],
]

matcher.add("apple", pattern)
matcher.add("no_apple", no_pattern)

doc = nlp("There is no apple or is there an apple?")
matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]  
    span = doc[start:end] 
    print(match_id, string_id, start, end, span.text)

Output:

14541201340755442066 no_apple 2 4 no apple
8566208034543834098 apple 3 4 apple
8566208034543834098 apple 8 9 apple

Now it reports "no apple" as a separate match, which I can use to decide the outcome. But I was wondering whether it is possible to tell spaCy to prioritize one pattern over another? That would save me from having to create multiple patterns.
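For reference, one way to get this behaviour with the two-pattern setup is to post-process the matches in plain Python. This is only a sketch, not a built-in Matcher feature: the helper name `drop_shadowed` and its `priority` argument are made up for illustration, and the token offsets are copied from the second output above.

```python
def drop_shadowed(matches, priority):
    """Keep a match only if no higher-priority match overlaps it.

    matches:  list of (label, start, end) tuples, end exclusive
    priority: labels ordered from highest to lowest priority
    """
    rank = {label: i for i, label in enumerate(priority)}
    kept = []
    for label, start, end in matches:
        # A match is shadowed when a higher-ranked match shares a token with it.
        shadowed = any(
            rank[other] < rank[label] and o_start < end and start < o_end
            for other, o_start, o_end in matches
        )
        if not shadowed:
            kept.append((label, start, end))
    return kept


# (string_id, start, end) triples taken from the second output above.
matches = [("no_apple", 2, 4), ("apple", 3, 4), ("apple", 8, 9)]
resolved = drop_shadowed(matches, priority=["no_apple", "apple"])
print(resolved)

# Stricter reading of the question: any "no_apple" match vetoes everything.
vetoed = [] if any(label == "no_apple" for label, _, _ in matches) else matches
print(vetoed)
```

With the matcher from the example, the input list can be built as `[(nlp.vocab.strings[match_id], start, end) for match_id, start, end in matcher(doc)]`. The first variant keeps the second, standalone "apple" and only drops the one inside "no apple"; the second variant returns nothing as soon as "no apple" appears anywhere in the sentence.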
