Named Entity Extraction - for Currency


I have a pretty simple problem: recognize money/currency in text. Sample test case: "Pocket money should NOT exceed INR 4000 (USD 100) per annum." This fails on the default online Stanford NER demo (with the 7-class model, which includes Currency) at http://nlp.stanford.edu:8080/ner/process; it works only with text like "$ 100".

On the Alchemy demo site (https://alchemy-language-demo.mybluemix.net/), "$ 100" is recognised as an Entity, while "USD 100" is recognised as a Concept (United States Dollar).


Answer by jhl:

Not sure this is still useful after all this time, but here goes:

I think you have two options:

1) Replace "USD" with "$". This is a simple find-and-replace that can be done in any tool you're likely to be using.

2) Use a different tool or program.
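Option 1 can be sketched in a few lines of Python as a preprocessing step before the text is sent to the NER tool. This is only an illustration, assuming that the downstream model recognises amounts written as "$ 100" (as reported in the question); the function name `usd_to_symbol` is hypothetical, and only USD is handled.

```python
import re

def usd_to_symbol(text: str) -> str:
    """Replace the whole word 'USD' with '$' so that a model which only
    recognises symbol-prefixed amounts (e.g. "$ 100") can pick them up.

    Hypothetical helper; other currency codes are left untouched.
    """
    # \b on both sides keeps longer words that merely contain "USD" unchanged.
    return re.sub(r"\bUSD\b", "$", text)

print(usd_to_symbol("Pocket money should NOT exceed INR 4000 (USD 100) per annum."))
```

Because the replacement keeps the space after the code, "USD 100" becomes "$ 100", which matches the format the question says the Stanford demo already handles.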

Stanford NLP is great, but there are also other tools available.

Depending on what system/language you are using, there are many packages that already do the job for you.

For Python, I'd recommend spaCy:


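```python
# pip install spacy
# python -m spacy download en_core_web_sm

import spacy

# Load the small English pipeline (tokenizer, tagger, parser and NER)
nlp = spacy.load("en_core_web_sm")

text = "Pocket money should NOT exceed INR 4000 (USD 100) per annum."

doc = nlp(text)

# Iterating over a Doc yields tokens; token.ent_type_ is the entity label of each token.
# Note: this keeps every token tagged MONEY, regardless of which currency it is in.
print("Money in USD:", [token.lemma_ for token in doc if token.ent_type_ == "MONEY"])
# Output as originally reported (may differ between model versions):
# Money in USD: ['100']
```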

This is just a simple example; you can find a more detailed script here.
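If installing a statistical model is not an option, a purely rule-based fallback is another possibility. The sketch below swaps in a regular expression rather than NER: it only finds amounts preceded by a currency code from a fixed list or by a currency symbol. The list `CURRENCY_CODES`, the symbol set and the function name `find_money` are hypothetical choices for illustration, not part of any library.

```python
import re

# Hypothetical list of ISO 4217 codes to recognise; extend as needed.
CURRENCY_CODES = ("USD", "INR", "EUR", "GBP")

# A whole-word currency code or a currency symbol, optional whitespace,
# then an amount such as 4000, 4,000 or 4,000.50.
MONEY_PATTERN = re.compile(
    r"(?P<currency>\b(?:" + "|".join(CURRENCY_CODES) + r")\b|[$€£₹])"
    r"\s*"
    r"(?P<amount>\d+(?:,\d{3})*(?:\.\d+)?)"
)

def find_money(text: str) -> list[tuple[str, str]]:
    """Return (currency, amount) pairs for every match in the text."""
    return [(m.group("currency"), m.group("amount")) for m in MONEY_PATTERN.finditer(text)]

print(find_money("Pocket money should NOT exceed INR 4000 (USD 100) per annum."))
```

For the sample sentence this yields both `('INR', '4000')` and `('USD', '100')`. The trade-off is that rules only cover the formats you write down, whereas a trained model can generalise to phrasings you did not anticipate.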