I am not an expert in Machine Learning, so I will try to be as accurate as possible...
I am currently analyzing financial documents that are giving information on a specific fund. What I would like to do is to be able to extract the fund name.
For this, I am using Named Entity Recognition (NER) in Azure Machine Learning platform. After analyzing approx. 100 documents, I get results classified as Organizations. In most cases, they are really organizations. This is great, but my problem is that the fund name is also categorized as an organization. I am not able to distinguish between a company name and a fund name.
From some readings on Internet, I could discover that Gazette system could help so that we can match the recognized organizations against a list of funds, and therefore make sure that we have a fund name.
Do you think this would be a good approach? Or is there any other algorithm that I should try to improve the results?
Thanks for any suggestion!
NER has its origins in identifying text identifying broad semantic categories, like the names of people or organizations (companies) in your case. Reading the description of question, I don't think this is the problem you really want to solve. Specifically you mention:
I suspect the problem you really want to solve is one of semantic interoperability - you want text from your NLP program to match a list you have that is part of another system. In that case, the only accepted way you are going to solve your problem is to map all of the input text to a list/common standard - ie) use the gazetteer. So you are on the right path.
The only caveat is that if you only need to distinguish between funds and other types of organizations - without the need to match the results against a list. If that is the case, you write a classifier to distinguish funds from everything else and you can avoid mapping to your list entirely. Otherwise use a gazetteer.