I used AIDA (a named-entity annotation tool) to annotate a corpus and got output in this format:
2 Germany http://en.wikipedia.org/wiki/Germany 11867 /m/0345h
6 United_Kingdom http://en.wikipedia.org/wiki/United_Kingdom 31717 /m/07ssc
Column 3 is the entity's Wikipedia URL and column 4 is its Wikipedia ID. Is there a way to map the URL or the ID to the Freebase MID shown in the last column? The last column was added by someone else; I have no idea how they did it and can't find a method anywhere else.
Here is the AIDA link: https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/aida/downloads/
It's easy to map from either of those English Wikipedia IDs to a Freebase topic and its various identifiers, including the MID, using either the Freebase API or the Freebase data dumps. Which one is best depends on the volume of data that you need to map.
All Wikipedia IDs are stored in the namespace rooted at /authority/wikipedia in Freebase. The numerical IDs (i.e. article numbers) for the English Wikipedia are stored in /authority/wikipedia/en_id, so you can use http://freebase.com/authority/wikipedia/en_id/11867 as one of the aliases for the Germany topic.
All the other sub-namespaces are listed here: https://www.freebase.com/authority/wikipedia?ns= but the other two that are relevant for English Wikipedia are en and en_title, both of which contain keys using the alpha Wikipedia article names. The latter is the canonical ID and is unique, while the former contains that ID plus the IDs for all the redirect pages that point to it. Both of these URLs are also aliases for Germany:
https://www.freebase.com/authority/wikipedia/en/Germany
https://www.freebase.com/authority/wikipedia/en_title/Germany
To use the MQLRead query API, construct a query like this:
[{ "id": "/authority/wikipedia/en_id/11867", "mid": null, "name": null }]
and parse the resulting JSON to get the MID. The full query URL would look like this:
https://www.googleapis.com/freebase/v1/mqlread/?lang=%2Flang%2Fen&query=%5B%7B+%22id%22%3A+%22%2Fauthority%2Fwikipedia%2Fen_id%2F11867%22%2C+%22mid%22%3A+null%2C+%22name%22%3A+null+%7D%5D
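For anyone scripting this, here is a minimal Python sketch of the same lookup. It builds the request URL for a given article number and pulls the MID out of the response body. The function names are made up for illustration, and the assumption that the answer arrives wrapped in a top-level "result" list is mine, so check it against a real response before relying on it.

```python
import json
import urllib.parse

MQLREAD_ENDPOINT = "https://www.googleapis.com/freebase/v1/mqlread/"


def build_mqlread_url(wikipedia_id):
    """Build an MQLRead URL that looks up a topic by English Wikipedia article number."""
    query = [{
        "id": f"/authority/wikipedia/en_id/{wikipedia_id}",
        "mid": None,   # null in MQL means "fill this in for me"
        "name": None,
    }]
    params = {"lang": "/lang/en", "query": json.dumps(query)}
    return MQLREAD_ENDPOINT + "?" + urllib.parse.urlencode(params)


def extract_mid(response_text):
    """Return the MID of the first match, or None if nothing matched.

    Assumes the response is a JSON object with a "result" list.
    """
    results = json.loads(response_text).get("result") or []
    return results[0]["mid"] if results else None
```

You would fetch build_mqlread_url(11867) with whatever HTTP client you prefer and pass the body to extract_mid. For a handful of IDs one request per ID is fine; for a whole corpus the data dumps are probably the better route.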
You could do the same thing with the alpha keys in the other namespaces, but the keys need to be escaped for special characters and it's not worth the hassle to describe it since you've got the numeric identifiers. MQL Key Escaping is described here if anyone else needs it: http://wiki.freebase.com/wiki/MQL_key_escaping
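If you do need the alpha keys, a rough Python sketch of the escaping might look like the one below. It assumes the scheme is "replace every character outside letters, digits, underscore and hyphen with $ followed by the four-digit uppercase hex code point". It ignores any special rules for the first and last character of a key and does not handle characters above U+FFFF, so treat the linked page as the authority. The function name is hypothetical.

```python
import re

# Anything other than ASCII letters, digits, underscore or hyphen gets escaped.
_UNSAFE = re.compile(r"[^A-Za-z0-9_-]")


def escape_mql_key(title):
    """Escape a Wikipedia article title for use as a key (simplified sketch)."""
    return _UNSAFE.sub(lambda m: "$%04X" % ord(m.group(0)), title)
```

With this, a title containing parentheses such as Germany_(disambiguation) would become Germany_$0028disambiguation$0029, which you could then append to /authority/wikipedia/en/ in the query.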