How to map between Freebase and Wikipedia?

I used AIDA (a named-entity tool) to annotate a corpus and got output in this format:

2   Germany http://en.wikipedia.org/wiki/Germany    11867   /m/0345h
6   United_Kingdom  http://en.wikipedia.org/wiki/United_Kingdom 31717   /m/07ssc

Column 3 is the entity's Wikipedia URL and column 4 is its Wikipedia ID. Is there a way to map the URL or the ID to the Freebase MID shown in the last column? The last column was added by someone else; I have no idea how he did it and I can't find a method anywhere else.

Here is the AIDA link: https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/aida/downloads/

There are 2 answers

Answer by Tom Morris (best answer)

It's easy to map from either of those English Wikipedia identifiers to a Freebase topic and its various identifiers, including the MID, using either the Freebase API or the Freebase data dumps. Which one is best depends on the volume of data that you need to map.

All Wikipedia IDs are stored in the namespace rooted at /authority/wikipedia in Freebase. The numerical IDs (i.e. article numbers) are stored in /authority/wikipedia/en_id for the English Wikipedia, so you can use http://freebase.com/authority/wikipedia/en_id/11867 as one of the aliases for the Germany topic.

All the other sub-namespaces are listed here: https://www.freebase.com/authority/wikipedia?ns= but the other two that are relevant for English Wikipedia are en and en_title, both of which contain keys using the alpha Wikipedia article names. The latter is the canonical ID and is unique while the former contains that ID, plus the IDs for all the redirect pages that point to it.

Both of these URLs are also aliases for Germany:

https://www.freebase.com/authority/wikipedia/en/Germany
https://www.freebase.com/authority/wikipedia/en_title/Germany
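For large volumes, the data-dump route can be sketched as a single pass over the dump that keeps only the en_id keys you care about. This is a rough sketch, not a tested pipeline: it assumes the RDF dump expresses each key as a triple whose predicate is `<http://rdf.freebase.com/key/wikipedia.en_id>` and whose subject is the MID written with a dot (`m.0345h`). Check a few lines of your own dump before relying on that. The function name `mids_from_dump` is made up for illustration.

```python
import re

# Assumed triple layout (verify against your dump):
# <http://rdf.freebase.com/ns/m.0345h> <http://rdf.freebase.com/key/wikipedia.en_id> "11867" .
_EN_ID_TRIPLE = re.compile(
    r'^<http://rdf\.freebase\.com/ns/(m\.[^>]+)>\s+'
    r'<http://rdf\.freebase\.com/key/wikipedia\.en_id>\s+'
    r'"(\d+)"'
)


def mids_from_dump(lines, wanted_ids):
    """Return {wikipedia_id: mid} for the wanted numeric Wikipedia IDs.

    `lines` is any iterable of dump lines, e.g. a file opened with
    gzip.open(path, "rt", encoding="utf-8").
    """
    wanted = {str(i) for i in wanted_ids}
    mapping = {}
    for line in lines:
        match = _EN_ID_TRIPLE.match(line)
        if match and match.group(2) in wanted:
            # "m.0345h" in the dump corresponds to "/m/0345h" in MQL.
            mapping[match.group(2)] = "/" + match.group(1).replace(".", "/", 1)
    return mapping
```

Because it streams line by line, memory use depends only on the number of IDs you ask for, not on the size of the dump.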

To use the MQLRead query API, construct a query like this:

[{
  "id": "/authority/wikipedia/en_id/11867",
  "mid": null,
  "name": null
}]

and parse the resulting JSON

{
  "result": [{
    "id": "/authority/wikipedia/en_id/11867",
    "mid": "/m/0345h",
    "name": "Germany"
  }]
}

to get the MID. The full query URL would look like this:

https://www.googleapis.com/freebase/v1/mqlread/?lang=%2Flang%2Fen&query=%5B%7B+%22id%22%3A+%22%2Fauthority%2Fwikipedia%2Fen_id%2F11867%22%2C+%22mid%22%3A+null%2C+%22name%22%3A+null+%7D%5D
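If you'd rather build that URL and parse the response in code, here is a minimal Python sketch. It uses only the endpoint, query shape, and response shape shown above. The helper names `build_mqlread_url` and `extract_mid` are made up for illustration, and the actual HTTP request (plus any API key your quota requires) is left out.

```python
import json
from urllib.parse import urlencode

MQLREAD_ENDPOINT = "https://www.googleapis.com/freebase/v1/mqlread/"


def build_mqlread_url(wikipedia_id, lang="/lang/en"):
    """Build the MQLRead URL that looks up a topic by numeric Wikipedia ID."""
    query = [{
        "id": "/authority/wikipedia/en_id/%d" % int(wikipedia_id),
        "mid": None,   # None serializes to JSON null, i.e. "fill this in"
        "name": None,
    }]
    params = {"lang": lang, "query": json.dumps(query)}
    return MQLREAD_ENDPOINT + "?" + urlencode(params)


def extract_mid(response_text):
    """Return the MID from an MQLRead JSON response, or None if no match."""
    result = json.loads(response_text).get("result") or []
    return result[0]["mid"] if result else None
```

You would fetch `build_mqlread_url(11867)` with whatever HTTP client you prefer and pass the response body to `extract_mid`.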

You could do the same thing with the alpha keys in the other namespaces, but the keys need to be escaped for special characters and it's not worth the hassle to describe it since you've got the numeric identifiers. MQL Key Escaping is described here if anyone else needs it: http://wiki.freebase.com/wiki/MQL_key_escaping
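For anyone who does need the alpha keys, here is a hedged sketch of the escaping. It reflects my understanding of the rule on that page, so treat it as an assumption and check the page itself: ASCII letters and digits pass through, `_` and `-` pass through except at the first or last position, and everything else becomes `$` followed by the four-digit uppercase hex code point. The helpers `quote_key` and `en_title_id` are illustrative names, not part of any Freebase library, and characters outside the Basic Multilingual Plane are not handled.

```python
import string

_ALWAYS_OK = set(string.ascii_letters + string.digits)
_INTERIOR_OK = _ALWAYS_OK | {"_", "-"}


def quote_key(key):
    """Escape a key for use in an MQL id (assumed rule, see lead-in)."""
    out = []
    last = len(key) - 1
    for i, ch in enumerate(key):
        if ch in _ALWAYS_OK or (ch in _INTERIOR_OK and 0 < i < last):
            out.append(ch)
        else:
            # e.g. "+" (U+002B) becomes "$002B"
            out.append("$%04X" % ord(ch))
    return "".join(out)


def en_title_id(title):
    """Build an en_title id from a Wikipedia article title."""
    return "/authority/wikipedia/en_title/" + quote_key(title.replace(" ", "_"))
```

Titles such as Germany or United_Kingdom come out unchanged, which is why the numeric IDs and plain titles are the low-hassle cases.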

Answer by akb

You could query Freebase with the Wikipedia info; see the Freebase API docs. Query on the /common/topic/topic_equivalent_webpage property. However, Freebase will be shutting down in the near future, so I don't recommend putting much effort into that.