How to extract information (e.g. types and subtypes) from Wikipedia?

I want to extract type information from Wikipedia. For example, I want to find:

  • list of all "Carolina Panthers players"
  • list of all "colors"
  • list of all "NFL teams"
  • list of all "months"

Is there a clean way of doing this?

One obvious option is the API, but as far as I'm aware, it's not trivial to use the existing API to extract this kind of information from Wikipedia.

There are 3 answers

Wasi Ahmad On

It seems you need to extract the categories from Wikipedia and build a category taxonomy. Once you have the taxonomy, you can also retrieve related categories, such as the subcategories of a given category.

Using the category information, you can then retrieve all Wikipedia articles associated with a particular category.
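As a rough sketch of the taxonomy idea, the snippet below walks outward from a root category and records each category's subcategories. The names `build_taxonomy`, `get_subcategories`, `max_depth` and `TOY_GRAPH` are made up for illustration; `get_subcategories` stands in for whatever source you use (a category dump or an API call), and the toy data is invented, not real Wikipedia content.

```python
from collections import deque


def build_taxonomy(root, get_subcategories, max_depth=2):
    """Breadth-first walk from `root`, returning {category: [subcategories]}.

    Categories reached at `max_depth` are recorded with an empty list
    because the walk stops expanding there.
    """
    taxonomy = {}
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        if category in taxonomy:
            continue  # the category graph is not a strict tree; skip repeats
        children = get_subcategories(category) if depth < max_depth else []
        taxonomy[category] = children
        for child in children:
            queue.append((child, depth + 1))
    return taxonomy


# Invented toy data standing in for a real category source.
TOY_GRAPH = {
    "Sports teams": ["Football teams", "Baseball teams"],
    "Football teams": ["Sports teams"],  # deliberate cycle
}

taxonomy = build_taxonomy("Sports teams", lambda c: TOY_GRAPH.get(c, []))
```

The visited check matters because Wikipedia categories can loop back on themselves, and the depth limit keeps the walk from wandering into loosely related parts of the graph.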

I believe my project on mining Wikipedia may help you here. It includes pre-processed information about Wikipedia articles and categories, which is publicly available.

RQDQ On

It looks like Wikipedia has an API. I would start here:

https://m.mediawiki.org/wiki/API:Main_page
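For lists like "Carolina Panthers players", one possible approach is the API's `list=categorymembers` query. The sketch below assumes the English Wikipedia endpoint and that a category with that exact name exists; the helper names (`build_query`, `parse_titles`, `category_members`) and the User-Agent string are my own, not part of the API.

```python
import json
import urllib.parse
import urllib.request

# Assumption: English Wikipedia; other wikis use their own api.php URL.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_query(category, cmtype="page", cmcontinue=None):
    """Build the parameters for one list=categorymembers request."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": "Category:" + category,
        "cmtype": cmtype,  # "page", "subcat" or "file"
        "cmlimit": "500",
        "format": "json",
    }
    if cmcontinue:
        params["cmcontinue"] = cmcontinue
    return params


def parse_titles(response):
    """Return (titles, continuation token or None) from a decoded response."""
    members = response.get("query", {}).get("categorymembers", [])
    titles = [member["title"] for member in members]
    token = response.get("continue", {}).get("cmcontinue")
    return titles, token


def category_members(category, cmtype="page"):
    """Fetch all members of a category, following continuation tokens."""
    titles, token = [], None
    while True:
        query = urllib.parse.urlencode(build_query(category, cmtype, token))
        request = urllib.request.Request(
            API_URL + "?" + query,
            headers={"User-Agent": "category-example/0.1"},  # placeholder value
        )
        with urllib.request.urlopen(request) as reply:
            batch, token = parse_titles(json.load(reply))
        titles.extend(batch)
        if token is None:
            return titles
```

Calling `category_members("Carolina Panthers players")` should then return the page titles in that category, and passing `cmtype="subcat"` returns its subcategories instead. Note that this only covers lists that Wikipedia already models as categories.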