How to extract "wikibase_item" from a Wikipedia dump?


Hi everyone, I'm looking to extract the value of "wikibase_item" for every article in Wikipedia using a Wikipedia dump in bz2 format (which I have already downloaded). Here is an example of the value I want to obtain ("Q2263"):

{"batchcomplete":"","query":{"pages":{"43568":{"pageid":43568,"ns":0,"title":"Tom Hanks","pageprops":{"defaultsort":"Hanks, Tom","page_image_free":"Tom_Hanks_TIFF_2019.jpg","wikibase-shortdesc":"American actor and film producer","wikibase_item":"Q2263"}}}}}

The example above came from a query to the API (which I don't want to use).
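
For reference, this is roughly how the value can be read out of an API response like the one above. It is only a minimal Python sketch, assuming the JSON is already available as a string; the function name `extract_wikibase_items` is made up for illustration. What I want is the same value, but taken from the dump instead:

```python
import json

# The API response from the example above, held as a string for illustration.
response_text = (
    '{"batchcomplete":"","query":{"pages":{"43568":{"pageid":43568,"ns":0,'
    '"title":"Tom Hanks","pageprops":{"defaultsort":"Hanks, Tom",'
    '"page_image_free":"Tom_Hanks_TIFF_2019.jpg",'
    '"wikibase-shortdesc":"American actor and film producer",'
    '"wikibase_item":"Q2263"}}}}}'
)


def extract_wikibase_items(text):
    """Return {title: wikibase_item} for every page in an API response string."""
    data = json.loads(text)
    pages = data.get("query", {}).get("pages", {})
    result = {}
    for page in pages.values():
        # Pages without a linked Wikidata item have no "wikibase_item" key.
        item = page.get("pageprops", {}).get("wikibase_item")
        if item is not None:
            result[page["title"]] = item
    return result


items = extract_wikibase_items(response_text)
print(items)
```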

I tried opening the XML file inside the bz2 archive and searching (Ctrl-F) for "wikibase_item", and for the value of a specific entity, but found nothing. Is there any way to get this value from the Wikipedia dump at all? If there are other ways to get it, I would like to hear about them.

Note: my code is taken from https://github.com/jeffheaton/present/tree/master/youtube/wikipedia/process and it provides the "id" of each article, which is not the same across different languages. That's why I want to get the "wikibase_item" value.

Any comments will be appreciated. Thanks!
