Download articles from Wikipedia using Special:Export

Question

700 views · Asked by no_freedom on 31 October 2011 at 18:47

I want to download the full histories of a few thousand articles from http://en.wikipedia.org/wiki/Special:Export, and I am looking for a programmatic way to automate this. I want to save the result as XML.

Here is my Wikipedia query. I started with the following Python script, but it doesn't return any useful result.

#!/usr/bin/python

import urllib
import codecs

f =  codecs.open('workfile.xml', 'w',"utf-8" )

class AppURLopener(urllib.FancyURLopener):
    version = "Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11"
urllib._urlopener = AppURLopener()

query = "http://en.wikipedia.org/w/index.php?title=Special:Export&action=submit"
data = { 'catname':'English-language_Indian_films','addcat':'', 'wpDownload':1 }
data = urllib.urlencode(data)
f = urllib.urlopen(query, data)
s = f.read()
print (s)

Original Q&A

There is 1 answer

**Snakes and Coffee** · Answer 1 · 2012-03-06T06:40:14+00:00

I would honestly suggest using Mechanize to fetch the page, then using lxml or another XML parser to extract the information you want. I usually send a Firefox user agent, since the default user agents of many programs are blocked. Note that with Mechanize you can actually fill out the form and "click" enter, then "click" export.
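As a rough illustration of the same idea without Mechanize, here is a minimal sketch that posts the export form directly using only the Python 3 standard library (the question's script is Python 2). It swaps Mechanize for `urllib.request`, so it is not the approach described above, only a stand-in for it. The helper names `build_export_request` and `save_export` are hypothetical. The form fields `pages`, `history` and `wpDownload` are assumed to behave as described in the MediaWiki manual for Special:Export, and whether a given wiki actually allows full-history export is also an assumption worth checking.

```python
import urllib.parse
import urllib.request

# Same endpoint as in the question, here over HTTPS (assumption).
EXPORT_URL = "https://en.wikipedia.org/w/index.php?title=Special:Export&action=submit"


def build_export_request(titles, user_agent, full_history=False):
    """Build a POST request asking Special:Export for the given page titles.

    Hypothetical helper: the field names are assumed from the MediaWiki
    manual and are not confirmed by the original post.
    """
    form = {
        # One title per line, mirroring the text box of the web form.
        "pages": "\n".join(titles),
        # Ask for the result as a downloadable file.
        "wpDownload": "1",
    }
    if full_history:
        # Assumption: the wiki may ignore or limit this field.
        form["history"] = "1"
    body = urllib.parse.urlencode(form).encode("utf-8")
    # Passing data makes urllib send a POST; the explicit user agent
    # replaces the default Python one.
    return urllib.request.Request(
        EXPORT_URL, data=body, headers={"User-Agent": user_agent}
    )


def save_export(titles, path, user_agent, full_history=False):
    """Send the request and write the raw XML response to a file."""
    request = build_export_request(titles, user_agent, full_history)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

A call might look like `save_export(["Lagaan"], "workfile.xml", "MyExportScript/0.1 (me@example.org)", full_history=True)`, where the title, file name and user-agent string are placeholders. For a few thousand articles it would probably make sense to send the titles in small batches and pause between requests, then parse each saved file with lxml or another XML parser as suggested above.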
