How can I create a Python script with BeautifulSoup on Windows to download the highest resolution of each picture in a Wikimedia Commons folder?

Question

160 views · Asked by user3610033 on 27 January 2021 at 16:31

So, I'm a big fan of Gustave Doré, and I would like to download all his engravings from the Wikimedia Commons folders that are neatly organized.

So, given a Wikimedia Commons folder I need to download all the pictures in it in the highest resolution.

I started writing something, but I'm not that good, so it's just a template:

import os, requests, bs4

url = 'URL OF THE WIKIMEDIA COMMONS FOLDER'

os.makedirs('NAME OF THE FOLDER', exist_ok=True)
for n in range(NUMBER OF PICTURES IN THE PAGE - 1):
    print('I am downloading page number %s...' %(n+1))
    res = requests.get(url)
    res.raise_for_status()

    soup = bs4.BeautifulSoup(res.text, 'html.parser')

    #STUFF I STILL NEED TO ADD
    
print('Done')

For example, I would feed this as the URL of the folder:

https://commons.wikimedia.org/wiki/Category:Crusades_by_Gustave_Dor%C3%A9

Then I would like to click every link and go to the picture page, like this one:

https://commons.wikimedia.org/wiki/File:Astonishment_of_the_Crusaders_at_the_Wealth_of_the_East.jpg

And then download the 'original file' by clicking the link below the picture that says 'original file'. Except sometimes the pic has no higher resolution available, like in this case:

https://commons.wikimedia.org/wiki/File:Andel_krizaci.jpg

In that case the script would just need to follow the link below the picture to download it.
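The "open the picture page and follow the link below the picture" step described above could be sketched roughly like this. It assumes that a Commons file page puts that link inside an element with class `fullMedia`, and the main image link inside one with class `fullImageLink`; those class names are assumptions about the page markup and should be checked against the live page. The function name `original_file_url` is made up for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def original_file_url(file_page_html, page_url):
    """Return the full-resolution file URL found on a file page, or None.

    Assumption: the download link sits inside an element with class
    'fullMedia'; the main image link ('fullImageLink') is the fallback.
    """
    soup = BeautifulSoup(file_page_html, 'html.parser')
    link = soup.select_one('.fullMedia a') or soup.select_one('.fullImageLink a')
    if link is None or not link.get('href'):
        return None
    # The href may be protocol-relative ('//host/path'), so resolve it
    # against the address of the page it was found on.
    return urljoin(page_url, link['href'])
```

Because the same link is read whether or not a larger version exists, the "no higher resolution available" case should not need special handling here.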

I am completely stuck, thanks in advance for your help!

Bonus points if the pic has the name stated in its page when saved

(e.g. in the second link the picture should be saved as 'Astonishment of the Crusaders at the Wealth of the East.jpg')
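For the naming part, a minimal sketch: take the last path segment of the file URL, decode the percent-escapes, and turn underscores into spaces. Since this is meant to run on Windows, it also replaces the characters Windows does not allow in file names. The function name `readable_filename` and the choice of `-` as the replacement character are arbitrary choices for this example.

```python
import re
from urllib.parse import unquote, urlsplit


def readable_filename(file_url):
    """Build a human-readable, Windows-safe file name from a file URL."""
    # Last path segment, e.g. 'Some_Picture_%C3%A9.jpg'
    last_segment = urlsplit(file_url).path.rsplit('/', 1)[-1]
    # Decode percent-escapes and use spaces instead of underscores
    name = unquote(last_segment).replace('_', ' ')
    # Replace characters that are not allowed in Windows file names
    return re.sub(r'[<>:"/\\|?*]', '-', name)
```

With the example above, a URL ending in `Astonishment_of_the_Crusaders_at_the_Wealth_of_the_East.jpg` would be saved as `Astonishment of the Crusaders at the Wealth of the East.jpg`.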

Original Q&A

There is 1 answer

**Epsi95** · Accepted Answer · 2021-01-27T17:11:13+00:00

Hey, big fan of Gustave Doré, here is a way you can do it:

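import requests
from bs4 import BeautifulSoup

r = requests.get('https://commons.wikimedia.org/wiki/Category:Crusades_by_Gustave_Dor%C3%A9')
soup = BeautifulSoup(r.text, 'html.parser')
# Thumbnail URLs of every picture linked on the category page
thumbs = [a.find('img').get('src') for a in soup.find_all('a', class_='image')]
# Original URL = thumbnail URL minus its last segment and the '/thumb' directory; non-thumbnail URLs are kept as-is
links = ['/'.join(t.split('/')[:-1]).replace('/thumb', '') if '/thumb/' in t else t for t in thumbs]
for link in links:
    im = requests.get(link)
    # Save under the file name at the end of the URL
    with open(link.split('/')[-1], 'wb') as f:
        f.write(im.content)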
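The loop above writes into the current directory and does not check whether a download failed. A slightly more defensive variant might look like the sketch below. It is not part of the original answer: `download_all` is a hypothetical helper, and its `session` argument is assumed to behave like a `requests.Session` (a `get` method returning an object with `raise_for_status()` and `content`).

```python
from pathlib import Path
from urllib.parse import unquote, urlsplit


def download_all(urls, folder, session):
    """Download every URL into `folder` and return the saved paths.

    `session` is assumed to behave like requests.Session.
    """
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        response = session.get(url)
        # Stop on HTTP errors instead of saving an error page as a picture
        response.raise_for_status()
        # Decode percent-escapes so the name on disk matches the page title
        name = unquote(urlsplit(url).path.rsplit('/', 1)[-1])
        path = out_dir / name
        path.write_bytes(response.content)
        saved.append(path)
    return saved
```

It could be called as `download_all(links, 'Crusades', session)`, where `session` is a `requests.Session()` whose `User-Agent` header has been set to something that identifies your script, since Wikimedia asks automated clients to send a descriptive one.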
