Improving and Simplifying Python BeautifulSoup Code


I have this code that uses BeautifulSoup to gather some data from a website:

import requests
from bs4 import BeautifulSoup

url = "http://hearthstone.gamepedia.com/Patches"
page = requests.get(url)
soup = BeautifulSoup(page.content, "html.parser")

variable = soup.find('div', {"id": "mw-content-text"})  # main content area
variable = variable.find_all('ul')[2]                    # third <ul> inside it
variable = variable.find('li')                           # first <li> of that list
variable = variable.find_all('a')[1]                     # second <a> in that item

print(variable.text)

Output should be:

Patch 7.0.0.15590

By following these steps in this order, I can locate the exact <a> tag I want.

How can I turn this into a single line to simplify it?

Variable = harsoup.find('div',{"id":"mw-content-text"}).find_all('ul')[2].find('li').find_all('a')[1]

I wanted to achieve something like this, but it doesn't seem to work the same way.
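Chaining is not a problem in itself: each find() call returns a Tag (or None when nothing matches, in which case the next call in the chain raises AttributeError), and each find_all() returns a list-like ResultSet, so the calls can be strung together exactly as in the step-by-step version. One visible difference in the attempt above is that it starts from harsoup rather than soup, and harsoup is not defined anywhere in the snippet. The sketch below uses a small hypothetical HTML string in place of the live page (the real page markup may differ) so that it runs offline, and shows that both forms return the same tag:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the wiki page; the real page layout may differ.
html = """
<div id="mw-content-text">
  <ul><li>intro</li></ul>
  <ul><li>navigation</li></ul>
  <ul>
    <li><a href="/Hearthstone">Hearthstone</a> <a href="/Patch_7.0.0.15590">Patch 7.0.0.15590</a></li>
    <li><a href="/Hearthstone">Hearthstone</a> <a href="/Patch_6.2.0.15300">Patch 6.2.0.15300</a></li>
  </ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Step-by-step version, as in the question.
variable = soup.find("div", {"id": "mw-content-text"})
variable = variable.find_all("ul")[2]
variable = variable.find("li")
variable = variable.find_all("a")[1]

# Chained version: the same calls in the same order, starting from the same soup.
chained = soup.find("div", {"id": "mw-content-text"}).find_all("ul")[2].find("li").find_all("a")[1]

print(variable.text)  # Patch 7.0.0.15590
print(chained.text)   # Patch 7.0.0.15590
```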


Best answer (宏杰李):
import re
soup.find_all(href=re.compile(r'/Patch_'))

Output:

[<a href="/Patch_7.0.0.15590" title="Patch 7.0.0.15590">Patch 7.0.0.15590</a>,
 <a href="/Patch_6.2.0.15300" title="Patch 6.2.0.15300">Patch 6.2.0.15300</a>,
 <a href="/Patch_6.2.0.15181" title="Patch 6.2.0.15181">Patch 6.2.0.15181</a>,
 <a href="/Patch_6.1.3.14830" title="Patch 6.1.3.14830">Patch 6.1.3.14830</a>,
 <a href="/Patch_6.1.1.14406" title="Patch 6.1.1.14406">Patch 6.1.1.14406</a>,
 <a href="/Patch_6.0.0.13921" title="Patch 6.0.0.13921">Patch 6.0.0.13921</a>,
 <a href="/Patch_5.2.2.13807" title="Patch 5.2.2.13807">Patch 5.2.2.13807</a>,
 <a href="/Patch_5.2.0.13740" title="Patch 5.2.0.13740">Patch 5.2.0.13740</a>,
 <a href="/Patch_5.2.0.13714" title="Patch 5.2.0.13714">Patch 5.2.0.13714</a>,
 <a href="/Patch_5.2.0.13619" title="Patch 5.2.0.13619">Patch 5.2.0.13619</a>,
 <a href="/Patch_5.0.0.13030" title="Patch 5.0.0.13030">Patch 5.0.0.13030</a>,
 <a href="/Patch_5.0.0.12574" title="Patch 5.0.0.12574">Patch 5.0.0.12574</a>,
 <a href="/Patch_4.3.0.12266" title="Patch 4.3.0.12266">Patch 4.3.0.12266</a>,
 <a href="/Patch_4.2.0.12051" title="Patch 4.2.0.12051">Patch 4.2.0.12051</a>,
 <a href="/Patch_4.1.0.10956" title="Patch 4.1.0.10956">Patch 4.1.0.10956</a>,
 <a href="/Patch_4.0.0.10833" title="Patch 4.0.0.10833">Patch 4.0.0.10833 - The League of Explorers</a>,
 <a href="/Patch_3.2.0.10604" title="Patch 3.2.0.10604">Patch 3.2.0.10604</a>,
 <a href="/Patch_3.1.0.10357" title="Patch 3.1.0.10357">Patch 3.1.0.10357</a>,
 <a href="/Patch_3.0.0.9786" title="Patch 3.0.0.9786">Patch 3.0.0.9786 - The Grand Tournament Draws Near</a>,
 <a href="/Patch_2.8.0.9554" title="Patch 2.8.0.9554">Patch 2.8.0.9554</a>,
 <a href="/Patch_2.7.0.9166" title="Patch 2.7.0.9166">Patch 2.7.0.9166</a>,
 <a href="/Patch_2.6.0.8834" title="Patch 2.6.0.8834">Patch 2.6.0.8834</a>,

Use a compiled regular expression from the re module as the href filter to select only the <a> tags you want.
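As a rough illustration of the regex approach, the sketch below runs it against a hypothetical snippet that imitates the patch list (not the real page markup). BeautifulSoup tests a compiled pattern with search(), so r'/Patch_' matches anywhere in the href. Swapping find_all() for find() gives the single line the question asks for, assuming the newest patch is the first matching link, as in the output above:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical snippet mimicking the patch list; not the real page markup.
html = """
<ul>
  <li><a href="/Hearthstone">Hearthstone</a></li>
  <li><a href="/Patch_7.0.0.15590" title="Patch 7.0.0.15590">Patch 7.0.0.15590</a></li>
  <li><a href="/Patch_6.2.0.15300" title="Patch 6.2.0.15300">Patch 6.2.0.15300</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

patch_href = re.compile(r"/Patch_")

# find_all() returns every <a> whose href contains "/Patch_".
all_patches = [a.text for a in soup.find_all("a", href=patch_href)]

# find() returns only the first match in document order, or None if nothing matches.
latest = soup.find("a", href=patch_href)

print(all_patches)  # ['Patch 7.0.0.15590', 'Patch 6.2.0.15300']
print(latest.text)  # Patch 7.0.0.15590
```

On the live page the equivalent one-liner would be soup.find(href=re.compile(r'/Patch_')).text; if no link matches, find() returns None and accessing .text raises AttributeError.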

There are five kinds of filters that can be passed to find() or find_all():

  1. A string
  2. A regular expression
  3. A list
  4. True
  5. A function
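A short sketch of each of the five filter kinds, run against a hypothetical HTML snippet (the markup and the helper is_patch_link are made up for illustration):

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup used only to demonstrate the filter kinds.
html = """
<div id="content">
  <a href="/Patch_7.0.0.15590">Patch 7.0.0.15590</a>
  <a href="/Hearthstone">Hearthstone</a>
  <b>bold</b>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# 1. A string: matches tags whose name is exactly "a".
by_string = soup.find_all("a")

# 2. A regular expression: here applied to the href attribute, tested with search().
by_regex = soup.find_all(href=re.compile(r"/Patch_"))

# 3. A list: matches tags whose name is any item in the list.
by_list = soup.find_all(["a", "b"])

# 4. True: matches every tag in the document (div, a, a, b).
by_true = soup.find_all(True)

# 5. A function: called with each tag; the tag is kept when it returns True.
def is_patch_link(tag):
    return tag.name == "a" and tag.get("href", "").startswith("/Patch_")

by_function = soup.find_all(is_patch_link)

print(len(by_string), len(by_regex), len(by_list), len(by_true), len(by_function))  # 2 1 3 4 1
```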