find_all not finding all classes


I wrote this code to collect the links to all firms, but it finds only the first two and then stops. Any idea why, and how can I change it?

import requests
from bs4 import BeautifulSoup

url = "https://www.gelbeseiten.de/branchen/rechtsanwalt/mannheim"
req = requests.get(url)
src = req.text
soup = BeautifulSoup(src, "lxml")
all_firmas = soup.find_all("article", class_="mod mod-Treffer")
for i in all_firmas:
    i_2 = i.next_element.next_element
    print(i_2.get("href"))
print("Category done!")

There are 3 answers

rochard4u On BEST ANSWER

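On the page you linked, only two articles have a class attribute that is exactly "mod mod-Treffer". The other articles have the class "mod mod-Treffer mod-Treffer--kurz", and a class_ string that contains a space is compared with the whole attribute value, so they are skipped.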

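The following code also gets the other articles by using a regex (this needs import re). The pattern deliberately stops at "mod mod-Treffer": a trailing ".+" would require at least one more character and would skip the two articles whose class is exactly that.

all_firmas = soup.find_all("article", class_=re.compile("mod mod-Treffer"))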
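
The difference can be checked offline. This is only a minimal sketch: the HTML is made up to imitate the two class variants described above, and it uses the built-in "html.parser" instead of lxml so that nothing beyond bs4 is needed.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup, not copied from the real page.
html = """
<article class="mod mod-Treffer"><a href="/firm-1">Firm 1</a></article>
<article class="mod mod-Treffer mod-Treffer--kurz"><a href="/firm-2">Firm 2</a></article>
<div class="mod mod-Other">not a result</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A string with a space is compared with the whole class attribute,
# so the article with the extra "mod-Treffer--kurz" class is missed.
exact = soup.find_all("article", class_="mod mod-Treffer")

# A compiled pattern only has to match somewhere in the class string,
# so both articles are returned.
by_regex = soup.find_all("article", class_=re.compile("mod mod-Treffer"))

print(len(exact), len(by_regex))
```

The first search corresponds to the call in the question and the second to the regex variant.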

sammyhawkrad On

Searching for a single class works. All the articles have the class mod-Treffer, while mod is also applied to other elements, so you can search for mod-Treffer alone like this:

all_firmas = soup.find_all("article", class_="mod-Treffer")

To be more specific, you can limit the search to the div with the id gs_treffer:

all_firmas = soup.find("div", id="gs_treffer").find_all("article", class_="mod-Treffer")
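
As a rough illustration of why the scoped version can matter, here is a small sketch on invented HTML. The id gs_treffer comes from the line above; the aside with an extra article is purely an assumption, added to show what scoping filters out. The None check is an optional safeguard in case the container is missing from the response.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two results inside the container, one article outside it.
html = """
<div id="gs_treffer">
  <article class="mod mod-Treffer"><a href="/firm-1">Firm 1</a></article>
  <article class="mod mod-Treffer mod-Treffer--kurz"><a href="/firm-2">Firm 2</a></article>
</div>
<aside><article class="mod mod-Treffer"><a href="/ad">Ad</a></article></aside>
"""
soup = BeautifulSoup(html, "html.parser")

# A single class name matches any article that carries it among its classes.
everywhere = soup.find_all("article", class_="mod-Treffer")

# find() returns None when nothing matches, so guard before chaining.
container = soup.find("div", id="gs_treffer")
scoped = container.find_all("article", class_="mod-Treffer") if container else []

links = [article.a["href"] for article in scoped]
print(len(everywhere), links)
```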

Reyot On

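You can just use select with a CSS selector. It is similar to find_all, and the selector below matches every article that has both classes, whatever other classes it also carries.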

all_firmas = soup.select("article.mod.mod-Treffer")
for i in all_firmas:
    print(i.a["href"])
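
A slightly more defensive variant of the same idea, sketched on invented HTML: if an article had no link, i.a would be None and i.a["href"] would raise a TypeError. The markup and the link-less article are assumptions made for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, including one article without any link.
html = """
<article class="mod mod-Treffer"><a href="/firm-1">Firm 1</a></article>
<article class="mod mod-Treffer mod-Treffer--kurz"><a href="/firm-2">Firm 2</a></article>
<article class="mod mod-Treffer"><p>No link here</p></article>
"""
soup = BeautifulSoup(html, "html.parser")

links = []
for article in soup.select("article.mod.mod-Treffer"):
    # select_one returns the first matching descendant, or None if there is none.
    link = article.select_one("a[href]")
    if link is not None:
        links.append(link["href"])

print(links)
```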