Python: web scraping the content behind an "expand" button with BeautifulSoup

I am scraping a yellow-pages site to get the names of all physiotherapists in a city. The URL gives me a list of 50 physiotherapists; however, when I expand the page, the URL does not change. How do I get the full list of names?

This is how I get the list of physiotherapists in the city of Rostock:

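import requests
from bs4 import BeautifulSoup

# `header` was not shown in the original post; a browser-like User-Agent is assumed here.
header = {'User-Agent': 'Mozilla/5.0'}

url = 'https://www.gelbeseiten.de/Suche/Physiotherapie%20praxis/Rostock'
req = requests.get(url, headers=header)
soup = BeautifulSoup(req.content, 'html.parser')

names = []

# Each business name sits in an <h2> tagged with data-wipe-name="Titel".
business_name = soup.find_all('h2', attrs={"data-wipe-name": "Titel"})
for name in business_name:
    names.append(name.get_text())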

At the bottom of the page there is a button labelled Mehr Anzeigen, which means "show more". If I click it, the number of physiotherapist entries changes from 50 to 60. There are 90 entries in total. Once I have clicked the button enough times to show all the entries, it disappears. The page then lists all the physiotherapists in the city, and that is the list I want.

How do I get all the entries that appear after clicking "show more"?

There are 2 answers

joni On BEST ANSWER

There's no need to use Selenium for this simple task. Using Chrome's developer tools, you can see that pressing the 'Mehr anzeigen' button sends a simple POST request to https://www.gelbeseiten.de/AjaxSuche with the following data:

umkreis: -1
WAS: Physiotherapie praxis
WO: rostock
position: 51
anzahl: 10
sortierung: relevanz

The JSON response contains an html key holding the search results. Additionally, there are gesamtanzahlTreffer (total number of hits) and anzahlTreffer (number of hits in this response) keys inside the response. Unfortunately, it's not possible to get all search results with a single POST request by setting position=0 and anzahl=100. However, the first POST request returns the first 50 results (as on the website), and each further POST request returns the next 10 results.
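To make the paging arithmetic concrete, here is a minimal sketch of which position values the follow-up requests would need. The helper name follow_up_positions is hypothetical, and the sketch assumes that position is 1-based, that the first response holds 50 results and that every later response holds up to 10; the total of 90 is the figure from the question.

```python
from typing import Iterator


def follow_up_positions(total: int, first_batch: int = 50, step: int = 10) -> Iterator[int]:
    """Yield the `position` value for each request after the first one.

    Assumption: the first response already covers results 1..first_batch,
    and every later response covers up to `step` results.
    """
    position = first_batch + 1
    while position <= total:
        yield position
        position += step


# With 90 hits in total, four follow-up requests are needed.
print(list(follow_up_positions(90)))  # [51, 61, 71, 81]
```

If the site changes its batch sizes, only the two default arguments would need adjusting.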

Long story short, you can parse all the results like this:

import requests
from bs4 import BeautifulSoup


def post_ajax_search(was: str, wo: str, pos: int):
    # Replicate the POST request that the 'Mehr anzeigen' button sends.
    req = requests.post("https://www.gelbeseiten.de/AjaxSuche", data={
        'umkreis': -1, 'WAS': was, 'WO': wo, 'position': pos, 'sortierung': 'relevanz'})
    req.raise_for_status()
    r = req.json()
    return [r[key] for key in ("gesamtanzahlTreffer", "html", "anzahlTreffer")]


def parse_html(html: str) -> list[str]:
    # "lxml" requires the lxml package; "html.parser" works without it.
    soup = BeautifulSoup(html, "lxml")
    return [i.text for i in soup.find_all("h2", {"data-wipe-name": "Titel"})]


def parser(was: str, wo: str) -> list[str]:
    # The first request returns the first 50 results.
    total_treffer, html, parsed_treffer = post_ajax_search(was, wo, 0)
    all_items = parse_html(html)
    i = 0
    while parsed_treffer < total_treffer:
        # Each further request returns the next 10 results, starting at position 51.
        _, html, treffer = post_ajax_search(was, wo, 51 + i)
        if not treffer:
            break  # no further results; avoid looping forever
        all_items += parse_html(html)
        parsed_treffer += treffer
        i += 10
    return all_items

for praxis in (praxen := parser("Physiotherapie praxis", "rostock")):
    print(praxis)

Output:

Göllner Sabine Krankengymnastik & Physiotherapie
Friemel Physiotherapie Inh. B. Neumann Krankengymnastik & Physiotherapie
Nehrenberg Dorothee Physiotherapie
Physiotherapiezentrum Marcel Frank
Silke Thiede Physiotherapie
Physiotherapie Kollmorgen
Buller Olaf Physiotherapie
Gemeinschaftspraxis Physiotherapie Möller & Norden
Physiotherapie Annekathrin Hinz
Physiotherapie Hinz Annekathrin Praxis für Physiotherapie
Physiotherapie K. Schuldt
Physiotherapie Richter Ralf-Uwe Physiotherapie
Sport-Physio Rostock, Inh. Tschiersch, Daniel Physiotherapie
Klimt Dagmar Physiotherapie
MedPrevio
Pause Andrea Physiotherapiepraxis
Sörgel Steffen
Doremans Monika Physiotherapie
Doremans Monika Physiotherapie
Friemel B. Physiotherapie
Physiotherapie Vital Speicher Katja Oestreich
Jürß Katherina Physiotherapie
Pietralczyk Regina Physiotherapie
Stoll Sven Physiotherapie
Tübbecke Carola Physiotherapie
Physiotherapie Reiser u. Behrens
Physiotherapeutische Praxis Rose
Arndt K. Physiotherapie
Arndt K. Physiotherapie
Hieke Gunnar Praxis für Physiotherapie
PTB Physiopraxis
PTB Physiopraxis
Physiotherapie Rhea Brüdigam
Duske Sandra
Achsnig Marion Physiotherapie
Berthold Physiopraxis
Bohn Katharina Praxis für Physiotherapie
Erdmann L. Physiotherapie
Hennig Heidlinde Physiotherapie
Klatt Gabriele Physiotherapie
Physio- & Hydrotherapie Evelyn Ruß-Deuschle
Physiometik-Physiotherapie und Kosmetik
PhysioPlus Martin Berthold
Physiotherapie Elke Wegener
Physiotherapie Inh. Doreen Bastian
Therapiewelten Fromm Inh. Andrea Fromm Physiotherapie
Therapiewelten Fromm Inh. Andrea Fromm Physiotherapie
Therapiewelten Fromm Inh. Andrea Fromm Physiotherapie
vital & physio GmbH Portwich, Rene & Kristina
Neumann Andre Physiotherapie
Physiotherapie Heike Braun u.Gisela Wessel-Schutz
Physiotherapie Monika Laasch
Physiotherapiepraxis Briese Inke u. Engel Katrin
Schawaller, Mertens Physiotherapie
Ahrens Ch. Hoffmann B. Kautz K. Wiechert M. Physiotherapiepraxis
Lenz Andrea Praxis für Physiotherapie
PhysioKiDa
Physiotherapie Birgit Paul
Physiotherapie Hirsch U.
Maaß Ingrid Physiotherapie
Physiotherapie Birgit Vogt
Müller Holger Physiotherapie
Physiotherapie A. Fischer-Pifrement
Physiotherapie Schuberth Simone
Skupin Anne, Praxis für Physiotherapie und Kinderphysiotherapie
Stoll Sven Physiotherapie
Physiotherapiepraxis Lasch
Physiotherapie Leyer
Simon Petra Physiotherapie
Erdmann Petra Physiotherapeutische Praxis
Doremans-Harms Monika Physiotherapie
Holz-Gräfe Ulrike Physiotherapie
Kannenberg u. Swensson Praxisgemeinschaft für Physiotherapie
Keßler Dirk Physiotherapie
Physiotherapie Ahrens Ch., Hoffmann B., Kautz K. u. Wiechert M.
Physiotherapie Dorit Schumacher Praxis für Physiotherapie
Physiotherapie Höhnerbach
Physiotherapie Kerstin Wikert Physiotherapeutin
Physiotherapie Kollmorgen
Physiotherapie Neumann
Physiotherapie Physikalische Therapie Inh. Karin Hellmuth
Physiotherapiepraxis Angela Keller
Pöschmann Kathleen Menschen"s"kinder Physiotherapie
PTB Physiopraxis
Roberto Kollmorgen
Rothkirch Physiotherapie Ramona
Schmidt Josephine Praxis für Physiotherapie
Stoll Sven Physiotherapie
Strauß Arne
Thoms Christiane Physiotherapie
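
Some names appear more than once in the output above. If only distinct names are wanted, a minimal sketch for removing repeats while keeping the original order could look like this. The helper name unique_in_order is hypothetical, and identical names may well be separate listings (for example several branches of one practice), so check before discarding them.

```python
def unique_in_order(names: list[str]) -> list[str]:
    """Drop repeated names while keeping the first occurrence of each."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(name.strip() for name in names))


# Three entries taken from the output above; the first two are identical.
sample = [
    "Doremans Monika Physiotherapie",
    "Doremans Monika Physiotherapie",
    "Friemel B. Physiotherapie",
]
print(unique_in_order(sample))
```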
Ashok Arora On

BeautifulSoup is an HTML parser.

If you need to click buttons on an HTML page, use a tool that drives a real browser, like Selenium.

If you don't wish to learn Selenium, a hacky solution is to download the HTML after clicking Mehr Anzeigen until all entries are shown, and then parse that with BeautifulSoup. Here's a paste of the HTML after all 90 entries are displayed: https://pastebin.pl/view/raw/277d9ea1
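
A minimal sketch of that manual approach, assuming the fully expanded page has been saved to a local file. The file name rostock.html and both function names are hypothetical; the h2 selector is the one used in the question.

```python
from pathlib import Path

from bs4 import BeautifulSoup


def names_from_html(html: str) -> list[str]:
    """Extract the business names from a page's HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # Same selector as in the question: <h2 data-wipe-name="Titel">.
    titles = soup.find_all("h2", attrs={"data-wipe-name": "Titel"})
    return [h2.get_text(strip=True) for h2 in titles]


def names_from_saved_page(path: str) -> list[str]:
    """Read a saved copy of the expanded page and extract the names."""
    return names_from_html(Path(path).read_text(encoding="utf-8"))


# Usage, assuming the page was saved as rostock.html (hypothetical name):
# for name in names_from_saved_page("rostock.html"):
#     print(name)
```

This only captures the page as it was when saved, so it has to be repeated by hand whenever the listing changes.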