I am scraping a yellow pages site to get the names of all physiotherapists in a city. The URL gives me a list of 50 physiotherapists; however, when I expand the page, the URL does not change. How do I get the full list of names?
This is how I get the list of physiotherapists in the city of Rostock:
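import requests
from bs4 import BeautifulSoup

# Placeholder: the original post does not show how header was defined.
header = {'User-Agent': 'Mozilla/5.0'}

url = 'https://www.gelbeseiten.de/Suche/Physiotherapie%20praxis/Rostock'
req = requests.get(url, headers=header)
soup = BeautifulSoup(req.content, 'html.parser')
names = []
# Each business name sits in an <h2 data-wipe-name="Titel"> element.
business_name = soup.find_all('h2', attrs={"data-wipe-name": "Titel"})
for name in business_name:
    names.append(name.get_text())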
At the bottom of the page there is a button called "Mehr Anzeigen", which basically means "show more". If I click it, the number of physiotherapist entries increases from 50 to 60. There are 90 entries in total. When I click the button repeatedly until all the entries are shown, the button disappears. The page then lists all the physiotherapists in the city, which is what I want to get.
How do I get all the entries I get after clicking "show more"?
There's no need to use Selenium for this simple task. Using Chrome's developer tools, you can see that the website sends a simple POST request to https://www.gelbeseiten.de/AjaxSuche when the 'Mehr anzeigen' button is pressed, with request data that includes the position and anzahl parameters.
The JSON response contains an html key whose value holds the search results. Additionally, there are gesamtanzahlTreffer and anzahlTreffer keys inside the response. Unfortunately, it's not possible to get all search results with a single POST request by setting position=0 and anzahl=100. However, the first POST request returns the first 50 results (as on the website), and each subsequent POST request returns the next 10 results.
Long story short, you can parse all the results like this:
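A minimal sketch of such a paging loop, not the answer's original code. It assumes that anzahlTreffer is the number of hits in the current response and gesamtanzahlTreffer is the total number of hits. The form field names WAS and WO are hypothetical and should be checked against the request shown in the developer tools; only position and anzahl come from the description above. It also assumes the returned HTML uses the same h2 markup as the question's page.

```python
from typing import Callable, Dict, List

AJAX_URL = "https://www.gelbeseiten.de/AjaxSuche"


def collect_html_chunks(fetch_page: Callable[[int], Dict]) -> List[str]:
    """Call fetch_page(position) until every reported hit has been fetched."""
    chunks: List[str] = []
    position = 0
    while True:
        page = fetch_page(position)
        # Assumption: anzahlTreffer counts the hits in this response only.
        received = int(page.get("anzahlTreffer", 0))
        if received <= 0:
            break  # nothing more to fetch
        chunks.append(page["html"])
        position += received
        # Assumption: gesamtanzahlTreffer is the total number of hits.
        if position >= int(page.get("gesamtanzahlTreffer", 0)):
            break
    return chunks


def fetch_page(position: int) -> Dict:
    """POST one request to the AJAX endpoint and return the decoded JSON."""
    import requests  # third-party; imported here so the loop above has no dependencies

    data = {
        "WAS": "Physiotherapie praxis",  # hypothetical field name for the search term
        "WO": "Rostock",                 # hypothetical field name for the city
        "position": position,
        # Mirrors the description: 50 results first, then 10 per request.
        "anzahl": 50 if position == 0 else 10,
    }
    response = requests.post(AJAX_URL, data=data, timeout=30)
    response.raise_for_status()
    return response.json()


def extract_names(html: str) -> List[str]:
    """Pull business names out of one HTML chunk (same selector as the question)."""
    from bs4 import BeautifulSoup  # third-party

    soup = BeautifulSoup(html, "html.parser")
    titles = soup.find_all("h2", attrs={"data-wipe-name": "Titel"})
    return [title.get_text(strip=True) for title in titles]
```

Calling collect_html_chunks(fetch_page) and passing each returned chunk to extract_names should then give the full list of names. Because the loop advances position by the count the server reports, it does not depend on the page size staying at 50 and 10.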
Output: