Chain multiple ajax requests in website to show more pages and get full list in single page


I would like to retrieve the full list from https://icomarks.ai/icos/, which loads more entries each time the Show More button is clicked. In total it should show around 8000 elements.

The Show More button activates a POST request 'https://icomarks.ai/icos/ajax_more'.

I tried both

import requests
from bs4 import BeautifulSoup

with requests.Session() as session:

    req = session.get('https://icomarks.ai/icos/')
    req = session.post('https://icomarks.ai/icos/ajax_more')
    req = session.post('https://icomarks.ai/icos/ajax_more')    # just for a couple
    soup = BeautifulSoup(req.content, "html.parser")

and

import requests
from bs4 import BeautifulSoup

s = requests.Session()
t=s.post('https://icomarks.ai/icos/')
r=s.get('https://icomarks.ai/icos/ajax_more')
r=s.get('https://icomarks.ai/icos/ajax_more')    # just for a couple
soup = BeautifulSoup(r.content, "html.parser")

with no success.

I expect that soup.find_all('a', class_="icoListItem__title") should find the elements in the list that must be loaded:

[<a class="icoListItem__title" href="/ico/5th-scape">5th Scape <sup class="sup_is_premium">★ Promoted</sup> <sup class="sup_views">128 Views</sup>
 </a>,
 <a class="icoListItem__title" href="/ico/pood-inu">Pood INU <sup class="sup_is_premium">★ Promoted</sup> <sup class="sup_views">330 Views</sup>
 </a>,
 <a class="icoListItem__title" href="/ico/etuktuk">eTukTuk <sup class="sup_is_premium">★ Promoted</sup> <sup class="sup_views">794 Views</sup>
...

There are 2 answers

srn On

Neither version can work, because each request overwrites the variable you store the result in (req and r respectively), so at best you keep only the last response. You need to process each response right after the request, or store the responses for later processing.

Basically right now, you're doing something like:

a = 1
a = 2
a = 3

And of course a will be 3, without any trace of the other integers.
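To keep every value instead of only the last one, the usual fix is to append each result to a list, a minimal sketch:

```python
results = []
for value in (1, 2, 3):
    # each append adds a new slot instead of overwriting one name
    results.append(value)

print(results)  # [1, 2, 3]
```

The same pattern applies to responses: collect them (or the links parsed from them) in a list as they arrive.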

A naive way to structure your code, accumulating the links from every response, might look like this:

import requests
from bs4 import BeautifulSoup

all_them_links = []

with requests.Session() as session:
    while True:
        req = session.post('https://icomarks.ai/icos/ajax_more')
        soup = BeautifulSoup(req.content, "html.parser")
        found_links = soup.find_all('a', class_="icoListItem__title")
        if not found_links:
            break
        all_them_links.extend(found_links)
Andrea Francia On

Improving the answer from srn: I've fixed the parsing, since the response is actually a JSON object whose "content" property contains the actual HTML, and I print the href of each anchor <a> element.

import json
from pprint import pprint

import requests
from bs4 import BeautifulSoup


def main():
    all_them_links = []

    with requests.Session() as session:
        while True:
            req = session.post('https://icomarks.ai/icos/ajax_more')
            response = json.loads(req.content)
            pprint(response['offset'])
            soup = BeautifulSoup(response["content"], "html.parser")
            found_links = soup.find_all("a", class_="icoListItem__title")
            if not found_links:
                # no more entries returned, stop paging
                break
            for a in found_links:
                all_them_links.append([a["href"], a.get_text()])

    pprint(all_them_links)


main()