Chain multiple ajax requests in website to show more pages and get full list in single page


I would like to retrieve the full list from https://icomarks.ai/icos/, which loads more entries each time the Show More button is clicked. In total it should show around 8000 elements.

The Show More button activates a POST request 'https://icomarks.ai/icos/ajax_more'.

I tried both

import requests
from bs4 import BeautifulSoup

with requests.Session() as session:

    req = session.get('https://icomarks.ai/icos/')
    req = session.post('https://icomarks.ai/icos/ajax_more')
    req = session.post('https://icomarks.ai/icos/ajax_more')    # just for a couple
    soup = BeautifulSoup(req.content, "html.parser")

and

import requests
from bs4 import BeautifulSoup

s = requests.Session()
t=s.post('https://icomarks.ai/icos/')
r=s.get('https://icomarks.ai/icos/ajax_more')
r=s.get('https://icomarks.ai/icos/ajax_more')    # just for a couple
soup = BeautifulSoup(r.content, "html.parser")

with no success.

I expect that soup.find_all('a', class_="icoListItem__title") should find the elements in the list that must be loaded:

[<a class="icoListItem__title" href="/ico/5th-scape">5th Scape <sup class="sup_is_premium">★ Promoted</sup> <sup class="sup_views">128 Views</sup>
 </a>,
 <a class="icoListItem__title" href="/ico/pood-inu">Pood INU <sup class="sup_is_premium">★ Promoted</sup> <sup class="sup_views">330 Views</sup>
 </a>,
 <a class="icoListItem__title" href="/ico/etuktuk">eTukTuk <sup class="sup_is_premium">★ Promoted</sup> <sup class="sup_views">794 Views</sup>
...

There are 2 answers

srn On

Neither version can work, because each request overwrites the variable you store the result in (req and r respectively), so at best you keep only the last response. You need to process each response right after the request, or store the responses for later processing.

Basically right now, you're doing something like:

a = 1
a = 2
a = 3

And of course a will be 3, without any trace of the other integers.
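To keep every value instead of only the last one, the usual fix is to append each result to a list, a minimal sketch:

```python
results = []
for value in (1, 2, 3):
    # each append adds a new slot instead of overwriting one name
    results.append(value)

print(results)  # [1, 2, 3]
```

The same pattern applies to responses: collect them (or the links parsed from them) in a list as they arrive.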

A naive way to structure your code, accumulating the links from every response, might look like this:

import requests
from bs4 import BeautifulSoup

all_them_links = []

with requests.Session() as session:
    while True:
        req = session.post('https://icomarks.ai/icos/ajax_more')
        soup = BeautifulSoup(req.content, "html.parser")
        found_links = soup.find_all('a', class_="icoListItem__title")
        if not found_links:
            break
        all_them_links.extend(found_links)
Andrea Francia On

Improving the answer from srn: I've fixed the parsing, since the response is actually a JSON object whose "content" property contains the actual HTML, and I print the href of each anchor <a> element.

import json
from pprint import pprint

import requests
from bs4 import BeautifulSoup


def main():
    all_them_links = []

    with requests.Session() as session:
        while True:
            req = session.post('https://icomarks.ai/icos/ajax_more')
            response = json.loads(req.content)
            pprint(response['offset'])
            soup = BeautifulSoup(response["content"], "html.parser")
            found_links = soup.find_all("a", class_="icoListItem__title")
            if not found_links:
                # no more entries returned, stop paging
                break
            for a in found_links:
                all_them_links.append([a["href"], a.get_text()])

    pprint(all_them_links)


main()