Using scrapy-playwright, how do I create a new context for each request?


Initially I wanted to change the context's user agent for each request, but I could not get that to work. Now I am trying to create a new context for every request in scrapy-playwright instead. The problem appears once the number of concurrent contexts reaches the limit set by the PLAYWRIGHT_MAX_CONTEXTS setting. In the code below the limit is 1: only the first context is opened and processed, and the second one is never created. The spider then keeps running without doing anything further.

Any help resolving this would be greatly appreciated. Thank you in advance!

Here is my current code:

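import scrapy
from dotenv import load_dotenv

# Project helpers; init_page and get_context_args are expected to come from here.
from .function.general import *

load_dotenv()  # load environment variables from a .env file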

class TestSpider(scrapy.Spider):
    name = "test"
    custom_settings = {
        "PLAYWRIGHT_MAX_CONTEXTS": 1,  # at most one browser context open at a time
    }

    def start_requests(self):
        url = "https://bot.sannysoft.com"
        
        for i in range(2):
            yield scrapy.Request(
                url=url,
                callback=self.parse,
                dont_filter=True,
                meta=dict(
                    url=url,
                    playwright=True,
                    playwright_include_page=True,
                    playwright_page_init_callback=init_page,
                    number=i,
                    playwright_context=f"new-{i}",  # a differently named context per request
                    playwright_context_kwargs=get_context_args(),  # e.g. the user agent
                ),
            )

    async def parse(self, response):
        page = response.meta["playwright_page"]
        number = response.meta["number"]
        await page.screenshot(path=f"headless-test-result-{number}.png")
        await page.close()
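
A possible explanation, offered as an assumption rather than a confirmed diagnosis: closing a page does not close the context it belongs to, so with PLAYWRIGHT_MAX_CONTEXTS set to 1 the request that needs context "new-1" would wait until "new-0" is closed. Playwright exposes the owning context as page.context, and the scrapy-playwright README describes closing a context from a callback that way. The sketch below wraps this in a helper. The name close_page_and_context and the stand-in classes are hypothetical and exist only so the snippet runs without a browser.

```python
import asyncio


async def close_page_and_context(page):
    """Close a Playwright page, then the browser context that owns it.

    Hypothetical helper, not part of scrapy-playwright.
    """
    context = page.context  # the context this page was created in
    try:
        await page.close()
    finally:
        # Close the context even if closing the page raised, so its slot
        # under PLAYWRIGHT_MAX_CONTEXTS can be released.
        await context.close()


# Stand-ins for a dry run without a browser (assumptions for illustration only).
class FakeContext:
    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True


class FakePage:
    def __init__(self):
        self.context = FakeContext()
        self.closed = False

    async def close(self):
        self.closed = True


demo_page = FakePage()
asyncio.run(close_page_and_context(demo_page))
```

In the spider above, this would mean replacing `await page.close()` in parse with `await close_page_and_context(page)`. If a request fails before parse runs, its page and context would presumably stay open as well, so an errback doing the same cleanup may be needed.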


There are 0 answers