Encountering Problems After Modifying Items in Scrapy Project

I previously built a Scrapy project that worked correctly. After I switched from yielding plain dictionaries to Scrapy Items, it stopped returning data. The pages still appear to be crawled correctly, with every link returning status code 200, but no listing data is extracted. I suspect the problem is in the parse_add_page callback or in items.py, since those are the only places I changed.

Here is my spider code:

import scrapy
from pakwheel.items import PakwheelItem

class PakSpider(scrapy.Spider):
    name = "pw"
    allowed_domains = ["www.pakwheels.com"]
    start_urls = ["https://www.pakwheels.com/used-cars/karachi/24857"]

    def parse(self, response):
        b_url = "https://www.pakwheels.com/used-cars/karachi/24857"
        for page in range(1, 100):  # alternate upper bound: 457
            r_url = f"{b_url}?page={page}"
            yield scrapy.Request(url=r_url, callback=self.parse_add_page)

    def parse_add_page(self, response):
        for car_info in response.css('div.listing-unit'):
            pkwheel = PakwheelItem()
            pkwheel["Title"] = car_info.css("a.car-name.ad-detail-path::text").get()
            pkwheel["Price_in_lacs"] = car_info.css("div.price-details.generic-dark-grey::text").get()
            pkwheel["Auction_rating"] = car_info.css("span.auction-rating::text").get()
            pkwheel["img"] = car_info.css("div.total-pictures-bar.fs12 img::attr(src)").get()
            pkwheel["Model"] = car_info.css("ul.list-unstyled.search-vehicle-info-2.fs13 li:nth-of-type(1)::text").get()
            pkwheel["driven_in_kms"] = car_info.css("ul.list-unstyled.search-vehicle-info-2.fs13 li:nth-of-type(2)::text").get()
            pkwheel["Type"] = car_info.css("ul.list-unstyled.search-vehicle-info-2.fs13 li:nth-of-type(3)::text").get()
            pkwheel["Engine"] = car_info.css("ul.list-unstyled.search-vehicle-info-2.fs13 li:nth-of-type(4)::text").get()
            pkwheel["Transmission_type"] = car_info.css("ul.list-unstyled.search-vehicle-info-2.fs13 li:nth-of-type(5)::text").get()
            yield pkwheel

Here is my items.py file:

from scrapy.item import Item, Field


class PakwheelItem(Item):
    # Field names are case-sensitive and must match the keys used in the spider.
    Title = Field()
    Price_in_lacs = Field()  # was "price_in_lacs"; the spider assigns pkwheel["Price_in_lacs"]
    Auction_rating = Field()
    img = Field()
    Model = Field()
    driven_in_kms = Field()
    Type = Field()
    Engine = Field()
    Transmission_type = Field()
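
One thing worth checking: Scrapy Item field names are case-sensitive, and assigning to a field that was not declared raises a KeyError. The items.py as originally posted declared price_in_lacs, while the spider assigns Price_in_lacs. The sketch below is a plain-Python comparison of the two name lists, copied by hand from the code in this question. The helper names find_undeclared_keys and find_case_mismatches are hypothetical and not part of Scrapy.

```python
# Field names as declared in the originally posted items.py (copied by hand).
DECLARED_FIELDS = {
    "Title", "price_in_lacs", "Auction_rating", "img", "Model",
    "driven_in_kms", "Type", "Engine", "Transmission_type",
}

# Keys assigned in parse_add_page (copied by hand from the spider).
ASSIGNED_KEYS = [
    "Title", "Price_in_lacs", "Auction_rating", "img", "Model",
    "driven_in_kms", "Type", "Engine", "Transmission_type",
]


def find_undeclared_keys(assigned, declared):
    """Return assigned keys that have no exactly matching declared field."""
    return [key for key in assigned if key not in declared]


def find_case_mismatches(assigned, declared):
    """Map each undeclared key to a declared field that differs only by case."""
    by_lower = {name.lower(): name for name in declared}
    return {
        key: by_lower[key.lower()]
        for key in find_undeclared_keys(assigned, declared)
        if key.lower() in by_lower
    }


if __name__ == "__main__":
    print(find_undeclared_keys(ASSIGNED_KEYS, DECLARED_FIELDS))
    print(find_case_mismatches(ASSIGNED_KEYS, DECLARED_FIELDS))
```

A KeyError raised inside a callback would normally appear as an ERROR entry in the log, and the output in this question shows none, so this mismatch may not be the only problem.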
    

Output:

2024-02-16 13:00:43 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pakwheels.com/used-cars/karachi/24857?page=46> (referer: https://www.pakwheels.com/used-cars/karachi/24857)
2024-02-16 13:00:44 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pakwheels.com/used-cars/karachi/24857?page=42> (referer: https://www.pakwheels.com/used-cars/karachi/24857)
2024-02-16 13:00:44 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pakwheels.com/used-cars/karachi/24857?page=43> (referer: https://www.pakwheels.com/used-cars/karachi/24857)
2024-02-16 13:00:44 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pakwheels.com/used-cars/karachi/24857?page=40> (referer: https://www.pakwheels.com/used-cars/karachi/24857)
2024-02-16 13:00:44 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pakwheels.com/used-cars/karachi/24857?page=39> (referer: https://www.pakwheels.com/used-cars/karachi/24857)
2024-02-16 13:00:44 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pakwheels.com/used-cars/karachi/24857?page=45> (referer: https://www.pakwheels.com/used-cars/karachi/24857)
2024-02-16 13:00:44 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pakwheels.com/used-cars/karachi/24857?page=44> (referer: https://www.pakwheels.com/used-cars/karachi/24857)
2024-02-16 13:00:44 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pakwheels.com/used-cars/karachi/24857?page=41> (referer: https://www.pakwheels.com/used-cars/karachi/24857)
2024-02-16 13:00:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pakwheels.com/used-cars/karachi/24857?page=38> (referer: https://www.pakwheels.com/used-cars/karachi/24857)
2024-02-16 13:00:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pakwheels.com/used-cars/karachi/24857?page=36> (referer: https://www.pakwheels.com/used-cars/karachi/24857)
2024-02-16 13:00:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pakwheels.com/used-cars/karachi/24857?page=34> (referer: https://www.pakwheels.com/used-cars/karachi/24857)
2024-02-16 13:00:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pakwheels.com/used-cars/karachi/24857?page=35> (referer: https://www.pakwheels.com/used-cars/karachi/24857)
2024-02-16 13:00:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pakwheels.com/used-cars/karachi/24857?page=37> (referer: https://www.pakwheels.com/used-cars/karachi/24857)
2024-02-16 13:00:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pakwheels.com/used-cars/karachi/24857?page=33> (referer: https://www.pakwheels.com/used-cars/karachi/24857)
2024-02-16 13:00:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pakwheels.com/used-cars/karachi/24857?page=32> (referer: https://www.pakwheels.com/used-cars/karachi/24857)
2024-02-16 13:00:46 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pakwheels.com/used-cars/karachi/24857?page=31> (referer: https://www.pakwheels.com/used-cars/karachi/24857)
2024-02-16 13:00:46 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pakwheels.com/used-cars/karachi/24857?page=29> (referer: https://www.pakwheels.com/used-cars/karachi/24857)
2024-02-16 13:00:46 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pakwheels.com/used-cars/karachi/24857?page=28> (referer: https://www.pakwheels.com/used-cars/karachi/24857)
2024-02-16 13:00:46 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pakwheels.com/used-cars/karachi/24857?page=30> (referer: https://www.pakwheels.com/used-cars/karachi/24857)
2024-02-16 13:00:46 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pakwheels.com/used-cars/karachi/24857?page=27> (referer: https://www.pakwheels.com/used-cars/karachi/24857)
2024-02-16 13:00:47 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pakwheels.com/used-cars/karachi/24857?page=26> (referer: https://www.pakwheels.com/used-cars/karachi/24857)
2024-02-16 13:00:47 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pakwheels.com/used-cars/karachi/24857?page=24> (referer: https://www.pakwheels.com/used-cars/karachi/24857)
2024-02-16 13:00:47 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pakwheels.com/used-cars/karachi/24857?page=25> (referer: https://www.pakwheels.com/used-cars/karachi/24857)
2024-02-16 13:00:47 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pakwheels.com/used-cars/karachi/24857?page=23> (referer: https://www.pakwheels.com/used-cars/karachi/24857)
2024-02-16 13:00:47 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pakwheels.com/used-cars/karachi/24857?page=22> (referer: https://www.pakwheels.com/used-cars/karachi/24857)
2024-02-16 13:00:47 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pakwheels.com/used-cars/karachi/24857?page=21> (referer: https://www.pakwheels.com/used-cars/karachi/24857)
2024-02-16 13:00:48 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pakwheels.com/used-cars/karachi/24857?page=20> (referer: https://www.pakwheels.com/used-cars/karachi/24857)
2024-02-16 13:00:48 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pakwheels.com/used-cars/karachi/24857?page=13> (referer: https://www.pakwheels.com/used-cars/karachi/24857)
2024-02-16 13:00:48 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pakwheels.com/used-cars/karachi/24857?page=15> (referer: https://www.pakwheels.com/used-cars/karachi/24857)
2024-02-16 13:00:48 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pakwheels.com/used-cars/karachi/24857?page=14> (referer: https://www.pakwheels.com/used-cars/karachi/24857)
2024-02-16 13:00:49 [scrapy.core.engine] INFO: Closing spider (finished)
2024-02-16 13:00:49 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 91747,
 'downloader/request_count': 101,
 'downloader/request_method_count/GET': 101,
 'downloader/response_bytes': 7336737,
 'downloader/response_count': 101,
 'downloader/response_status_count/200': 101,
 'elapsed_time_seconds': 20.229765,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2024, 2, 16, 21, 0, 49, 453296, tzinfo=datetime.timezone.utc),
 'httpcompression/response_bytes': 71393176,
 'httpcompression/response_count': 101,
 'log_count/DEBUG': 104,
 'log_count/INFO': 10,
 'request_depth_max': 1,
 'response_received_count': 101,
 'robotstxt/request_count': 1,
 'robotstxt/response_count': 1,
 'robotstxt/response_status_count/200': 1,
 'scheduler/dequeued': 100,
 'scheduler/dequeued/memory': 100,
 'scheduler/enqueued': 100,
 'scheduler/enqueued/memory': 100,
 'start_time': datetime.datetime(2024, 2, 16, 21, 0, 29, 223531, tzinfo=datetime.timezone.utc)}
2024-02-16 13:00:49 [scrapy.core.engine] INFO: Spider closed (finished)
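
Two details in these stats may help narrow things down. Scrapy normally records an item_scraped_count entry once any item is scraped, and a log_count/ERROR entry when a callback raises; neither appears here. The sketch below checks a hand-copied subset of the stats dump for those keys. The function name summarize_stats is hypothetical, and the reading of the result (that the div.listing-unit selector probably matched nothing, so the loop body never ran) is an inference rather than something the log states.

```python
# A subset of the stats dump above, copied by hand.
STATS = {
    "downloader/response_status_count/200": 101,
    "log_count/DEBUG": 104,
    "log_count/INFO": 10,
    "request_depth_max": 1,
    "response_received_count": 101,
    "scheduler/dequeued": 100,
    "scheduler/enqueued": 100,
}


def summarize_stats(stats):
    """Pull out the counters that show whether items were yielded or errors logged."""
    return {
        "responses_ok": stats.get("downloader/response_status_count/200", 0),
        # Missing key means no item was scraped during the run.
        "items_scraped": stats.get("item_scraped_count", 0),
        # Missing key means nothing was logged at ERROR level.
        "errors_logged": stats.get("log_count/ERROR", 0),
    }


summary = summarize_stats(STATS)
print(summary)
if summary["items_scraped"] == 0 and summary["errors_logged"] == 0:
    print("Callbacks ran without raising but yielded no items; check the selectors.")
```

If that reading is right, opening one results page in scrapy shell and evaluating response.css('div.listing-unit') there should show whether the selector still matches the current page markup.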