Checking Item type passed to a Pipeline - scrapy.item.ItemMeta

The Short Problem

When I check what type of item is being passed to the pipeline, Scrapy gives me the class scrapy.item.ItemMeta instead of the item class I expect.

The Context

pipelines.py:

def process_item(self, item, spider):
    print(type(item))
    print(type(WikiItem))

The statements above print:

<class 'MyScrapers.items.WikiItem'>
<class 'scrapy.item.ItemMeta'>

Why does the second type() call not print WikiItem, even though the class is passed to it explicitly? How can I make the two match?

Additional Info

The pipeline contains a simple if statement that uses isinstance() to perform different actions depending on what kind of item is passed. The condition is never met. To debug this, I added two print statements: one prints the type of the item passed to the Scrapy pipeline, the other prints the type of what it is being checked against.

Import Statement in pipelines.py

from openpyxl import Workbook
from items import ModelItem, WikiItem

Import Statement in items.py

import re
from scrapy.loader import ItemLoader
from itemloaders.processors import TakeFirst
from scrapy import Item, Field

...

class WikiItem(Item):
    model_number = Field(default='', output_processor = TakeFirst())

...
There is 1 answer

SuperUser On

In the first type() statement, you are printing the type of item, which is an instance of the WikiItem class. That is why it prints <class 'MyScrapers.items.WikiItem'>.

In the second type() statement, you are printing the type of the WikiItem class itself, not of an instance. In Python, the type of a class is its metaclass. Scrapy's Item is defined with the metaclass ItemMeta, and every subclass such as WikiItem inherits it. Scrapy does not generate a new metaclass per item class. ItemMeta is responsible for building the item class (for example, collecting its Field attributes into the class's fields mapping), not for creating instances. It is a subclass of Python's built-in type. So, when you print the type of WikiItem, it shows <class 'scrapy.item.ItemMeta'>.
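The class-versus-metaclass distinction described above can be reproduced without Scrapy. The following is a minimal sketch: DemoField, DemoMeta and DemoItem are hypothetical stand-ins that only loosely mimic how Field, ItemMeta and Item relate to each other. They are not Scrapy's actual implementation.

```python
class DemoField(dict):
    """Hypothetical stand-in for scrapy.Field (assumption: illustration only)."""


class DemoMeta(type):
    """Hypothetical stand-in for scrapy.item.ItemMeta."""

    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        # Collect the declared fields on the class, loosely like ItemMeta does.
        cls.fields = {k: v for k, v in namespace.items() if isinstance(v, DemoField)}
        return cls


class DemoItem(metaclass=DemoMeta):
    model_number = DemoField()


item = DemoItem()

type_of_class = type(DemoItem)   # the metaclass
type_of_instance = type(item)    # the class

print(type_of_class is DemoMeta)      # True
print(type_of_instance is DemoItem)   # True
print(isinstance(DemoItem, type))     # True: a class is an instance of its metaclass
print(sorted(DemoItem.fields))        # ['model_number']
```

Calling type() on a class always returns the metaclass, so comparing type(item) with type(WikiItem) can never succeed.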

You can use isinstance normally, passing the class itself rather than type(WikiItem): if isinstance(item, WikiItem): # do something (see the code below).
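As a rough sketch of how that check might look inside a pipeline: the item classes below are plain stand-ins so the snippet runs without Scrapy, and DemoPipeline with its seen list is a hypothetical name, not code from the question. In a real project the items would subclass scrapy.Item and be imported from the project's items module.

```python
class WikiItem(dict):
    """Stand-in for the question's WikiItem (assumption: plain dict subclass)."""


class ModelItem(dict):
    """Stand-in for the question's ModelItem."""


class DemoPipeline:
    """Hypothetical pipeline; Scrapy calls process_item(item, spider) per item."""

    def __init__(self):
        self.seen = []

    def process_item(self, item, spider):
        # Check the instance against the class itself, not against type(WikiItem).
        if isinstance(item, WikiItem):
            self.seen.append("wiki")
        elif isinstance(item, ModelItem):
            self.seen.append("model")
        # Return the item so that any later pipeline still receives it.
        return item


pipeline = DemoPipeline()
pipeline.process_item(WikiItem(model_number="A1"), spider=None)
pipeline.process_item(ModelItem(), spider=None)
print(pipeline.seen)  # ['wiki', 'model']
```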

How can I make it so that they do match?

If you specifically want type() to return your class, call the class to create an instance by adding parentheses:

>>> from itemloaders.processors import TakeFirst
>>> from scrapy import Item, Field
>>>
>>> class WikiItem(Item):
...     model_number = Field(default='', output_processor = TakeFirst())
...
>>> example_item=WikiItem()
>>> print(type(example_item))
<class '__main__.WikiItem'>
>>> print(type(WikiItem))  # without parentheses: the class itself
<class 'scrapy.item.ItemMeta'>
>>> print(type(WikiItem()))  # with parentheses: a new instance
<class '__main__.WikiItem'>
>>> isinstance(example_item, WikiItem) # regular isinstance
True
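
One possible reason the isinstance() condition in the question is never met, offered as an assumption rather than a confirmed diagnosis: the item prints as MyScrapers.items.WikiItem, while pipelines.py imports WikiItem with from items import .... If the same file ends up imported under two different module names, Python creates two distinct class objects, and an instance of one is not an instance of the other. The sketch below reproduces that effect with a temporary file. The load_as helper and the demo module names are hypothetical.

```python
import importlib.util
import tempfile
from pathlib import Path

SOURCE = "class WikiItem:\n    pass\n"


def load_as(module_name, path):
    """Execute the file at `path` as a fresh module named `module_name`."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "items.py"
    path.write_text(SOURCE)
    # Same file, two module names: mimics `items` vs `MyScrapers.items`.
    short = load_as("items_demo", path)
    qualified = load_as("myscrapers_items_demo", path)

item = qualified.WikiItem()

same_class = short.WikiItem is qualified.WikiItem
matches_short = isinstance(item, short.WikiItem)
matches_qualified = isinstance(item, qualified.WikiItem)

print(same_class)         # False: two separate class objects
print(matches_short)      # False: this is the check that "never matches"
print(matches_qualified)  # True
```

If this is what is happening, importing the items in pipelines.py through the same path the spider uses (for example from MyScrapers.items import ModelItem, WikiItem, assuming MyScrapers is the project package) should make the check succeed.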