Scrapy only outputting an open bracket

Question

111 views Asked by ingleback At 18 June 2015 at 21:30

I'm trying to scrape the title and URL of every Khan Academy page under the math, science, and economics sections. Currently the output contains only an open bracket; before that, the spider would scrape only the start URL.

from openbar_index.items import OpenBarIndexItem
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor


class OpenBarSpider(CrawlSpider):
    """
    scrapes website URLs from educational websites and commits urls/webpage names/text to a document
    """

    name = 'openbar'
    allowed_domains = 'khanacademy.org'
    start_urls = [
        "https://www.khanacademy.org"
    ]

    rules = [
        Rule(SgmlLinkExtractor(allow=['/math/']), callback='parse_item', follow=True),
        Rule(SgmlLinkExtractor(allow=['/science/']), callback='parse_item', follow=True),
        Rule(SgmlLinkExtractor(allow=['/economics-finance-domain/']), callback='parse_item', follow=True)
    ]

    def parse_item(self, response):
        item = OpenBarIndexItem()
        url = response.url
        item['url'] = url
        item['title'] = response.xpath('/html/head/title/text()').extract()
        yield item

Does anyone have an idea why this is happening or tips on how to fix it?

There is 1 answer

**Frank Martin** · Answer 1 · 2015-06-20T12:11:29+00:00

The problem is the assignment to allowed_domains. According to the documentation, this must be a list, not a string. Given a string, Scrapy finds no valid domain to match against, so the links extracted from the start page are filtered out as offsite requests.

Adding square brackets, as in the following line, should fix it:

    allowed_domains = ['khanacademy.org']
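
To see why the string form fails, here is a simplified sketch of a host check. It is not Scrapy's actual code: build_host_regex and is_allowed are hypothetical helpers, and the pattern is only an assumption about how an offsite filter might combine the allowed domains. The point is that iterating over a string yields single characters, so no real hostname can match.

```python
import re


def build_host_regex(allowed_domains):
    # Hypothetical helper: join every entry of allowed_domains into one
    # alternation. Iterating a list yields whole domains; iterating a
    # string yields its individual characters.
    parts = "|".join(re.escape(d) for d in allowed_domains)
    return re.compile(r"^(.*\.)?(%s)$" % parts)


def is_allowed(host, allowed_domains):
    # True if host equals an allowed domain or is a subdomain of one.
    return bool(build_host_regex(allowed_domains).search(host))


host = "www.khanacademy.org"

# String: the "domains" become 'k', 'h', 'a', ... so the host is rejected.
as_string = is_allowed(host, "khanacademy.org")

# List: the single domain 'khanacademy.org' matches the host.
as_list = is_allowed(host, ["khanacademy.org"])

print(as_string, as_list)
```

Under these assumptions the string version rejects www.khanacademy.org while the list version accepts it, which is consistent with the spider never following links beyond the start URL.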
