Regular expression for Scrapy rules


I want to crawl data from pages of the form http://www.vesselfinder.com/vessels?page=i, where i runs from 0 to some integer.

Is the following regex correct for this pattern?

start_urls = [
    "http://www.vesselfinder.com/vessels"
]

rules = (
    Rule(LinkExtractor(allow=r"com/vessels\?page=[1-100]"),
         callback='parse_item', follow=True),
)

There is 1 answer

Wiktor Stribiżew (best answer)

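No: [1-100] is a character class, so it matches a single character (0 or 1), not the numbers 1 through 100. For the 1-100 range, you can use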

r"com/vessels\?page=(?:[1-9][0-9]?|100)\b"
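To see the difference between the two patterns, here is a minimal sketch using only Python's re module. It assumes the pattern is applied as an unanchored search (as re.search does); the sample page values are made up for illustration.

```python
import re

# The character class from the question: matches ONE character, "0" or "1".
char_class = re.compile(r"com/vessels\?page=[1-100]")
# The pattern above: 1-99 via [1-9][0-9]?, or the literal 100, then a word boundary.
ranged = re.compile(r"com/vessels\?page=(?:[1-9][0-9]?|100)\b")

base = "http://www.vesselfinder.com/vessels?page="
pages = ["0", "1", "2", "57", "100", "101", "1000"]  # illustrative values

# Keep the page values whose full URL each pattern accepts.
char_class_hits = [p for p in pages if char_class.search(base + p)]
ranged_hits = [p for p in pages if ranged.search(base + p)]

print(char_class_hits)  # ['0', '1', '100', '101', '1000']
print(ranged_hits)      # ['1', '2', '57', '100']
```

The character class accepts any page value that merely starts with 0 or 1, while the ranged pattern accepts exactly 1 through 100. Note that it rejects page=0, which the question mentions as a possible value.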


If you need to match any page number (including 0, as in the question), just use \d+:

r"com/vessels\?page=\d+"
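A quick way to check this pattern is sketched below, again with plain re. The capture group around \d+ and the sample URLs are additions for illustration only; they are not part of the pattern above.

```python
import re

# Same pattern as above, with a capture group added (an assumption made
# here so the page number can be read back out of the match).
any_page = re.compile(r"com/vessels\?page=(\d+)")

urls = [  # illustrative URLs
    "http://www.vesselfinder.com/vessels?page=0",
    "http://www.vesselfinder.com/vessels?page=12345",
    "http://www.vesselfinder.com/vessels",
    "http://www.vesselfinder.com/vessels?page=abc",
]

# Collect the page number from every URL the pattern accepts.
page_numbers = [int(m.group(1)) for u in urls if (m := any_page.search(u))]

print(page_numbers)  # [0, 12345]
```

URLs without a numeric page parameter are skipped, so only listing pages would be followed.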
