I want to crawl data from pages with the format http://www.vesselfinder.com/vessels?page=i, where i runs from 0 to some integer.
Is the following regex correct for this pattern?
    start_urls = [
        "http://www.vesselfinder.com/vessels"
    ]
    rules = (
        Rule(LinkExtractor(allow=r"com/vessels\?page=[1-100]"),
             callback='parse_item', follow=True),
    )
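Not quite: [1-100] is a character class, not a numeric range, so it matches a single character that is either 1 or 0. For the 1-100 range, you can use a pattern such as

    r"com/vessels\?page=(?:[1-9][0-9]?|100)$"

In case you need any number, just use \d+:

    r"com/vessels\?page=\d+"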
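To check the difference without running a crawl, here is a minimal sketch using only Python's re module. It assumes the allow pattern is applied to each URL with search semantics (a match anywhere in the string), and the names QUESTION_PATTERN, RANGE_PATTERN, ANY_PATTERN and matching_pages are made up for this illustration. The 1-100 pattern shown is one possible way to write that range, not the only one.

```python
import re

# Pattern from the question: [1-100] is a character class that matches
# one character, either "1" or "0". It is not a numeric range.
QUESTION_PATTERN = re.compile(r"com/vessels\?page=[1-100]")

# Assumed pattern for pages 1 to 100: one or two digits with no leading
# zero, or the literal 100. The trailing $ stops "101" from matching via "10".
RANGE_PATTERN = re.compile(r"com/vessels\?page=(?:[1-9]\d?|100)$")

# Any page number at all.
ANY_PATTERN = re.compile(r"com/vessels\?page=\d+$")

BASE = "http://www.vesselfinder.com/vessels?page="


def matching_pages(pattern, pages):
    """Return the page numbers whose URL the pattern matches (search semantics)."""
    return [p for p in pages if pattern.search(BASE + str(p))]


pages = [0, 1, 2, 10, 57, 100, 101, 250]

print(matching_pages(QUESTION_PATTERN, pages))
print(matching_pages(RANGE_PATTERN, pages))
print(matching_pages(ANY_PATTERN, pages))
```

The first pattern lets page 0 and page 101 through but rejects pages such as 2 and 57, which is why the original rule skips most listing pages. Note that the $ anchor assumes page is the last thing in the URL; if other query parameters can follow it, a word boundary (\b) in place of $ may be a better fit.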