Import.io and Portia regex URL patterns


I am using data scrapers: Import.io & Portia.

They both allow you to define a regular expression for the crawler to abide by. For example, take the URL: https://weedmaps.com/dispensaries/pdi-medical

How would I account for the ending "pdi-medical"?

I've looked all over and understand how to use regex in a JS environment, but I'm a little confused as to what exactly I'd put in the input on Portia/Import.io.

Something like this? https://weedmaps.com/dispensaries//^[a-zA-Z0-9-_]+$/


There is 1 answer

Valdir Stumm Junior

For Portia, if you want your crawler to follow any URLs starting with https://weedmaps.com/dispensaries/, you can just add a crawling rule with the following regex:

^https?://weedmaps\.com/dispensaries/
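
To check a pattern before pasting it into the crawler's input, you can try it locally. The sketch below is only an illustration and makes a few assumptions: that the tool searches the pattern against the full URL (Portia is built on Scrapy, whose link extractors work this way, but Import.io may differ), that escaping the dots is acceptable, and that the helper name `should_follow` and the stricter `SINGLE_SLUG` pattern are hypothetical additions, not part of either tool. It also shows why the attempt in the question cannot match: the `/.../` delimiters are JavaScript literal syntax, and a `^` in the middle of a pattern can never match after other characters.

```python
import re

# Pattern from the answer, with the dots escaped so they only match a literal ".".
FOLLOW_ANY = re.compile(r"^https?://weedmaps\.com/dispensaries/")

# Hypothetical stricter variant: exactly one slug segment (letters, digits,
# hyphens, underscores) after /dispensaries/, with an optional trailing slash.
SINGLE_SLUG = re.compile(r"^https?://weedmaps\.com/dispensaries/[A-Za-z0-9_-]+/?$")

# The attempt from the question, used as a plain pattern: the JS-style "/"
# delimiters and the mid-pattern "^" prevent it from matching any URL.
QUESTION_ATTEMPT = re.compile(r"https://weedmaps.com/dispensaries//^[a-zA-Z0-9-_]+$/")


def should_follow(url, pattern):
    """Return True if the pattern is found in the URL (search semantics)."""
    return pattern.search(url) is not None


if __name__ == "__main__":
    urls = [
        "https://weedmaps.com/dispensaries/pdi-medical",
        "https://weedmaps.com/dispensaries/pdi-medical/reviews",
        "https://weedmaps.com/brands/some-brand",
    ]
    for url in urls:
        print(url, should_follow(url, FOLLOW_ANY), should_follow(url, SINGLE_SLUG))
```

Use the first pattern if the crawler should follow everything under /dispensaries/, and something like the second only if pages nested deeper than the dispensary slug should be skipped.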