How to crowdsource my web crawling


My web application needs to download content from a URL that the user specifies. Currently these requests go through my server, which is inefficient and could get my server's IP address blocked.

Is there a way to let the user download the URL content directly? The same-origin policy seems to prevent using AJAX or an iframe to download and reuse this content.

Any ideas? For example, is there a way to use Flash to download and reuse URL content?


There are 2 answers

1
Martin v. Löwis On

If it's a specific website, I recommend talking to the website's operators rather than trying to crawl it anonymously.

1
Paul Dixon On

You could use Tor to mask your requests, but if you have to go to such lengths to crawl a website, perhaps you shouldn't be doing it?

Also, with your approach the iframe request will include your page URL as the referrer, which makes identifying these requests at the server end pretty straightforward...
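To illustrate the referrer point, here is a minimal Python sketch of how a site operator might flag such requests. It assumes the request headers are available as a plain dict; the function names and the host `crawler-app.example` are hypothetical, made up for this example, and a real server would read the header through its own framework.

```python
from urllib.parse import urlsplit


def referrer_host(headers):
    """Return the host named in the Referer header, or None if it is absent.

    `headers` is assumed to be a plain dict of request headers; names are
    matched case-insensitively, as HTTP header names are.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    referer = lowered.get("referer", "")
    return urlsplit(referer).hostname if referer else None


def is_from_embedding_site(headers, embedding_host="crawler-app.example"):
    """True if the request was referred by the (hypothetical) embedding site."""
    return referrer_host(headers) == embedding_host
```

An operator could then count or block any request for which `is_from_embedding_site(headers)` is true, no matter which user's IP address it came from.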