I have an ASP.NET form that I want to submit automatically in order to scrape the result (BTW, everything I do is legal).
Some of the form's drop-down fields are populated on the fly via Ajax. One of them is a "Region" field: once you select a region, the "City" drop-down is populated. If I use the Goutte web crawler to set the region and then simply set the city, an invalid value error is raised.
Can this even be done via Goutte, or should I use something else?
The target form itself is written in ASP.NET, with `viewstate` and `eventvalidation` fields.
Sites that rely heavily on client-side JavaScript are hard, and often impossible, to drive with a headless browser emulator. Emulators like Goutte and BrowserKit only send HTTP requests and parse the returned HTML. They cannot execute client-side code, so the Ajax call that fills the "City" drop-down never runs. You need a browser controller such as Selenium or Sahi.
Have a look at Behat's Mink, which has drivers for both headless emulators and full-fledged browser controllers. With its selenium2 driver, you can interact with the target page the way a real browser does. Here's an example:
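A minimal sketch of what such a script might look like, assuming `behat/mink` and `behat/mink-selenium2-driver` are installed via Composer and a Selenium server is running on the driver's default address. The field locators (`Region`, `City`), the `Submit` button label, and the helper names `optionsLoadedCondition()` and `scrapeResult()` are hypothetical placeholders, not taken from the real form. ASP.NET often generates longer control ids, so adjust them to the actual markup.

```php
<?php
// Sketch only: assumes Composer's autoloader has already been required,
// and that a Selenium server listens on http://localhost:4444/wd/hub
// (the Selenium2Driver default).

use Behat\Mink\Driver\Selenium2Driver;
use Behat\Mink\Session;

/**
 * Builds the JavaScript expression handed to Session::wait(). It becomes
 * truthy once the <select> with the given id has more than one <option>.
 * Hypothetical helper; the id must be a plain CSS-safe identifier.
 */
function optionsLoadedCondition(string $selectId): string
{
    return sprintf(
        "document.querySelectorAll('#%s option').length > 1",
        $selectId
    );
}

/**
 * Selects a region, waits for the Ajax-filled city list, selects a city,
 * submits the form and returns the resulting page HTML.
 * All locators below are placeholders for the real form's fields.
 */
function scrapeResult(string $url, string $region, string $city): string
{
    $session = new Session(new Selenium2Driver('firefox'));
    $session->start();

    try {
        $session->visit($url);
        $page = $session->getPage();

        // Selecting in a real browser should fire the page's own change
        // handler, which in turn issues the Ajax request.
        $page->selectFieldOption('Region', $region);

        // Wait up to 5 seconds for the city options to arrive.
        $session->wait(5000, optionsLoadedCondition('City'));

        $page->selectFieldOption('City', $city);
        $page->pressButton('Submit');

        return $page->getContent();
    } finally {
        // Always close the browser session, even if a step fails.
        $session->stop();
    }
}
```

You would call it along the lines of `echo scrapeResult('http://example.com/form.aspx', 'Some region', 'Some city');` where the URL and values are again placeholders. Because a real browser submits the form, the viewstate and eventvalidation fields are sent along without any extra handling.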
It's untested code, but you get the idea. Also, see how `wait()` works.
You might also want to have a look at facebook/php-webdriver, a PHP client for Selenium WebDriver, as an alternative to Mink.