Goutte: How to submit a form with input populated via JavaScript

I have an ASP.NET form that I want to submit automatically in order to scrape the result (BTW, everything I do is legal).

Some of the form's drop-down fields are populated on the fly via ajax. One is a "Region" field: once you select a region, the "City" drop-down is populated. If I just try to set the city right after setting the region with the Goutte web crawler, an invalid value error is raised.

Can this even be done via Goutte, or should I use something else?

The target form itself is written in ASP.NET and has viewstate and eventvalidation fields.

There is 1 answer

Answered by sepehr

Sites that rely heavily on client-side JavaScript are hard, and sometimes impossible, to drive with Goutte. Goutte is built on BrowserKit, which only emulates a browser at the HTTP level: it sends requests and parses the returned HTML but never executes the page's JavaScript, so the ajax call that fills the "City" drop-down never runs. You need a browser controller such as Selenium or Sahi, which drives a real browser.

Have a look at Behat's Mink, which has drivers for both headless emulators and full-fledged browser controllers. With its selenium2 driver you can interact with the target page the way a real browser would. Here's an example:

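<?php
// Requires a running Selenium server (selenium-*.jar) plus the
// behat/mink and behat/mink-selenium2-driver Composer packages.
require 'vendor/autoload.php';

use Behat\Mink\{Mink, Session, Driver\Selenium2Driver};

// The third constructor argument is the WebDriver hub URL, not the target site.
$mink = new Mink([
    'selenium2' => new Session(
        new Selenium2Driver('firefox', null, 'http://localhost:4444/wd/hub')
    ),
]);

$session = $mink->getSession('selenium2');
$session->visit('http://example.com'); // the page that contains the form
$page = $session->getPage();

// Selecting the region triggers the ajax call that fills the city list.
$page->findField('region-select-field-name')
     ->selectOption('target-region-value');

// wait() is a Session method: it blocks until the JS condition evaluates
// to true or the timeout (in milliseconds) expires.
$session->wait(5000, 'JS code to check if the select is now populated...');

$page->findField('city-select-field-name')
     ->selectOption('target-city-value');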

It's untested code, but you get the idea. Also, see how wait() works.
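For the condition passed to wait(), here is a minimal sketch of one way to build it. The selectPopulatedCondition() helper is hypothetical (it is not part of Mink), and it assumes the city drop-down can be located by its name attribute and that an unpopulated drop-down holds at most one placeholder option. Adjust both assumptions to the real page.

```php
<?php
// Hypothetical helper (not part of Mink): builds a JavaScript expression that
// evaluates to true once the named <select> has at least $minOptions options.
function selectPopulatedCondition(string $fieldName, int $minOptions = 2): string
{
    // json_encode() turns the CSS selector into a safely quoted JS string literal.
    $selector = json_encode(sprintf('select[name="%s"]', $fieldName));

    // An immediately invoked function keeps the null check and the option
    // count inside a single expression.
    return sprintf(
        '(function () { var el = document.querySelector(%s); '
        . 'return el !== null && el.options.length >= %d; })()',
        $selector,
        $minOptions
    );
}

$condition = selectPopulatedCondition('city-select-field-name');
echo $condition, PHP_EOL;

// With a started Mink session (assumed to exist as $session), usage would be:
// $session->wait(5000, $condition);
```

wait() returns a boolean, so it is worth checking the result and bailing out if the drop-down was never populated within the timeout.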

You might also want to look at facebook/php-webdriver, a PHP client for Selenium WebDriver, as an alternative to Mink.