HtmlUnit not working with AJAX/JavaScript

I am trying to extract data for a class project from a webpage (a page that shows search results). Specifically, it's this page:

http://www.target.com/c/xbox-one-games-video/-/N-55krw#navigation=true&category=55krw&searchTerm=&view_type=medium&sort_by=bestselling&faceted_value=&offset=60&pageCount=60&response_group=Items&isLeaf=true&parent_category_id=55kug&custom_price=false&min_price=from&max_price=to

I just want to extract the titles of the products.

I'm using the following code:

final WebClient webClient = new WebClient(BrowserVersion.CHROME);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
final HtmlPage page = webClient.getPage(itemPageURL);
int tries = 20;  // Amount of tries to avoid infinite loop
while (tries > 0) {
    tries--;
    synchronized(page) {
       page.wait(2000);  // How often to check
    }
}
int numThreads = webClient.waitForBackgroundJavaScript(1000000l);

PrintWriter pw = new PrintWriter("test-target-search.txt");
pw.println(page.asXml());
pw.close();

The resulting page does not contain the product information that is shown in a web browser. I suspect the AJAX calls haven't completed, but I'm not sure.

Any help would be greatly appreciated. Thanks!

There is 1 answer

Arya (best answer)

You can skip the browser simulation and send a plain GET request to the search service that the page calls in the background. Use the "offset" and "pageCount" arguments in the URL to choose which page of results you get. After retrieving a page (the example below does this for one page), extract the titles with a regex or with a parser for whatever format the content is in (JSON?).

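import java.net.URL;

// HtmlUnit 2.x package names; HtmlUnit 3.x uses org.htmlunit instead.
import com.gargoylesoftware.htmlunit.HttpMethod;
import com.gargoylesoftware.htmlunit.Page;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebRequest;

public class TargetSearch
{
    public static void main(String[] args)
    {
        try
        {
            WebClient webClient = new WebClient();

            // Call the search service directly instead of loading the HTML page.
            // "offset" and "pageCount" select which slice of the results is returned.
            URL url = new URL(
                    "http://tws.target.com/searchservice/item/search_results/v1/by_keyword?callback=getPlpResponse&navigation=true&category=55krw&searchTerm=&view_type=medium&sort_by=bestselling&faceted_value=&offset=60&pageCount=60&response_group=Items&isLeaf=true&parent_category_id=55kug&custom_price=false&min_price=from&max_price=to");
            WebRequest requestSettings = new WebRequest(url, HttpMethod.GET);

            // Send headers similar to the ones a browser sends for this request.
            requestSettings.setAdditionalHeader("Accept", "*/*");
            requestSettings.setAdditionalHeader("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
            requestSettings.setAdditionalHeader("Referer", "http://www.target.com/c/xbox-one-games-video/-/N-55krw");
            requestSettings.setAdditionalHeader("Accept-Language", "en-US,en;q=0.8");
            requestSettings.setAdditionalHeader("Accept-Encoding", "gzip,deflate,sdch");
            requestSettings.setAdditionalHeader("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.3");

            // The response is not an HTML page, so use the generic Page type.
            Page page = webClient.getPage(requestSettings);

            // Print the raw response body; the titles still have to be extracted from it.
            System.out.println(page.getWebResponse().getContentAsString());
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}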
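
For the extraction step, here is a rough sketch of what the regex approach could look like. It rests on assumptions I have not verified against the real response: that the body is JSON wrapped in the callback named in the URL (getPlpResponse(...)), that each product carries a string field called "title", and that "offset" is simply the page index times "pageCount". The class and method names (TitleExtractor, stripJsonp, extractTitles, pagingParams) are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleExtractor {

    // Assumption: products expose a string field named "title".
    // Check the real response and adjust the key if it differs.
    private static final Pattern TITLE =
            Pattern.compile("\"title\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    /** Removes a JSONP wrapper such as getPlpResponse({...}) and returns the inner JSON. */
    public static String stripJsonp(String body) {
        String trimmed = body.trim();
        // Plain JSON starts with '{' or '[': nothing to strip.
        if (trimmed.startsWith("{") || trimmed.startsWith("[")) {
            return trimmed;
        }
        int open = trimmed.indexOf('(');
        int close = trimmed.lastIndexOf(')');
        if (open < 0 || close <= open) {
            return trimmed; // no wrapper found; return the body unchanged
        }
        return trimmed.substring(open + 1, close);
    }

    /** Collects the raw value of every "title" field (JSON escapes are not decoded). */
    public static List<String> extractTitles(String json) {
        List<String> titles = new ArrayList<>();
        Matcher m = TITLE.matcher(json);
        while (m.find()) {
            titles.add(m.group(1));
        }
        return titles;
    }

    /** Builds the paging part of the query string for a zero-based page index. */
    public static String pagingParams(int pageIndex, int pageSize) {
        // Assumption: offset counts items, so page n starts at n * pageSize.
        return "offset=" + (pageIndex * pageSize) + "&pageCount=" + pageSize;
    }

    public static void main(String[] args) {
        // Invented sample data in the assumed shape, not a real response.
        String sample =
                "getPlpResponse({\"items\":[{\"title\":\"Game A\"},{\"title\":\"Game B\"}]})";
        for (String title : extractTitles(stripJsonp(sample))) {
            System.out.println(title); // prints "Game A", then "Game B"
        }
        System.out.println(pagingParams(1, 60)); // prints "offset=60&pageCount=60"
    }
}
```

In the code above you would pass page.getWebResponse().getContentAsString() to stripJsonp instead of the sample string. If the response turns out to be nested or uses escaped characters heavily, a real JSON parser will be more reliable than a regex.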