HTMLUnit not working with Ajax/Javascript

Question

Asked by Aswath Manoharan on 10 June 2015 at 01:49

I am trying to extract data for a class project from a webpage (a page that shows search results). Specifically, it's this page:

http://www.target.com/c/xbox-one-games-video/-/N-55krw#navigation=true&category=55krw&searchTerm=&view_type=medium&sort_by=bestselling&faceted_value=&offset=60&pageCount=60&response_group=Items&isLeaf=true&parent_category_id=55kug&custom_price=false&min_price=from&max_price=to

I just want to extract the titles of the products.

I'm using the following code:

final WebClient webClient = new WebClient(BrowserVersion.CHROME);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
final HtmlPage page = webClient.getPage(itemPageURL);
int tries = 20;  // Amount of tries to avoid infinite loop
while (tries > 0) {
    tries--;
    synchronized(page) {
       page.wait(2000);  // How often to check
    }
}
// Returns the number of background JavaScript jobs still pending after the timeout (in ms)
int numThreads = webClient.waitForBackgroundJavaScript(1000000L);

PrintWriter pw = new PrintWriter("test-target-search.txt");
pw.println(page.asXml());
pw.close();

The resulting XML does not contain the product information that the browser shows. I suspect the Ajax calls haven't completed, but I'm not sure.

Any help would be greatly appreciated. Thanks!

There is 1 answer

**Arya** · Accepted Answer · 2015-06-11T21:29:49+00:00

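You can use a plain GET request for this task instead of waiting for the page's JavaScript to run. Note that everything after the "#" in your URL is a fragment, which the browser never sends to the server, so those parameters have to go to the search service directly. Control paging with the "offset" and "pageCount" arguments in the URL. After retrieving a page (the example below does this for one page), extract the titles with a regex or with a parser for whatever format the content is in (JSON?).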

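// Package names below are those of HtmlUnit 2.x (HtmlUnit 3.x uses org.htmlunit instead).
import java.net.URL;

import com.gargoylesoftware.htmlunit.HttpMethod;
import com.gargoylesoftware.htmlunit.Page;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebRequest;

public class TargetSearchFetcher
{
    public static void main(String[] args)
    {
        // try-with-resources closes the client; assumes a release where WebClient is AutoCloseable
        try (WebClient webClient = new WebClient())
        {
            // Search service URL; "offset" and "pageCount" select the page of results
            URL url = new URL(
                    "http://tws.target.com/searchservice/item/search_results/v1/by_keyword?callback=getPlpResponse&navigation=true&category=55krw&searchTerm=&view_type=medium&sort_by=bestselling&faceted_value=&offset=60&pageCount=60&response_group=Items&isLeaf=true&parent_category_id=55kug&custom_price=false&min_price=from&max_price=to");
            WebRequest requestSettings = new WebRequest(url, HttpMethod.GET);

            // Headers that mimic the request the browser sends from the category page
            requestSettings.setAdditionalHeader("Accept", "*/*");
            requestSettings.setAdditionalHeader("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
            requestSettings.setAdditionalHeader("Referer", "http://www.target.com/c/xbox-one-games-video/-/N-55krw");
            requestSettings.setAdditionalHeader("Accept-Language", "en-US,en;q=0.8");
            requestSettings.setAdditionalHeader("Accept-Encoding", "gzip,deflate,sdch");
            requestSettings.setAdditionalHeader("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.3");

            // The response is not an HTML page, so use the generic Page type
            Page page = webClient.getPage(requestSettings);

            // Print the raw response body
            System.out.println(page.getWebResponse().getContentAsString());
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}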
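
The answer leaves the paging and extraction steps open, so here is a minimal sketch of both using only the Java standard library. It rests on assumptions that are not confirmed anywhere above: that the response body is JSON wrapped in the `getPlpResponse(...)` callback named in the URL, and that each item has a string field called `"title"`. The class and method names (`TitleExtractor`, `pageUrl`, `stripJsonp`, `extractTitles`) and the sample data in `main` are made up for illustration. Check the real response before relying on the field name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TitleExtractor {

    // Matches a JSONP wrapper such as: callbackName({...});
    private static final Pattern JSONP =
            Pattern.compile("^\\s*[\\w$.]+\\s*\\((.*)\\)\\s*;?\\s*$", Pattern.DOTALL);

    // ASSUMPTION: items expose a "title" string field; the real key may differ.
    // The group tolerates escaped characters (e.g. \") inside the value.
    private static final Pattern TITLE =
            Pattern.compile("\"title\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    /** Builds the search URL from the answer with the given paging values. */
    public static String pageUrl(int offset, int pageCount) {
        return "http://tws.target.com/searchservice/item/search_results/v1/by_keyword"
                + "?callback=getPlpResponse&navigation=true&category=55krw&searchTerm="
                + "&view_type=medium&sort_by=bestselling&faceted_value="
                + "&offset=" + offset + "&pageCount=" + pageCount
                + "&response_group=Items&isLeaf=true&parent_category_id=55kug"
                + "&custom_price=false&min_price=from&max_price=to";
    }

    /** Removes a JSONP callback wrapper; returns the input unchanged if there is none. */
    public static String stripJsonp(String body) {
        Matcher m = JSONP.matcher(body);
        return m.matches() ? m.group(1) : body;
    }

    /** Collects the raw (still JSON-escaped) values of every "title" field. */
    public static List<String> extractTitles(String json) {
        List<String> titles = new ArrayList<>();
        Matcher m = TITLE.matcher(json);
        while (m.find()) {
            titles.add(m.group(1));
        }
        return titles;
    }

    public static void main(String[] args) {
        // Hypothetical response body, only for demonstration
        String body = "getPlpResponse({\"items\":[{\"title\":\"Game A\"},{\"title\":\"Game B\"}]});";
        for (String title : extractTitles(stripJsonp(body))) {
            System.out.println(title);
        }
        // URL for the third page when each page holds 60 items
        System.out.println(pageUrl(120, 60));
    }
}
```

To walk through all results, pass each `pageUrl(offset, 60)` to `webClient.getPage(...)` as in the answer and increase `offset` by the page size until a page returns no titles. A regex is enough for a quick class project, but a real JSON parser is more robust if the response turns out to be nested or the titles contain escape sequences.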
