HtmlUnit: scraping a Google+ page with JavaScript, clicking the "show more" button does not work

I am trying to scrape this page: https://plus.google.com/115016587855962294424/about. Everything works fine, but when I try to click "show more" to load more reviews, nothing happens. Here is my code:

final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_24);
HtmlPage page = webClient.getPage("https://plus.google.com/115016587855962294424/about");
// Check that the page loaded successfully
assertEquals(200, page.getWebResponse().getStatusCode());
assertEquals("OK", page.getWebResponse().getStatusMessage());
System.out.println(page.getWebResponse().getStatusCode());

This is where I click "show more":

HtmlSpan advancedSearchAn = (HtmlSpan) page.getFirstByXPath("//*[@id=\"115016587855962294424-about-page\"]/div/div[1]/div/div/div[2]/div[3]/span[1]");
page = advancedSearchAn.click();

But nothing happens. I even tried the following:

//            webClient.waitForBackgroundJavaScript(10 * 1000); 
//            webClient.setAjaxController(new NicelyResynchronizingAjaxController()); 
//            webClient.setAjaxController(new AjaxController(){ 
//                @Override 
//                public boolean processSynchron(HtmlPage page, WebRequest request, boolean async) 
//                { 
//                    return true; 
//                } 
//            }); 

Any suggestions?

UPDATE:

*I was advised to modify the incoming JavaScript code by wrapping the connection in a WebConnectionWrapper and overriding getResponse(), like this:*

new WebConnectionWrapper(webClient) {
    @Override
    public WebResponse getResponse(WebRequest request) throws IOException {
        WebResponse response = super.getResponse(request);
        if (request.getUrl().toExternalForm().contains("https://plus.google.com/115016587855962294424/about")) {
            String content = response.getContentAsString("UTF-8");

            // change content here -- what needs to be changed?

            System.out.println("content " + content);
            WebResponseData data = new WebResponseData(content.getBytes("UTF-8"),
                    response.getStatusCode(), response.getStatusMessage(), response.getResponseHeaders());
            response = new WebResponse(data, request, response.getLoadTime());
        }
        System.out.println("content " + response.getContentAsString());
        return response;
    }
};

Any suggestions on exactly how this can be done and what needs to be modified? I have tried the following APIs: HtmlUnit, jsoup, Web-Harvest, and Selenium.

There is 1 answer

Answered by coding_idiot

Clicking "show more" submits an Ajax request, and the response is then used to change the DOM.

HtmlUnit's JavaScript support is limited, so instead analyze the request being sent with a proxy tool and issue the same request manually from your code.

I use Fiddler as a proxy tool.
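To illustrate the idea of coding the request manually, here is a minimal sketch that rebuilds a captured Ajax call with the JDK's own java.net.http classes instead of HtmlUnit. The endpoint URL and the parameter names (pageToken, count) are hypothetical placeholders, not what Google+ actually sends; substitute whatever the proxy tool shows for the real "show more" request, including any cookies or extra headers it carries.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class ReplayAjaxRequest {

    // Encode the captured parameters as application/x-www-form-urlencoded.
    static String encodeForm(Map<String, String> params) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    // Build (but do not send) a POST request that mirrors the captured one.
    static HttpRequest buildRequest(String url, Map<String, String> params) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/x-www-form-urlencoded;charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(encodeForm(params)))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical parameters; copy the real ones from the proxy capture.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("pageToken", "abc 123");
        params.put("count", "10");

        // Hypothetical endpoint; use the URL seen in the proxy capture.
        HttpRequest request = buildRequest("https://example.com/hypothetical/reviews", params);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(encodeForm(params));
        // To send it: HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    }
}
```

The response body would then be parsed directly (it is often JSON or an HTML fragment), so no click simulation is needed.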