I am trying to scrape this page: https://plus.google.com/115016587855962294424/about. Everything works fine, but when I try to click "Show more" to load more reviews, nothing happens. Here is my code:
final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_24);
HtmlPage page = webClient.getPage("https://plus.google.com/115016587855962294424/about");
assertEquals(200,page.getWebResponse().getStatusCode());
assertEquals("OK",page.getWebResponse().getStatusMessage());
System.out.println(page.getWebResponse().getStatusCode());
Clicking "Show more" here:
HtmlSpan advancedSearchAn = (HtmlSpan) page.getFirstByXPath("//*[@id=\"115016587855962294424-about-page\"]/div/div[1]/div/div/div[2]/div[3]/span[1]");
page = advancedSearchAn.click();
But nothing happens. I even tried:
// webClient.waitForBackgroundJavaScript(10 * 1000);
// webClient.setAjaxController(new NicelyResynchronizingAjaxController());
// webClient.setAjaxController(new AjaxController(){
// @Override
// public boolean processSynchron(HtmlPage page, WebRequest request, boolean async)
// {
// return true;
// }
// });
Any suggestions?
UPDATE:
*I was advised to modify the incoming JavaScript code by subclassing HttpWebConnection and overriding getResponse(), as follows:*
new WebConnectionWrapper(webClient) {
    @Override
    public WebResponse getResponse(WebRequest request) throws IOException {
        // System.out.println("content");
        WebResponse response = super.getResponse(request);
        if (request.getUrl().toExternalForm().contains("https://plus.google.com/115016587855962294424/about")) {
            String content = response.getContentAsString("UTF-8");
            // change content -- what needs to be changed?
            System.out.println("content " + content);
            // Rebuild the response around the (possibly modified) content.
            WebResponseData data = new WebResponseData(content.getBytes("UTF-8"),
                    response.getStatusCode(), response.getStatusMessage(), response.getResponseHeaders());
            response = new WebResponse(data, request, response.getLoadTime());
        }
        System.out.println("content " + response.getContentAsString());
        return response;
    }
};
Any suggestions on how exactly this can be done and what needs to be modified? I have tried the following APIs: HtmlUnit, jsoup, Web-Harvest, and Selenium.
Clicking "Show more" submits an Ajax request, which changes the DOM when it returns. HtmlUnit's JavaScript support is not good, so just analyze the request being sent using a proxy tool and code it manually. I use Fiddler as a proxy tool.
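As a rough illustration of "coding it manually", here is a minimal sketch using the JDK's own java.net.http client (Java 11+) rather than HtmlUnit. The endpoint URL, the form body, and the class and method names (ReviewRequestSketch, buildRequest, send) are all hypothetical placeholders; the real URL, parameters, and headers have to be copied from the request that Fiddler captures when "Show more" is clicked in a browser.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class ReviewRequestSketch {

    // Builds a POST request that mimics the captured Ajax call.
    // Both arguments are assumptions: take the real values from the proxy capture.
    static HttpRequest buildRequest(String endpoint, String formBody) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "application/x-www-form-urlencoded;charset=UTF-8")
                .header("User-Agent", "Mozilla/5.0")
                .POST(HttpRequest.BodyPublishers.ofString(formBody, StandardCharsets.UTF_8))
                .build();
    }

    // Sends the request and returns the raw response body for parsing.
    static String send(HttpRequest request) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        // Placeholder endpoint and parameters, not the real Google+ ones.
        HttpRequest request = buildRequest("https://example.com/ajax/reviews", "start=10&count=10");
        // Only prints the request line; call send(request) once the real values are filled in.
        System.out.println(request.method() + " " + request.uri());
    }
}
```

The captured request may also carry cookies or session tokens; if so, those would need to be added as extra headers in buildRequest before the server returns the same data the browser gets.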