Problem Statement: I want to crawl this page : http://www.hongkonghomes.com/en/property/rent/the_peak/middle_gap_road/10305?page_no=1&rec_per_page=12&order=rental+desc&offset=0
Lets say I want to parse the address, that is "24, Middle Gap Road, The Peak, Hong Kong"
What I did: I first only tried to load using jsoup, but then I noticed that the page is taking some time to load. So, then I also plugged in HTMLUnit to wait for the page to load first
Code I wrote:
public static void parseByHtmlUnit() throws Exception{
String url = "http://www.hongkonghomes.com/en/property/rent/the_peak/middle_gap_road/10305?page_no=1&rec_per_page=12&order=rental+desc&offset=0";
WebClient webClient = new WebClient(BrowserVersion.FIREFOX_38);
webClient.waitForBackgroundJavaScriptStartingBefore(30000);
HtmlPage page = webClient.getPage(url);
synchronized(page) {
page.wait(30000);
}
try {
Document doc = Jsoup.parse(page.asXml());
String address = ElementsUtil.getTextOrEmpty(doc.select(".addr"));
System.out.println("address"+address);
} catch (Exception e) {
e.printStackTrace();
}
}
Expected output : In the console, I should get this output: address 24, Middle Gap Road, The Peak, Hong Kong
Actual output : address
How about this?