Web crawling class hidden fields


I'm new to this. After testing several websites with my crawler I came across the following:

<div class="originalCurrencyInformation">                            
<label class="Hidden original-price">Price: £500</label>

Note that the label has class="Hidden" rather than type="hidden". How can I retrieve the price? Any library will do, but my preference is Jsoup.

Here is an example snippet of code:

Document doc = Jsoup.connect("http://www.example.org")
        .timeout(3000)
        .get();
Elements tags = doc.select("div.originalCurrencyInformation > Label.original-price");
for (Element tag : tags) {
    System.out.println(tag);
}

Update

I have tried "Label.Hidden original-price" and "Label.Hidden.original-price", but in both cases the value I get back is null.

There is 1 answer

David Conrad On BEST ANSWER

In your example, the original-price is not in a div, so it's not clear why you are looking for div.original-price. You can use:

doc.select("div.originalCurrencyInformation > label.Hidden.original-price")

This selects the labels that have both the "Hidden" and "original-price" classes.

You can then use:

tag.text()

This returns just the text of the element, without the surrounding markup.
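
Putting the two pieces together, a minimal sketch might look like the following. It parses the fragment from the question out of a string instead of fetching a live page, so the closing </div>, the class name HiddenPriceExample and the helper extractPrice are assumptions made for illustration, not part of the original code.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class HiddenPriceExample {

    // Hypothetical helper: returns the text of the first label that carries
    // both the "Hidden" and "original-price" classes and is a direct child of
    // div.originalCurrencyInformation, or null when nothing matches.
    public static String extractPrice(String html) {
        Document doc = Jsoup.parse(html);
        Element label = doc.selectFirst(
                "div.originalCurrencyInformation > label.Hidden.original-price");
        return label == null ? null : label.text();
    }

    public static void main(String[] args) {
        // Fragment from the question; the closing </div> is assumed.
        // \u00A3 is the pound sign, escaped to avoid source-encoding issues.
        String html = "<div class=\"originalCurrencyInformation\">"
                + "<label class=\"Hidden original-price\">Price: \u00A3500</label>"
                + "</div>";
        System.out.println(extractPrice(html));
    }
}
```

Chaining the class names with dots and no spaces is what makes the selector require both classes on the same element. A space is the descendant combinator, so "Label.Hidden original-price" would look for an element named original-price nested inside the label.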