Jsoup select not returning all elements


I am new to the Jsoup library. I have HTML like this:

<tr class="srrowns"> 
 <td class="num"> <a name="y2015"> </a> 1 </td> 
 <td nowrap><a href="/cve/CVE-2015-4004/" title="CVE-2015-4004 security vulnerability details">CVE-2015-4004</a></td> 
 <td><a href="/cwe-details/119/cwe.html" title="CWE-119 - CWE definition">119</a></td> 
 <td class="num"> <b style="color:red"> </b> </td> 
 <td> DoS Overflow +Info </td> 
 <td>2015-06-07</td> 
 <td>2015-06-08</td> 
 <td>
  <div class="cvssbox" style="background-color:#ff8000">
   8.5
  </div></td> 
 <td align="center">None</td> 
 <td align="center">Remote</td> 
 <td align="center">Low</td> 
 <td align="center">Not required</td> 
 <td align="center">Partial</td> 
 <td align="center">None</td> 
 <td align="center">Complete</td> 
</tr>

When I run element.select("td"), it returns:

<td class="num"> <a name="y2015"> </a> 1 </td>
<td nowrap><a href="/cve/CVE-2015-4004/" title="CVE-2015-4004 security vulnerability details">CVE-2015-4004</a></td>
<td><a href="/cwe-details/119/cwe.html" title="CWE-119 - CWE definition">119</a></td>
<td class="num"> <b style="color:red"> </b> </td>
<td> DoS Overflow +Info </td>
<td>2015-06-07</td>
<td>2015-06-08</td>
<td>
 <div class="cvssbox" style="background-color:#ff8000">
  8.5
 </div></td>
<td align="center">None</td>
<td align="center">Remote</td>
<td align="center">Low</td>
<td align="center">Not required</td>
<td align="center">Partial</td>
<td align="center">Complete</td>

Obviously, it is dropping the <td align="center">None</td> that comes just before "Complete". Is there any way to get all of the items from the Jsoup selector?

My code looks something like this in Scala:

val connection = Jsoup.connect(url).get()
val treelist = connection.select("tr.srrowns:contains(CVE-2015-4001)")
val tree = treelist.select("td")

I just saw that Jsoup's select is implemented using a LinkedHashSet. My goal is to extract the text from each tag using text(). Is there a workaround for this, or do I have to write a parser just to get all of the nodes (including duplicates)?

Thank you very much.


There is 1 answer

Stephan On

Try this CSS selector:

tr.srrowns:has(td:contains(CVE-2015-4004)) > td

DEMO

http://try.jsoup.org/~vAgiHQY6TIJ5MSUzR-m_Y1GD5_U

SAMPLE CODE

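import org.jsoup.Jsoup
import scala.jdk.CollectionConverters._ // Scala 2.13+; use scala.collection.JavaConverters._ on older versions

val cve = "CVE-2015-4004"
val doc = Jsoup.connect(url).get()
// Select the direct <td> children of the row that has a cell containing the CVE id
val tds = doc.select(s"tr.srrowns:has(td:contains($cve)) > td")

// Elements is a Java collection, so convert it before using it in a for loop
for (td <- tds.asScala) {
  println(td.text())
}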
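
To try the selector without hitting the network, here is a minimal sketch that parses an inline copy of the row. The object name TdCellsSketch, the shortened HTML, and the helper names cellTexts and cellTextsViaChildren are all made up for illustration; only Jsoup.parse, select, first, children and text are jsoup calls. The second helper is an alternative I would also try: it reads the row's child elements directly instead of running a second selector. I have not checked how either behaves on the jsoup version used in the question.

```scala
import org.jsoup.Jsoup
import scala.jdk.CollectionConverters._ // Scala 2.13+; older versions: scala.collection.JavaConverters._

object TdCellsSketch {
  // Hypothetical, shortened copy of the row from the question.
  // The <table> wrapper matters: the HTML parser drops <tr>/<td> tags found outside a table.
  val html: String =
    """<table>
      |  <tr class="srrowns">
      |    <td nowrap><a href="/cve/CVE-2015-4004/">CVE-2015-4004</a></td>
      |    <td align="center">None</td>
      |    <td align="center">Remote</td>
      |    <td align="center">None</td>
      |    <td align="center">Complete</td>
      |  </tr>
      |</table>""".stripMargin

  // Text of every direct <td> child of the row that has a cell mentioning the CVE id.
  def cellTexts(cve: String): List[String] = {
    val doc = Jsoup.parse(html)
    doc.select(s"tr.srrowns:has(td:contains($cve)) > td")
      .asScala
      .map(_.text())
      .toList
  }

  // Alternative: find the row once, then walk its child elements directly.
  def cellTextsViaChildren(cve: String): List[String] = {
    val doc = Jsoup.parse(html)
    val row = doc.select(s"tr.srrowns:contains($cve)").first() // null when nothing matches
    if (row == null) Nil
    else row.children().asScala.map(_.text()).toList
  }

  def main(args: Array[String]): Unit =
    cellTexts("CVE-2015-4004").foreach(println)
}
```

On a recent jsoup release I would expect both helpers to return all five cells, including both "None" entries. If one of them still drops the repeated cell, that points at the library version rather than the selector, and upgrading jsoup is worth trying.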