Nokogiri: how to traverse every row of a table with two classes


I am attempting to parse an HTML table using Nokogiri. The table is well marked up and has no structural issues, except that the table header is embedded as an ordinary row instead of using <thead>. The problem is that I want every row except the first: I'm not interested in the header, only in the rows that follow it. Here's an example of how the table is structured:

<table id="foo">
<tbody>
  <tr class="headerrow">....</tr>
  <tr class="row">...</tr>
  <tr class="row_alternate">...</tr>
  <tr class="row">...</tr>
  <tr class="row_alternate">...</tr>
</tbody>
</table>

I'm interested in grabbing only the rows with the classes row and row_alternate. However, as far as I'm aware, this syntax doesn't do that in Nokogiri:

doc.css('.row .row_alternate').each do |a_row|
  # do stuff with a_row
end

What's the best way to solve this with Nokogiri?


There are 3 answers

Kimball (best answer)

I would try this:

doc.css(".row, .row_alternate").each do |a_row|
  # do stuff with a_row
end

igor_rb

Try removing the header row first, then iterating over every remaining row:

doc.at_css(".headerrow").remove

doc.css("tr").each do |row|
  # some code
end

Vitalii Elenhaupt

A CSS selector can contain multiple components separated by commas. The Selectors Level 3 specification calls this a group of selectors:

A comma-separated list of selectors represents the union of all elements selected by each of the individual selectors in the list. (A comma is U+002C.) For example, in CSS when several selectors share the same declarations, they may be grouped into a comma-separated list. White space may appear before and/or after the comma.

doc.css('.row, .row_alternate').each do |a_row|
  p a_row.to_html
end

# "<tr class=\"row\">...</tr>"
# "<tr class=\"row_alternate\">...</tr>"
# "<tr class=\"row\">...</tr>"
# "<tr class=\"row_alternate\">...</tr>"