I am using Rails 3.2 and Hpricot.
I’d like to find an XML element by the content of one of its child elements and convert it to a Ruby object that will later be rendered.
In other words, I’d like to find the ‘vehicle’ element whose child ‘line_number’ contains 1234.
This worked fine with REXML and the following XPath:
/gsip/vehicle[line_number[text()=1234]]
REXML is slow, so I switched to Hpricot, but there the same XPath finds all vehicle elements, not only the one whose ‘line_number’ equals 1234.
Why does this find all vehicles?
file_path = Rails.root.join('public','gsip','gsip-vehicle-data.xml')
q = "/gsip/vehicle[line_number[text()=#{params[:id]}]]"
@vehicle_data = { :date => Date.today - 10.years } # initiate with very old date
xmldoc = File.read(file_path)
doc = Hpricot::XML(xmldoc)
doc.search(q) do |e|
  if e.at('line_number').innerText == params[:id] # This line shouldn't be necessary?!
    logger.info( "#{e.at('pa_number').innerText} (#{e.at('line_number').innerText} from #{e.at('date').innerText})" )
    vehicle_date = Date.strptime(e.at('date').innerText, "%d.%m.%Y")
    #logger.info('date: ' + vehicle_date.to_s)
    if vehicle_date > @vehicle_data[:date]
      e.children.each do |n|
        logger.info("#{n.name} = #{n.innerText}")
        @vehicle_data[n.name] = n.innerText
      end
    end
  end
end
This REXML version finds the right vehicle, but it is slow:
require 'rexml/document'
file_path = Rails.root.join('public','gsip','gsip-vehicle-data.xml')
q = "/gsip/vehicle[line_number[text()=#{params[:id]}]]"
@vehicle_data = { :date => Date.today - 10.years } # initiate with very old date
# REXML::XPath needs a parsed document, not the raw file contents
xmldoc = REXML::Document.new(File.read(file_path))
REXML::XPath.each(xmldoc, q) do |e|
  # find the latest vehicle with given line_number
  vehicle_date = Date.strptime(REXML::XPath.first(e, 'date').text, "%d.%m.%Y")
  if vehicle_date > @vehicle_data[:date]
    e.elements.each do |n|
      @vehicle_data[n.name] = n.text
    end
  end
end
My XML:
<gsip export_date="7/25/2012 12:04:27 PM" schema_version="1.01">
  <vehicle id="ABC">
    <date>02.07.2012</date>
    <line_number>1234</line_number>
    <pa_number>ABC</pa_number>
    <vin>VIN</vin>
    <my>2012</my>
  </vehicle>
  <vehicle id="ABD">
    <date>02.07.2012</date>
    <line_number>8348</line_number>
    <pa_number>ABD</pa_number>
    <vin>VIN</vin>
    <my>2012</my>
  </vehicle>
  <vehicle>
    ...
  </vehicle>
  ...
</gsip>
UPDATE
My switch to Nokogiri:
My request (on localhost) has gone down from 4 seconds to 250 ms. My XML file is 5.6 MB. Since it might be helpful for others, I've pasted my changes below:
class IncidentsController < ApplicationController
  require 'nokogiri'
  # ....
  def vehicle
    # helpful links: ===============================================================================
    # Some say Nokogiri is best: http://nokogiri.org/
    # recursive link: http://stackoverflow.com/questions/11665126/why-xpath-search-works-in-rexml-but-not-with-hpricot
    # =============================================================================================
    # check if PA Number, Line Number or VIN is given
    # (\A and \z anchor the whole string; ^ and $ would only anchor a single line):
    num = ''
    if params[:id] =~ /\A\d{4}\z/
      num = 'line_number'
    elsif params[:id] =~ /\A\w{6}\z/
      num = 'pa_number'
    elsif params[:id] =~ /\A\w{17}\z/
      num = 'vin'
    end
    # an unrecognised id would otherwise produce an invalid XPath expression
    return head(:bad_request) if num.empty?
    # read Vehicle Data from XML File
    file_path = Rails.root.join('private','gsip','gsip-vehicle-data.xml')
    q = "/gsip/vehicle[#{num}/text()='#{params[:id]}']"
    # track the newest date separately: the child elements are stored under
    # string keys ("date"), so a :date symbol key would never be updated
    latest_date = Date.today - 10.years # initiate with very old date
    @vehicle_data = {}
    #logger.info("*** Find Vehicle Data in XML. xPath: #{q}")
    doc = Nokogiri::XML(File.read(file_path))
    doc.xpath(q).each do |e|
      vehicle_date = Date.strptime(e.at_xpath('date').content, "%d.%m.%Y")
      #logger.info("Date: #{vehicle_date.to_s}")
      if vehicle_date > latest_date
        # keep only the most recent vehicle with the given number
        latest_date = vehicle_date
        @vehicle_data = {}
        e.element_children.each do |n|
          @vehicle_data[n.name] = n.content
        end
      end
    end
    respond_to do |format|
      format.html { redirect_to connectors_path }
      format.json { render :json => @vehicle_data }
      format.xml  { render :xml => @vehicle_data }
    end
  end
  # ...
end
I'm new to Rails, so further comments on my code are welcome!
Hpricot was wonderful when it first came on the scene because it introduced CSS selector syntax to HTML parsing. However, it was never completely XPath-compliant, particularly around XPath predicate syntax, which you are using.
I would suggest Nokogiri. This library is fast and well-maintained, and is fully XPath 1.0 compliant. With it you should be able to pull the vehicle:
Also, a slight simplification: you really don't need nested predicates. This will also identify the correct vehicle: