Parsing an XML file with Nokogiri SAX parser


I have done some reading and I have been trying to get certain data from a large XML file. The data looks like this:

<Provider ID="0042100323">
    <Last_Name>LastName</Last_Name>
    <First_Name>FirstName</First_Name>
    <Mdl_Name>Middle</Mdl_Name>
    <Gndr>M</Gndr>
</Provider>

I would like to write start_element so that it adds all of these values to a records array, like:

0042100323, LastName, FirstName, Middle, M

using something like:

def start_element name, attributes = []
    @@records << attributes[0][1] if name == "Provider"
end

How can I update my code to add the other tags to the array?


There are 3 answers

falsetru On BEST ANSWER

Use the characters event to get the text inside a tag:

def characters(string)
  @records << string
end

UPDATE according to the OP's comment:

To grab text selectively according to the containing tag, remember the last seen tag and grab the text only if that tag is one you want.

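def start_element(name, attributes = [])
  # attributes is an array of [name, value] pairs; take the first value (the ID)
  @records << attributes[0][1] if name == "Provider"
  @last_seen_tag = name
end

def characters(string)
  # keep text only when it sits directly inside one of the wanted tags
  if ['Last_Name', 'First_Name', 'Mdl_Name', 'Gndr'].include? @last_seen_tag
    @records << string
  end
end

def end_element name
  # reset so whitespace between elements is not collected
  @last_seen_tag = nil
end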
Arup Rakshit On

You can take the approach below:

require 'nokogiri'

@doc = Nokogiri::XML.parse <<-EOT
<Provider ID="0042100323">
    <Last_Name>LastName</Last_Name>
    <First_Name>FirstName</First_Name>
    <Mdl_Name>Middle</Mdl_Name>
    <Gndr>M</Gndr>
</Provider>
EOT

def start_element name, attributes = [], childs = []
  node = @doc.at("//*[local-name()='#{name}']")
  contents_of_childs = node.search("./child::*").each_with_object([]) do |n, a|
    a << n.text if childs.include?(n.name)
  end
  attributes.each_with_object([]) do |attr, a|
    a << node[attr] unless node[attr].nil?
  end + contents_of_childs
end

start_element('Provider', ['ID'], %w(Last_Name First_Name Mdl_Name Gndr))
# => ["0042100323", "LastName", "FirstName", "Middle", "M"]
Bala On

A variation on Arup's answer:

require 'nokogiri'

@doc = Nokogiri::XML.parse <<-EOT
<Provider ID="0042100323">
    <Last_Name>LastName</Last_Name>
    <First_Name>FirstName</First_Name>
    <Mdl_Name>Middle</Mdl_Name>
    <Gndr>M</Gndr>
</Provider>
EOT

(@doc.at("//Provider")["ID"] + @doc.text).split
# => ["0042100323", "LastName", "FirstName", "Middle", "M"]