I am a beginner with Java SAX. I have a large XML file and I want to extract some information from it. Below are an extract of the XML file, the output I want, and my code.
Extract from the XML file:
...
<Synset baseConcept="3" id="mizaAj_n2AR">
<SynsetRelations>
<SynsetRelation relType="hyponym" targets="TaboE_n2AR"/>
<SynsetRelation relType="hyponym" targets="TaboE_n2AR"/>
<SynsetRelation relType="hypernym" targets="ragobap_n4AR"/>
<SynsetRelation relType="hypernym" targets="ragobap_n4AR"/>
<SynsetRelation relType="hypernym" targets="Tiybap_Aln~afos_n1AR"/>
<SynsetRelation relType="hypernym" targets="Tiybap_Aln~afos_n1AR"/>
</SynsetRelations>
<MonolingualExternalRefs>
<MonolingualExternalRef externalReference="04623612-n" externalSystem="PWN30"/>
</MonolingualExternalRefs>
</Synset>
<Synset baseConcept="3" id="ragobap_n4AR">
<SynsetRelations>
<SynsetRelation relType="antonym" targets="mizaAj_n2AR"/>
<SynsetRelation relType="antonym" targets="mizaAj_n2AR"/>
</SynsetRelations>
<MonolingualExternalRefs>
<MonolingualExternalRef externalReference="04624826-n" externalSystem="PWN30"/>
</MonolingualExternalRefs>
</Synset>
<Synset baseConcept="3" id="tasal~uT_n1AR">
<SynsetRelations>
<SynsetRelation relType="has_instance" targets="simap_n1AR"/>
<SynsetRelation relType="is_instance" targets="simap_n1AR"/>
</SynsetRelations>
<MonolingualExternalRefs>
<MonolingualExternalRef externalReference="04625882-n" externalSystem="PWN30"/>
</MonolingualExternalRefs>
</Synset>
...
I want:
hyponym: 2
hypernym: 4
antonym: 2
has_instance: 1
is_instance: 1
The code (the main class and my handler):
import java.io.IOException;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

public class Main {
    public static void main(String[] args) throws SAXException, IOException {
        // Create a SAX reader and register the handler that receives the parse events
        XMLReader p = XMLReaderFactory.createXMLReader();
        p.setContentHandler(new handler());
        p.parse("test1.xml");
    }
}
----------------------------------------
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class handler extends DefaultHandler {
    @Override
    public void startElement(String namespaceURI, String localName,
            String qName, Attributes attrs) {
        System.out.println("qname = " + qName);
        String node = qName;
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                // Get the attribute's name
                String aname = attrs.getLocalName(i);
                // And print its value
                System.out.println("Attribute " + aname + " value: " + attrs.getValue(i));
            }
        }
    }
}
This code uses a stream reader, so it only holds one element in memory at a time. This keeps it efficient, even for large files.
A map is used to keep track of the counts. Every time I encounter a "SynsetRelation" element, I first check whether its relType value has already been counted, then I increment its counter.
The result is a map containing the count for each detected value.
You would use it like this in your main class:
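A minimal sketch of the approach described above, assuming the stream reader is StAX's XMLStreamReader. The class name RelTypeCounter and the method count are hypothetical names chosen for illustration, and the file name test1.xml is taken from the question. The main method shows the usage:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: counts SynsetRelation elements per relType value.
class RelTypeCounter {

    static Map<String, Integer> count(InputStream in) throws XMLStreamException {
        // LinkedHashMap keeps the relTypes in the order they are first seen
        Map<String, Integer> counts = new LinkedHashMap<>();
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            while (reader.hasNext()) {
                // Pull the next event; only start tags named SynsetRelation matter
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "SynsetRelation".equals(reader.getLocalName())) {
                    String relType = reader.getAttributeValue(null, "relType");
                    if (relType != null) {
                        // Inserts 1 if relType is not counted yet, otherwise adds 1
                        counts.merge(relType, 1, Integer::sum);
                    }
                }
            }
        } finally {
            reader.close();
        }
        return counts;
    }

    public static void main(String[] args) throws IOException, XMLStreamException {
        try (InputStream in = new FileInputStream("test1.xml")) {
            // Prints one "relType: count" line per detected value
            count(in).forEach((relType, n) -> System.out.println(relType + ": " + n));
        }
    }
}
```

The sketch assumes the file is a well-formed document with a single root element around the Synset elements; the extract in the question omits the surrounding markup.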