rdflib not parsing RDF/XML file


I am trying to load and parse a very simple RDF file in XML format using rdflib, but I don't think it is being parsed correctly. Here is my RDF/XML file:

<rdf:RDF xmlns:rdf="http://w3.org/1999/02/22-rdf-syntax-ns#" 
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
>

  <foaf:Person>
    <foaf:name>Peter Parker</foaf:name>
  </foaf:Person>

</rdf:RDF>

Here is my Python script:

from rdflib import Graph

g = Graph()
g.parse("person_1.rdf", format="xml")

print(len(g))

print(g.serialize(format="xml").decode("u8"))

print("Test - 2")

And here is the program output:

3
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
   xmlns:foaf="http://xmlns.com/foaf/0.1/"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>
  <rdf:Description rdf:nodeID="Nababb97ad88341329a7cf22cec65c00c">
    <rdf:type rdf:resource="http://w3.org/1999/02/22-rdf-syntax-ns#RDF"/>
    <foaf:Person rdf:nodeID="Nfa7b9ab24fae4bcd9ffbaa13aeb733db"/>
  </rdf:Description>
  <rdf:Description rdf:nodeID="Nfa7b9ab24fae4bcd9ffbaa13aeb733db">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/name"/>
  </rdf:Description>
</rdf:RDF>

Test - 2

I don't see the name "Peter Parker" in the output. Am I doing something wrong? Thanks in advance.

There are 3 answers

yazz

If you just want to retrieve the data, try the following method.

from simplified_scrapy import utils, SimplifiedDoc, req
xml = '''
<rdf:RDF xmlns:rdf="http://w3.org/1999/02/22-rdf-syntax-ns#" 
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
>

  <foaf:Person>
    <foaf:name>Peter Parker</foaf:name>
  </foaf:Person>

</rdf:RDF>
'''
# xml = utils.getFileContent('person_1.rdf')
doc = SimplifiedDoc(xml)
print(doc.select('foaf:Person>foaf:name>text()'))
# Or
print(doc.select('foaf:name>text()'))
# Or
print(doc.select('foaf:name'))

Result:

Peter Parker
Peter Parker
{'tag': 'foaf:name', 'html': 'Peter Parker'}

Sofia Khwaja

You will have to wrap the person in a foaf:PersonalProfileDocument element, as shown below:

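<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
<foaf:PersonalProfileDocument>
  <foaf:Person>
    <foaf:family_name>Peter Parker</foaf:family_name>
  </foaf:Person>
</foaf:PersonalProfileDocument>
</rdf:RDF>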

Then import the FOAF namespace as follows:

from rdflib.namespace import FOAF, XSD

Nicholas Car

I don't see any issue in parsing this RDF; it is valid RDF. The reason you're not seeing good results when re-serializing is that, as written, it is terrible RDF that doesn't make sense: you need to identify the person node. If you know that the URI of Peter Parker is http://example.com/person/pp, you can use:

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
  xmlns:foaf="http://xmlns.com/foaf/0.1/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>
  <foaf:Person rdf:about="http://example.com/person/pp">
    <foaf:name>Peter Parker</foaf:name>
  </foaf:Person>
</rdf:RDF>

If you don't know the URI of Peter Parker, you can use a Blank Node:

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
  xmlns:foaf="http://xmlns.com/foaf/0.1/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>
  <foaf:Person rdf:nodeID="ub2bL2C1">
    <foaf:name>Peter Parker</foaf:name>
  </foaf:Person>
</rdf:RDF>

But RDF is all about identifying things, so it is far better to assign a URI to the person.
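
One more detail that may be worth checking: the documents in this answer declare the rdf prefix as http://www.w3.org/1999/02/22-rdf-syntax-ns#, while the file in the question declares http://w3.org/1999/02/22-rdf-syntax-ns# (no "www."). The rdf:type of http://w3.org/1999/02/22-rdf-syntax-ns#RDF in the question's output suggests the root element was read as an ordinary node rather than as the rdf:RDF wrapper. Below is a rough sketch, using only the standard library, for checking the root namespace of a file before handing it to rdflib. The helper name `root_is_rdf` is made up for this example.

```python
import xml.etree.ElementTree as ET

# The standard RDF syntax namespace (note the "www.").
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def root_is_rdf(xml_text: str) -> bool:
    """Return True if the root element is RDF in the standard RDF namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree spells qualified names as "{namespace}localname".
    return root.tag == "{" + RDF_NS + "}RDF"


# The document from the question, with its original namespace declaration.
question_doc = """<rdf:RDF xmlns:rdf="http://w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Peter Parker</foaf:name>
  </foaf:Person>
</rdf:RDF>"""

# The same document with "www." added to the rdf namespace URI.
fixed_doc = question_doc.replace("http://w3.org/", "http://www.w3.org/")

question_ok = root_is_rdf(question_doc)
fixed_ok = root_is_rdf(fixed_doc)
print(question_ok, fixed_ok)
```

If the check fails for your file, correcting the xmlns:rdf declaration is probably the first thing to try.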