print rdflib.Graph using serialize() in the same layout

I'm having the following problem when using the rdflib serialize() method to print a graph: the layout of the output differs from that of the original file the graph was created from.

The code is as follows

from rdflib import Graph

mapping_graph = Graph().parse("valid_mapping.ttl", format="ttl")
# rdflib 6+ returns a str from serialize(); older versions returned bytes and needed .decode("utf-8")
print(mapping_graph.serialize(format="ttl"))

Which outputs

<file:///home/alex/Desktop/Mapping-Quality-Framework/Mapping-Quality-Model/valid_mapping.ttl#TripleMap1>  rr:logicalTable [ rr:tableName "people" ] ;
    rr:predicateObjectMap [ rr:objectMap [ rr:column "publications" ;
                    rr:language "en-GB" ] ;
            rr:predicate foaf:publications ;
            rr:termType rr:Literal ],
        [ rr:objectMap [ rr:column "age" ;
                    rr:datatype xsd:second ] ;
            rr:predicate foaf:age ],
        [ rr:objectMap [ rr:column "age" ;
                    rr:datatype xsd:third ;
                    rr:language "dhhdhd" ] ;
            rr:predicate dbo:equipment ] ;
    rr:subjectMap [ rr:class foaf:ggg ] .

While the input file is

<#TripleMap1>
    rr:logicalTable [ rr:tableName "people" ] ;
    rr:subjectMap [ rr:class foaf:ggg ];
    rr:predicateObjectMap [   rr:predicate foaf:publications ;
                              rr:termType rr:Literal;
                              rr:objectMap [ rr:column "publications" ;
                                           rr:language "en-GB" ] ;
                            ];
    rr:predicateObjectMap
        [   rr:predicate foaf:age;
            rr:objectMap [ rr:column "age" ;
                         rr:datatype xsd:second ] ;
            ];
    rr:predicateObjectMap
        [   rr:predicate dbo:equipment;
            rr:objectMap [ rr:column "age" ;
                    rr:datatype xsd:third;
                         rr:language "dhhdhd"] ; ] ;
.

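The serialize() method changes the layout of the graph: the properties are reordered, the three rr:predicateObjectMap blocks are merged into one comma-separated list, and <#TripleMap1> is written out as a full file:/// IRI.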

Any help would be gratefully appreciated.

Answer by Nicholas Car

The comments by @UninformedUser are correct: you're asking for something the Turtle syntax wasn't designed for. I've seen this issue (different serializations of the same data confusing people) come up a few times. Turtle isn't like JSON, or even XML and other formats, whose content can be sorted in a particular way. This is because, fundamentally, there is no ordering in RDF graphs: a graph is a set of triples. For instance, there is no way to know, and therefore reliably reuse, a single order for peer blank nodes.

Your various Turtle files are isomorphic, which, in graph terms, is as equal as things get!

One semi-solution is to implement a semi-deterministic serializer that orders things in particular ways, but it will always make assumptions about blank node IDs and so on. You could build such a serializer on top of RDFLib's: it would take in the RDFLib-serialized file (Turtle, N3, etc.) and sort it in some way. I've previously implemented such a sorter for Git diffing, which sorted the blank nodes by a hash of their property values. You could rely on this for a specific scenario, but perhaps not as a general-purpose serializer.

You could also look at ways of communicating RDF data to your users that don't depend on the structure of a static Turtle file. You could write a small function that counts things in your graphs and reports the counts for comparison, e.g.:

1 x rr:logicalTable
1 x rr:subjectMap
...
3 x rr:predicateObjectMap

Or, a more domain-specific thing:

List the rr:tableName and rr:column values from your data in a fixed format that makes comparison easier.

Scenario-specific reporting, rather than general Turtle, is my ultimate suggestion.

A more general, but harder, approach could be to use a constraint-testing system such as SHACL to inspect small graphs like your Turtle files and present, order or validate them in certain ways. SHACL has a presentation side to it (for example the non-validating sh:order and sh:group properties), not just validation, which is its main use case.