Converting a CSV to RDF where one column is a set of values


I want to convert a CSV to RDF.

One of the columns of that CSV is, in fact, a set of values joined with a separator character (in my case, the space character).

Here is a sample CSV (with header):

col1,col2,col3
"A","B C D","John"
"M","X Y Z","Jack"

I would like the conversion process to produce RDF similar to this:

:A :aProperty :B, :C, :D; :anotherProperty "John".
:M :aProperty :X, :Y, :Z; :anotherProperty "Jack".

I usually use Tarql for CSV conversion.
It works well for iterating over rows, but it has no feature to sub-iterate "inside" a column value.

SPARQL-Generate may help (with iter:regex and sub-generate, as far as I understand), but I cannot find any example that matches my use case.

PS: maybe RML can help too, but I have no prior knowledge of this technology.

There are 3 answers

0
Maxime Lefrançois On

You can test this query on the playground https://ci.mines-stetienne.fr/sparql-generate/playground.html and check it behaves as expected:

BASE <http://data.example.com/> 
PREFIX : <http://example.com/> 
PREFIX iter: <http://w3id.org/sparql-generate/iter/>
PREFIX fun: <http://w3id.org/sparql-generate/fn/>

GENERATE { 
  <{?col1}> :anotherProperty ?col3.
  GENERATE{
      <{?col1}> :aProperty <{ ?value }> ; 
  }
  ITERATOR iter:Split( ?col2 , " " ) AS ?value .
}
ITERATOR iter:CSVStream("http://example.com/file.csv", 20, "*") AS ?col1 ?col2 ?col3
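
For readers unfamiliar with SPARQL-Generate, the control flow of the query above (an outer iteration over rows, an inner iteration over the split values of col2) can be sketched in plain Python. This is only an illustration of the nesting, not of how SPARQL-Generate works internally; the helper name `rows_to_triples` and the inline sample data are assumptions made for the example.

```python
import csv
import io

BASE = "http://data.example.com/"  # BASE of the query, resolves <{?col1}> and <{?value}>
NS = "http://example.com/"         # the ':' prefix of the query

# Inline copy of the sample CSV from the question (assumption: no remote file).
SAMPLE = '''col1,col2,col3
"A","B C D","John"
"M","X Y Z","Jack"
'''

def rows_to_triples(text):
    """Yield (subject, predicate, object) strings; hypothetical helper."""
    for row in csv.DictReader(io.StringIO(text)):  # outer iteration: one CSV row at a time
        subject = f"<{BASE}{row['col1']}>"
        yield (subject, f"<{NS}anotherProperty>", f'"{row["col3"]}"')
        for value in row["col2"].split(" "):       # inner iteration: split col2 on spaces
            yield (subject, f"<{NS}aProperty>", f"<{BASE}{value}>")

triples = list(rows_to_triples(SAMPLE))
for s, p, o in triples:
    print(s, p, o, ".")
```

Each row yields one :anotherProperty triple plus one :aProperty triple per token in col2, which is the shape asked for in the question.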
0
Gregg Kellogg On

The Tabular Data Model and related specs target this use case, although as I recall, we didn't provide for combinations of valueUrl and separator to have sub-columns generate multiple URIs.

The metadata to describe this would be something like the following:

{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "test.csv",
  "tableSchema": {
    "columns": [{
      "name": "col1",
      "titles": "col1",
      "datatype": "string",
      "required": true
    }, {
      "name": "col2",
      "titles": "col2",
      "datatype": "string",
      "separator": " "
    }, {
      "name": "col3",
      "titles": "col3",
      "datatype": "string",
      "propertyUrl": "http://example.com/anotherProperty",
      "valueUrl": "http://example.com/{col3}"
    }],
    "primaryKey": "col1",
    "aboutUrl": "http://example.com/{col1}"
  }
}
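
To make the effect of the "separator" property concrete, here is a minimal Python sketch that applies a cut-down column description to the sample rows. It is not a CSVW processor: it reads only "name" and "separator" and ignores everything else in the metadata. The helper name `parse_cells` and the reduced metadata are assumptions made for illustration.

```python
import csv
import io
import json

# Reduced metadata: only the properties this sketch actually reads.
METADATA = json.loads('''
{
  "tableSchema": {
    "columns": [
      {"name": "col1"},
      {"name": "col2", "separator": " "},
      {"name": "col3"}
    ]
  }
}
''')

SAMPLE = '''col1,col2,col3
"A","B C D","John"
"M","X Y Z","Jack"
'''

def parse_cells(text, metadata):
    """Return one dict per row; cells of columns with a separator become lists."""
    columns = metadata["tableSchema"]["columns"]
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    rows = []
    for record in reader:
        parsed = {}
        for column, cell in zip(columns, record):
            sep = column.get("separator")
            parsed[column["name"]] = cell.split(sep) if sep else cell
        rows.append(parsed)
    return rows

rows = parse_cells(SAMPLE, METADATA)
for row in rows:
    print(row)
```

With a separator declared, the cell value of col2 is treated as a list of values rather than a single string, so a converter can emit one triple per list element.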
0
Dylan Van Assche On

You can accomplish this with RML and FnO.

First, we need to access each row, which RML can do: a LogicalSource lets you iterate over each row of the CSV file (ql:CSV). Specifying an iterator (rml:iterator) is not needed, since the default iterator in RML is row-based. This results in the following RDF (Turtle):

<#LogicalSource>
    a rml:LogicalSource;
    rml:source "data.csv";
    rml:referenceFormulation ql:CSV.

The actual triples are generated with the help of a TriplesMap, which uses the LogicalSource to retrieve the data from each CSV row:

<#MyTriplesMap>
    a rr:TriplesMap;
    rml:logicalSource <#LogicalSource>;

    rr:subjectMap [
        rr:template "http://example.org/{col1}";
    ];

    rr:predicateObjectMap [
        rr:predicate ex:aProperty;
        rr:objectMap <#FunctionMap>;
    ];

    rr:predicateObjectMap [
        rr:predicate ex:anotherProperty;
        rr:objectMap [
            rml:reference "col3";
        ];
    ].

The col3 CSV column is used to create the following triple:

<http://example.org/A> <http://example.org/ns#anotherProperty> "John".

However, the string in the CSV column col2 needs to be split first. This can be achieved with FnO (the Function Ontology) and an RML processor that supports executing FnO functions. The RML Mapper is one such processor, but others can be used too. The following RDF invokes an FnO function that splits the input string on a space, with our LogicalSource as input data:

<#FunctionMap>
    fnml:functionValue [
        rml:logicalSource <#LogicalSource>; # our LogicalSource
        rr:predicateObjectMap [
            rr:predicate fno:executes; 
            rr:objectMap [ 
                rr:constant grel:string_split # function to use
            ];
        ];
        rr:predicateObjectMap [
            rr:predicate grel:valueParameter;
            rr:objectMap [ 
                rml:reference "col2" # input string
            ];
        ];
        rr:predicateObjectMap [
            rr:predicate grel:p_string_sep;
            rr:objectMap [ 
                rr:constant " "; # space separator
            ];
        ];
    ].

The FnO functions supported by the RML Mapper are listed here: https://rml.io/docs/rmlmapper/default-functions/ You can find each function's name and parameters on that page.

Mapping rules

@base <http://example.org> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix fnml: <http://semweb.mmlab.be/ns/fnml#> .
@prefix fno: <https://w3id.org/function/ontology#> .
@prefix grel: <http://users.ugent.be/~bjdmeest/function/grel.ttl#> .
@prefix ex: <http://example.org/ns#> .

<#LogicalSource>
    a rml:LogicalSource;
    rml:source "data.csv";
    rml:referenceFormulation ql:CSV.


<#MyTriplesMap>
    a rr:TriplesMap;
    rml:logicalSource <#LogicalSource>;

    rr:subjectMap [
        rr:template "http://example.org/{col1}";
    ];

    rr:predicateObjectMap [
        rr:predicate ex:aProperty;
        rr:objectMap <#FunctionMap>;
    ];

    rr:predicateObjectMap [
        rr:predicate ex:anotherProperty;
        rr:objectMap [
            rml:reference "col3";
        ];
    ].

<#FunctionMap>
    fnml:functionValue [
        rml:logicalSource <#LogicalSource>;
        rr:predicateObjectMap [
            rr:predicate fno:executes; 
            rr:objectMap [ 
                rr:constant grel:string_split 
            ];
        ];
        rr:predicateObjectMap [
            rr:predicate grel:valueParameter;
            rr:objectMap [ 
                rml:reference "col2" 
            ];
        ];
        rr:predicateObjectMap [
            rr:predicate grel:p_string_sep;
            rr:objectMap [ 
                rr:constant " ";
            ];
        ];
    ].

Output

<http://example.org/A> <http://example.org/ns#aProperty> "B".
<http://example.org/A> <http://example.org/ns#aProperty> "C".
<http://example.org/A> <http://example.org/ns#aProperty> "D".
<http://example.org/A> <http://example.org/ns#anotherProperty> "John".
<http://example.org/M> <http://example.org/ns#aProperty> "X".
<http://example.org/M> <http://example.org/ns#aProperty> "Y".
<http://example.org/M> <http://example.org/ns#aProperty> "Z".
<http://example.org/M> <http://example.org/ns#anotherProperty> "Jack".
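
If you want to check a processor's result against the output above, one option is to derive the expected lines from the CSV and compare them as sets, since triple order is not significant. The sketch below does this in Python; the helper name `expected_ntriples` is hypothetical, and the comparison is a plain string match that assumes the exact spacing shown above.

```python
import csv
import io

SUBJECT_BASE = "http://example.org/"  # from the rr:template of the subject map
EX = "http://example.org/ns#"         # the ex: prefix of the mapping

SAMPLE = '''col1,col2,col3
"A","B C D","John"
"M","X Y Z","Jack"
'''

# Output as listed above (would normally be read from the processor's output file).
PROCESSOR_OUTPUT = '''<http://example.org/A> <http://example.org/ns#aProperty> "B".
<http://example.org/A> <http://example.org/ns#aProperty> "C".
<http://example.org/A> <http://example.org/ns#aProperty> "D".
<http://example.org/A> <http://example.org/ns#anotherProperty> "John".
<http://example.org/M> <http://example.org/ns#aProperty> "X".
<http://example.org/M> <http://example.org/ns#aProperty> "Y".
<http://example.org/M> <http://example.org/ns#aProperty> "Z".
<http://example.org/M> <http://example.org/ns#anotherProperty> "Jack".
'''

def expected_ntriples(text):
    """Build the set of expected lines from the CSV; hypothetical helper."""
    lines = set()
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{SUBJECT_BASE}{row['col1']}>"
        for value in row["col2"].split(" "):  # one literal per split value
            lines.add(f'{subject} <{EX}aProperty> "{value}".')
        lines.add(f'{subject} <{EX}anotherProperty> "{row["col3"]}".')
    return lines

expected = expected_ntriples(SAMPLE)
actual = set(PROCESSOR_OUTPUT.splitlines())
print("match" if actual == expected else sorted(actual ^ expected))
```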

Note: I contribute to RML and its technologies.