Convert anyURI-typed string representations of CURIEs to real CURIEs or IRIs

I have triples like the one below, where the object is an anyURI-typed string representation of a CURIE. I would like to construct triples in which the object is a true CURIE or IRI instead.

@prefix source: <https://example.org/source> .
@prefix external: <https://example.org/external> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

source:sample1 source:external_identifiers "external:0110680"^^xsd:anyURI .
  • IRI(?o) returns nothing.
  • IRI(str(?o)) returns <external:0110680>
    • but I want <https://example.org/external/0110680>
  • This question mentions tarql:expandPrefixedName, but when I try that (with the tarql: prefix or as a bare expandPrefixedName) I get the following error message in both arq and GraphDB. I assume that is because the Tarql functions are not available in those tools?

MALFORMED QUERY: Lexical error at line 12, column 28. Encountered: '40' (40), after prefix "expandPrefixedName"

I would prefer to do this in SPARQL, but would also try a Python solution using something like rdflib.

There are 2 answers

Stefan - brox IT-Solutions On

To convert it to an IRI, you could use:

BIND( IRI(REPLACE(STR(?o), "^external:", STR(external:))) AS ?o_iri ) .
  • REPLACE() replaces the string "external:" (i.e., the prefix label; ^ represents the beginning of the value) in STR(?o) with STR(external:)
  • STR(?o) converts ?o ("external:0110680"^^xsd:anyURI) to a string ("external:0110680")
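  • STR(external:) takes the prefix IRI (<https://example.org/external>) and converts it to a string ("https://example.org/external"); because the prefix as declared in the question has no trailing slash, the result would be <https://example.org/external0110680>, so declare it as <https://example.org/external/> to get the IRI you want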
  • IRI() converts the replaced string to an IRI
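
The four steps can also be traced outside SPARQL. The sketch below mirrors them in plain Python (standard library only) for the Python route the question mentions. The function name `expand_single` is hypothetical, and the namespace with a trailing slash is an assumption made to reach the IRI the question asks for, since the prefix as declared there ends without one.

```python
import re

def expand_single(value: str, label: str, namespace: str) -> str:
    """Mirror REPLACE(STR(?o), "^label:", STR(label:)) for a single prefix."""
    # "^" anchors the match at the start of the value, as in the SPARQL pattern;
    # re.escape protects against regex metacharacters in the label.
    pattern = "^" + re.escape(label) + ":"
    # A function replacement inserts the namespace literally (no escape handling).
    return re.sub(pattern, lambda _match: namespace, value, count=1)

# Namespace exactly as declared in the question (no trailing slash).
declared = expand_single("external:0110680", "external", "https://example.org/external")
# Assumed variant with a trailing slash.
with_slash = expand_single("external:0110680", "external", "https://example.org/external/")

print(declared)    # https://example.org/external0110680
print(with_slash)  # https://example.org/external/0110680
```

A value that does not start with the given label is returned unchanged, which matches the behaviour of REPLACE when the pattern does not match.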

If you have a few different prefixes, you could use something like this:

{
  FILTER( STRSTARTS(STR(?o), "foo:") ) .
  BIND( IRI(REPLACE(STR(?o), "^foo:", STR(foo:))) AS ?o_iri ) .
}
UNION
{
  FILTER( STRSTARTS(STR(?o), "bar:") ) .
  BIND( IRI(REPLACE(STR(?o), "^bar:", STR(bar:))) AS ?o_iri ) .
}

(Instead of a FILTER, you could use IF inside the BIND.)

Another option could be COALESCE with nested IFs.
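
If you end up doing this in Python instead, the per-prefix branching can be replaced by a lookup table. This is only a minimal sketch: `PREFIXES` and `expand_curie` are hypothetical names, and the foo/bar namespace IRIs are placeholders rather than anything from the question. A value with an unknown prefix yields None, roughly like ?o_iri staying unbound when no branch matches.

```python
from typing import Dict, Optional

# Hypothetical prefix table; the namespace IRIs are placeholders.
PREFIXES: Dict[str, str] = {
    "foo": "https://example.org/foo/",
    "bar": "https://example.org/bar/",
}

def expand_curie(value: str, prefixes: Dict[str, str] = PREFIXES) -> Optional[str]:
    """Return the expanded IRI string, or None if the prefix is not in the table."""
    # Split at the first colon only; the local part may contain further colons.
    label, sep, local = value.partition(":")
    if not sep or label not in prefixes:
        return None
    return prefixes[label] + local

results = [expand_curie(v) for v in ("foo:123", "bar:abc", "baz:xyz")]
print(results)
```

Adding another prefix then means adding one dictionary entry rather than another UNION branch.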

IS4 On

If your triple store supports backreferences, you can prepare the prefix mapping beforehand and replace them all without complicating the query itself:

BIND("b:part" AS ?curi)
BIND("^a: urn:a: ^b: urn:b:" AS ?prefixes)

BIND(CONCAT(?prefixes, "|", ?curi) AS ?src)
BIND(REPLACE(?src, "\\^([^:]*:) ([^ ]+).*\\|\\1", "|$2") AS ?replaced)
BIND(REPLACE(?replaced, "^.*\\|", "") AS ?result)

This prepares a string in the form (^prefix: URI)...|CURIE and looks for a prefix subsequently appearing in the CURIE, replacing it with the full URI. The characters ^ | are chosen as delimiters because they are disallowed in URIs. Lastly, cleanup is performed to remove the (rest of the) prefix mapping.

If no prefix is found, this will leave the CURIE as it is. In case you want to detect that, change the first replacement from "|$2" to "`$2", and the cleanup arguments from "^.*\\|", "" to "^.*([|`])", "$1". The result will then start with ` if a prefix was expanded and with | if not, ready for more checks.
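
To check the regular expressions outside a triple store, the same two replacements can be tried with Python's `re` module. This is a sketch under a few assumptions: the function names are hypothetical, the prefix table reuses the sample values above, and Python writes the replacement backreference as `\2` where the SPARQL version uses `$2`.

```python
import re

# Prefix table in the "^label: namespace" layout used above (same sample values).
PREFIX_TABLE = "^a: urn:a: ^b: urn:b:"
# \1 after the "|" must repeat the captured label; group 2 is its namespace.
PATTERN = r"\^([^:]*:) ([^ ]+).*\|\1"

def expand_with_backreference(curie: str, table: str = PREFIX_TABLE) -> str:
    src = table + "|" + curie
    replaced = re.sub(PATTERN, r"|\2", src)
    # Remove what is left of the table, up to and including the "|" delimiter.
    return re.sub(r"^.*\|", "", replaced)

def expand_with_marker(curie: str, table: str = PREFIX_TABLE) -> str:
    """Variant that keeps a leading marker: ` if expanded, | if not."""
    src = table + "|" + curie
    replaced = re.sub(PATTERN, r"`\2", src)
    return re.sub(r"^.*([|`])", r"\1", replaced)

print(expand_with_backreference("b:part"))   # urn:b:part
print(expand_with_backreference("c:thing"))  # c:thing (unknown prefix, unchanged)
print(expand_with_marker("b:part"))
print(expand_with_marker("c:thing"))
```

Whether the SPARQL version works depends on the regex engine of the triple store supporting backreferences inside the pattern, as noted at the start of this answer.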