Arbitrary strings considered acceptable as references in RDF N-Triples?


In the W3C RDF 1.1 N-Triples specification, the IRIREF production used for non-literal subjects, predicates, and objects is defined as little more than a string inside angle brackets (<>) [1], even though it is called an IRI.

Is this why some example files [2] have simple identifiers while other parsers such as RDFLib will throw an exception if the identifier isn't a valid IRI with a scheme: section? Are RDF files with non-literals that aren't valid IRIs still well-formed despite the terminology used in the RDF spec?

[1] https://www.w3.org/TR/n-triples/#grammar-production-IRIREF

[2] https://github.com/cayleygraph/cayley/blob/master/data/testdata.nq

Answer by Jeen Broekstra:

Is this why some example files have simple identifiers while other parsers such as RDFLib will throw an exception if the identifier isn't a valid IRI with a scheme: section?

No. Strictly speaking, the example file you point to is not syntactically correct N-Triples. In fact it is not N-Triples at all but N-Quads, a different syntax format. Even if it were N-Triples, though, IRIs in this form would be incorrect.

The N-Triples Recommendation says "IRIs may be written only as absolute IRIs" (see section 2.2) - absolute IRIs being defined syntactically in RFC 3987. This is normative, even if the grammar production itself does not enforce it.
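To make the gap between the grammar and the normative text concrete, here is a minimal Python sketch. The regular expression is my transcription of the IRIREF production; the scheme check is only a rough heuristic for "looks absolute", not a full RFC 3987 validator. The function names are hypothetical and are not taken from RDFLib or any other parser.

```python
import re

# IRIREF production from the N-Triples grammar: '<', then any characters
# other than controls, space, < > " { } | ^ ` \ (or a \uXXXX / \UXXXXXXXX
# escape), then '>'.
IRIREF = re.compile(
    r'<(?:[^\x00-\x20<>"{}|^`\\]|\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8})*>'
)

# Scheme syntax from RFC 3986/3987: a letter, then letters, digits,
# '+', '-' or '.', followed by ':'.
SCHEME = re.compile(r'[A-Za-z][A-Za-z0-9+.-]*:')


def matches_grammar(token: str) -> bool:
    """True if the token satisfies the IRIREF grammar production."""
    return IRIREF.fullmatch(token) is not None


def looks_absolute(token: str) -> bool:
    """Heuristic: grammar match plus a leading scheme (assumption, not full validation)."""
    return matches_grammar(token) and SCHEME.match(token[1:-1]) is not None


for token in ("<alice>", "<http://example.org/alice>"):
    print(token, matches_grammar(token), looks_absolute(token))
```

A token such as `<alice>` passes the grammar check but fails the scheme check, which is the situation described above: the production accepts it, the prose of the Recommendation does not.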

However, the IRIs in that example file could be interpreted as relative IRI references, and some N-Triples parsers have been extended to resolve relative IRIs against a base IRI. This is probably why syntactically incorrect N-Triples files like this sometimes turn up in the wild. It is a non-standard extension of the format.
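As a rough illustration of what such an extension does, the following sketch resolves a bracketed identifier against a base using Python's standard `urllib.parse.urljoin`. The base IRI and the function name are assumptions made up for the example; real parsers may apply different resolution rules.

```python
from urllib.parse import urljoin


def resolve_iriref(token: str, base: str) -> str:
    """Resolve the reference inside an <...> token against a base IRI."""
    if not (token.startswith("<") and token.endswith(">")):
        raise ValueError(f"not an IRIREF token: {token!r}")
    # urljoin leaves references that already carry a scheme untouched
    return "<" + urljoin(base, token[1:-1]) + ">"


BASE = "http://example.org/base/"  # hypothetical base IRI

print(resolve_iriref("<alice>", BASE))
print(resolve_iriref("<http://other.example/bob>", BASE))
```

Because standard N-Triples has no way to declare a base, the result depends entirely on whatever base the parser happens to assume, which is one reason such files are not portable.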

Are RDF files with non-literals that aren't valid IRIs still well-formed despite the terminology used in the RDF spec?

This depends on the syntax format you're using. Well-formedness is a property of a document in a specific concrete syntax, so the rules for N-Triples differ from those for, say, Turtle or RDF/XML.

RDF itself, in its abstract syntax, requires IRIs to be absolute and to conform to RFC 3987. So any RDF document that, when processed, produces unresolved relative IRIs or IRIs that do not conform to the RFC is certainly invalid, even if it is not strictly ill-formed.

Some concrete syntax formats (such as RDF/XML, TriG, and Turtle) provide shorthand mechanisms for IRIs (prefixed names, relative IRIs plus a base IRI, etc.). However, as we have seen above, N-Triples has no such shorthand mechanism built in, so any non-absolute IRI makes the document not well-formed.