Remove invalid N-Quads from file in Jena

I have a file containing N-Quads (using the schema.org vocabulary) and I want to load it into a TDB RDF-store, using Apache Jena's command line tools. The command that I'm using is:

tdbloader --loc <rdf_store_location> <file_to_load>

During loading, however, I got an error:

[line: 769293, col: 154] Illegal unicode escape sequence value: \" (0x22)

I also ran the validation tool from Jena command line tools:

riot --validate <file_to_load>

and indeed, there are at least 30 errors/warnings similar to this:

Bad IRI

The path contains a segment /../ not at the beginning of a relative reference, or it contains a /./ These should be removed

Is there a way to ignore invalid N-Quads, or to delete them, using command line tools (Jena's, or any others you know of)?

Otherwise, the only option would be to write a script that removes the invalid characters. But the file is huge (60 GB), and I suspect that approach would be very error-prone.
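
For reference, this is roughly the kind of script I have in mind. N-Quads is a line-based format (one statement per line), so the file can be streamed and each line kept or set aside without loading everything into memory. This is only a minimal sketch: the regular expressions are my own simplified approximation of the N-Quads grammar, not Jena's parser, and they do not reproduce the IRI checks behind the "Bad IRI" messages. The names `is_plausible_quad` and `filter_quads` are made up for illustration. Instead of trying to repair characters, it moves suspicious lines to a separate file.

```python
import re
import sys

# Simplified approximations of N-Quads terms (assumption: not the full grammar).
# An IRI may only contain a backslash as part of a \uXXXX or \UXXXXXXXX escape.
IRI = r'<(?:[^\x00-\x20<>"{}|^`\\]|\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8})*>'
# Blank node labels are matched loosely (any run of non-space, non-delimiter chars).
BNODE = r'_:[^\s<>"]+'
# A literal allows the usual string escapes, plus an optional datatype or language tag.
LITERAL = (
    r'"(?:[^"\\\n\r]|\\[tbnrf"\'\\]|\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8})*"'
    r'(?:\^\^' + IRI + r'|@[A-Za-z]+(?:-[A-Za-z0-9]+)*)?'
)

SUBJECT_OR_GRAPH = '(?:' + IRI + '|' + BNODE + ')'
OBJECT = '(?:' + IRI + '|' + BNODE + '|' + LITERAL + ')'

# subject predicate object [graph] . [# comment]
QUAD = re.compile(
    r'\s*' + SUBJECT_OR_GRAPH + r'\s*' + IRI + r'\s*' + OBJECT
    + r'\s*' + SUBJECT_OR_GRAPH + r'?\s*\.\s*(?:#.*)?'
)


def is_plausible_quad(line):
    """Return True if the line is blank, a comment, or looks like one quad."""
    stripped = line.strip()
    if not stripped or stripped.startswith('#'):
        return True
    return QUAD.fullmatch(stripped) is not None


def filter_quads(src, good, bad):
    """Stream lines from src; write plausible ones to good, the rest to bad."""
    kept = dropped = 0
    for line in src:
        if is_plausible_quad(line):
            good.write(line)
            kept += 1
        else:
            bad.write(line)
            dropped += 1
    return kept, dropped


# Hypothetical usage: python filter_quads.py input.nq clean.nq rejected.nq
if __name__ == "__main__" and len(sys.argv) == 4:
    with open(sys.argv[1], encoding="utf-8") as src, \
         open(sys.argv[2], "w", encoding="utf-8") as good, \
         open(sys.argv[3], "w", encoding="utf-8") as bad:
        kept, dropped = filter_quads(src, good, bad)
    print(f"kept {kept} lines, set aside {dropped} lines")
```

The cleaned file would still need to go through riot --validate afterwards, since a line that passes this rough check can still be rejected by Jena (and the loose blank node pattern can set aside some valid lines).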

There are 0 answers