I have a .rdf file (over 2 GB compressed) that apparently has some duplicated IRIs in the middle, and perhaps other issues.
I get the following error in the Workbench during import:
RDF Parse Error: ID '_D5C2483C53D3F747_up.name_uORF' has already been defined [line 6907110, column 53
Is there a tool to pre-process these huge files prior to import with some defined behavior for errors, e.g. "just skip it"?
When you import files through the GraphDB Workbench, there's an "Advanced settings" foldout menu. Expand it and you'll find several validation-related options that you can enable or disable, including "Should stop on error". I can't be sure the import will continue past this particular error if you disable that option (there are some syntax errors that the parser simply can't recover from), but it's worth a shot.
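If that doesn't get you past the error, it may help to at least find out which IDs are duplicated before importing. The following is only a rough sketch, not a GraphDB feature: it assumes the file is RDF/XML (optionally gzip-compressed), that the error refers to repeated `rdf:ID` attribute values, and that the whole file shares one base URI (strictly, `rdf:ID` only has to be unique per in-scope `xml:base`). The function names are made up for illustration.

```python
import gzip
import xml.etree.ElementTree as ET
from collections import Counter

# Fully qualified name of the rdf:ID attribute as ElementTree reports it.
RDF_ID = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}ID"


def open_maybe_gzip(path):
    """Open a plain or .gz file in binary mode (hypothetical helper)."""
    return gzip.open(path, "rb") if path.endswith(".gz") else open(path, "rb")


def find_duplicate_rdf_ids(path):
    """Stream through an RDF/XML file and count repeated rdf:ID values.

    Returns a Counter mapping each duplicated ID to the number of
    extra occurrences (beyond the first one).
    """
    seen = set()
    duplicates = Counter()
    root = None
    depth = 0
    with open_maybe_gzip(path) as stream:
        for event, elem in ET.iterparse(stream, events=("start", "end")):
            if event == "start":
                if root is None:
                    root = elem
                depth += 1
                # Attributes are already available on the "start" event.
                rdf_id = elem.get(RDF_ID)
                if rdf_id is not None:
                    if rdf_id in seen:
                        duplicates[rdf_id] += 1
                    else:
                        seen.add(rdf_id)
            else:
                depth -= 1
                elem.clear()
                if depth == 1:
                    # A top-level node is finished: detach it from the
                    # root so memory use stays flat on huge files.
                    root.clear()
    return duplicates


# Example usage (the file name is a placeholder):
# for rdf_id, extra in find_duplicate_rdf_ids("data.rdf.gz").items():
#     print(rdf_id, "defined", extra + 1, "times")
```

Note that this only reports the duplicates; it doesn't rewrite the file. It also keeps every distinct ID in memory, which should be fine for a few million IDs but is worth keeping in mind for a file this size.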