I'm trying to read the latest Wikidata dump while skipping the first, say, 100 lines.
Is there a better way to do this than calling next() repeatedly?
import bz2

WIKIDATA_JSON_DUMP = bz2.open('latest-all.json.bz2', 'rt')
for _ in range(100):
    next(WIKIDATA_JSON_DUMP)  # discard one line per iteration
Alternatively, is there a way to split the file up in bash, for example by using bzcat to pipe selected chunks to smaller files?
If it had been compressed with something like bgzip, you could skip whole blocks, though each block contains a variable number of lines depending on the compression ratio. For a raw bzip2 file, which is a single compressed stream, I don't think you have any choice but to read and throw away the lines to be skipped; that's inherent in the compression format.
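For the read-and-discard approach, itertools.islice from the standard library is tidier than calling next() in a loop, though the decompressor still has to decode every skipped line. A minimal sketch (the filename is from your question; 100 is your example skip count):

import bz2
from itertools import islice

with bz2.open('latest-all.json.bz2', 'rt') as dump:
    # islice consumes and discards the first 100 lines, then yields the rest;
    # bzip2 still decompresses everything up to that point under the hood.
    for line in islice(dump, 100, None):
        pass  # replace with your per-line processing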