Is there a way to skip first x lines of a bz2 file in Python without calling next()?

Question

Asked by zadrozny on 05 July 2021 at 22:59

I'm trying to read the latest Wikidata dump while skipping the first, say, 100 lines.

Is there a better way to do this than calling next() repeatedly?

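import bz2

# Open the compressed dump in text mode; lines are decompressed lazily.
WIKIDATA_JSON_DUMP = bz2.open('latest-all.json.bz2', 'rt')

# Skip the first 100 lines by reading and discarding them.
for _ in range(100):
    next(WIKIDATA_JSON_DUMP)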

Alternatively, is there a way to split up the file in bash, say by using bzcat to pipe selected chunks to smaller files?

There are 2 answers

**Tom Morris** · Answer 1 · 2021-07-06T14:24:10+00:00

If it had been compressed using something like bgzip, you could skip whole blocks, but each block contains a variable number of lines, depending on the compression ratio. For raw bzip2 files, which are a single stream, I don't think you have any choice but to read and throw away the lines to be skipped, due to the nature of the compression format.
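If the goal is mainly to avoid the explicit next() loop, itertools.islice can do the discarding for you. The sketch below assumes the dump is UTF-8 text; iter_lines_after is a hypothetical helper name, not part of the bz2 module. Note that the skipped lines are still decompressed and read, so this is tidier but not faster.

```python
import bz2
from itertools import islice


def iter_lines_after(path, skip, encoding="utf-8"):
    """Yield lines from a bz2 file, discarding the first `skip` lines.

    Hypothetical helper for illustration. The skipped lines are still
    decompressed internally; islice only hides the explicit next() calls.
    """
    with bz2.open(path, "rt", encoding=encoding) as handle:
        # islice(handle, skip, None) consumes `skip` lines, then yields the rest.
        yield from islice(handle, skip, None)
```

For the dump in the question, this would be used as: for line in iter_lines_after('latest-all.json.bz2', 100), followed by whatever per-line processing you need.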

**Pineapples** · Answer 2 · 2021-07-11T14:55:33+00:00

You can try the following in bash, to skip the first 10 lines for example:

bzcat -d -c /tmp/myfile.bz2 | tail -n +11

Note that tail -n +K starts printing at line K, so to skip N lines you pass N+1 (here, +11 skips the first 10).
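The same streaming approach covers the second part of the question, writing chunks of lines to smaller files. Here is a rough Python sketch, assuming UTF-8 text and that one chunk of lines fits in memory; split_bz2_by_lines, the prefix argument, and the output file naming are all hypothetical choices made for illustration.

```python
import bz2
from itertools import islice


def split_bz2_by_lines(path, lines_per_chunk, prefix):
    """Write consecutive groups of lines from a bz2 file to numbered text files.

    Hypothetical helper for illustration; returns the list of paths written.
    Each chunk is held in memory, so keep lines_per_chunk modest for long lines.
    """
    written = []
    with bz2.open(path, "rt", encoding="utf-8") as source:
        index = 0
        while True:
            # Pull the next group of lines from the decompressed stream.
            chunk = list(islice(source, lines_per_chunk))
            if not chunk:
                break
            out_path = f"{prefix}{index:05d}.txt"
            with open(out_path, "w", encoding="utf-8") as out:
                out.writelines(chunk)
            written.append(out_path)
            index += 1
    return written
```

The output files are uncompressed; if disk space matters, the inner open() could be swapped for bz2.open(out_path, "wt", encoding="utf-8") to write compressed chunks instead.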
