Delete specific line(pattern) from .gz file using python for large file size

880 views Asked by At

I am working with .gz extension file where I am required to delete specific pattern from the file with least processing time and not altering the file at all.

1

There are 1 answers

5
fferri On BEST ANSWER

Have you tried using gzip.GzipFile? Arguments are similar to open.

Example of reading a lines from a file and writing to another files if a certain condition does not match:

import gzip

with gzip.GzipFile('output.gz', 'w') as fout:
    with gzip.GzipFile('input.gz','r') as fin:
        for line in fin:
            if not your_remove_condition(line):
                fout.write(line)

Note that the input and output file must be different.