Python or bash script: if pattern in lines between two identical markers, remove lines and first marker

Question

124 views Asked by marsed At 20 June 2018 at 21:02

As a beginner, I'm trying to solve the following problem (bash or python script):

The file (~50G!):

marker
xxx
xxx
xxx
pattern
marker
xxx
xxx
xxx
marker
xxx
xxx
xxx
pattern

I would like to find a way to remove a marker together with the lines that follow it, up to but not including the next marker, IF no pattern can be found in those lines. The marker that ends the removed block must stay, because it starts the next block.

Wanted result:

marker
xxx
xxx
xxx
pattern
[empty!]
marker
xxx
xxx
xxx
pattern

I tried to solve it with regex or awk (this is only a very tentative start):

awk '/marker/{f=1} f; /marker/{f=1}' file

but I'm having a hard time understanding how to turn that into a function that solves the entire problem. It would make me very happy if someone could help me with that!

Cheers

There is 1 answer

**killian95** · Accepted Answer · 2018-06-20T21:46:38+00:00

Here's a way to do it in Python. Treat marker as a separator, then remove any of the text snippets between markers that don't contain pattern:

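with open('markerfile.txt', 'r') as f:
    # Split the whole file into the pieces of text between markers
    lines = f.read().split('marker\n')

# Keep only the pieces that contain the pattern, plus empty pieces
lines = [entry for entry in lines if 'pattern' in entry or not entry]
print('marker\n'.join(lines), end='')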

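Edit: the "or not entry" part of the list comprehension just handles the case where marker is the first line in the file. The split then produces an empty first entry, and keeping it lets the join put a marker back in front of the first block.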
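To make the role of the empty entry concrete, here is a minimal sketch that runs the same split-and-filter steps on an in-memory copy of the sample from the question. The `filter_blocks` helper and the `sample` string are names made up for this illustration; they are not part of the original answer.

```python
def filter_blocks(text, marker='marker\n', pattern='pattern'):
    """Split text on the marker line and keep only the pieces containing pattern."""
    entries = text.split(marker)
    # entries[0] is '' when the text starts with the marker; keeping it
    # makes the join below put a marker back in front of the first block.
    kept = [entry for entry in entries if pattern in entry or not entry]
    return marker.join(kept)


# Hypothetical in-memory copy of the sample file from the question
sample = (
    'marker\nxxx\nxxx\nxxx\npattern\n'
    'marker\nxxx\nxxx\nxxx\n'
    'marker\nxxx\nxxx\nxxx\npattern\n'
)

# The text before the first marker is the empty string
print(repr(sample.split('marker\n')[0]))
# The middle block (no pattern) is gone, the other two are intact
print(filter_blocks(sample), end='')
```

Like the snippet above, this holds the whole text in memory, so it is only meant for checking the logic on small inputs.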

Edit 2: Here's a streaming version (better suited for large files). It uses islice from itertools to read n lines of the file at a time; the rest of the algorithm is more or less the same. Caveat: a block that is split across two chunks is tested piece by piece, so a piece without pattern is dropped even if the rest of its block contains pattern.

from itertools import islice

n = 5  # number of lines read per chunk
with open('markerfile.txt', 'r') as f, open('markersout.txt', 'w') as fout:
    while True:
        # Read the next n lines as one string
        next_n_lines = ''.join(islice(f, n))
        if not next_n_lines:
            break  # end of file
        lines = next_n_lines.split('marker\n')
        lines = [entry for entry in lines if 'pattern' in entry or not entry]
        print('marker\n'.join(lines).strip(), file=fout)
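Reading a fixed number of lines at a time can cut a block in two, and each half is then filtered on its own. One possible alternative, sketched below, reads the file line by line and buffers only the current block, so memory use is bounded by the largest block rather than by the file size. This is not the answerer's code: the names `filter_marker_blocks` and `filter_file` are hypothetical, and the sketch assumes that a marker is a line consisting of exactly `marker` and that `pattern` is matched as a substring of a line.

```python
def filter_marker_blocks(lines, marker='marker', pattern='pattern'):
    """Yield the lines of every marker-delimited block that contains pattern.

    Assumption: a marker is a line equal to `marker`, and `pattern` is
    matched as a substring of a line.
    """
    block = []     # lines of the block currently being read
    keep = False   # True once pattern has been seen in this block
    for line in lines:
        if line.rstrip('\n') == marker:
            # A new block starts: emit the finished one if it qualified.
            if keep:
                yield from block
            block, keep = [line], False
        else:
            # Lines before the first marker form a block of their own,
            # so they are kept only if they contain the pattern.
            block.append(line)
            if pattern in line:
                keep = True
    if keep:
        yield from block  # the last block ends at end of file


def filter_file(src, dst):
    """Stream src through the filter into dst (file names are up to the caller)."""
    with open(src) as fin, open(dst, 'w') as fout:
        fout.writelines(filter_marker_blocks(fin))


# Small in-memory check using the sample from the question
sample_lines = [
    'marker\n', 'xxx\n', 'xxx\n', 'xxx\n', 'pattern\n',
    'marker\n', 'xxx\n', 'xxx\n', 'xxx\n',
    'marker\n', 'xxx\n', 'xxx\n', 'xxx\n', 'pattern\n',
]
print(''.join(filter_marker_blocks(sample_lines)), end='')
```

For the files in the answer, the call would be `filter_file('markerfile.txt', 'markersout.txt')`. Unlike the chunked version, the output here keeps the original line endings and adds no extra blank lines.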
