Delete 2 lines of text before the line containing the matching pattern using sed?

I have a file containing the following lines:

aaa
bbb
ccc
pattern
eee
fff
ggg
pattern
hhh

I would like to delete the 2 lines immediately before the last line matching pattern in the file. The expected output is:

aaa
bbb
ccc
pattern
eee
pattern
hhh

I tried - sed -i '/pattern/{N;N;d;}' file but it didn't work. There was no change to the file.

I also tried - tac file | sed '/pattern/,+2 d' | tac > tmpfile && mv tmpfile file but this also deleted the line containing the matching pattern.

My sed version is sed (GNU sed) 4.7.

Any help would be much appreciated. Thanks.

There are 8 answers

3
Shawn On BEST ANSWER

sed is the wrong tool for this. Any time you want to edit a file, especially if you want to look backwards after some matching bit, ed is almost always a better option, as it's designed to work with files, not a stream of lines always moving forward.

ed -s file.txt <<'EOF'
?pattern?-2;+1 d
w
EOF

or if a heredoc isn't convenient

printf '%s\n' '?pattern?-2;+1 d' w | ed -s file.txt

Either form first sets the current line to the one two lines before the last line matching pattern, then deletes that line and the one after it (that is, the two lines preceding the last match of pattern), and finally writes the modified file back out.

0
Sebastian Carlos On

Edit: HatLess's sed solution (posted here under the name sseLtaH) looks much better to me.

I agree with Shawn's answer; sed is not the best tool for the job. But here's a solution with sed:

Have a script.sed file:

# read the full file into the pattern space
:1
$! { N ; b1 }

# replace last occurrence of "2 lines plus pattern line"
# with just the pattern line
s/(.*\n.*\n)(pattern\n?.*)\'/\2/m

Run it like this:

sed -E -f script.sed file.txt

Or in a single line like this:

sed -E ':1 ; $! { N ; b1 } ; s/(.*\n.*\n)(pattern\n?.*)\'\''/\2/m' file.txt

The basic idea is that, because we need to work on the last occurrence of the pattern in the file, we need to read the entire file before modifying it.

The first two lines are a loop using sed's goto-like commands:

  • :1 creates a label called 1.
  • $! makes sure that we run the following commands for every line except the last one.
    • N reads the next line.
    • b1 jumps to the label 1.

The following substitution command will only run on the last line. Note the following:

  • We don't need to escape the capturing group parentheses (\( and \)) because we call sed with the -E flag which turns on the Extended Regular Expression syntax.
  • We pass the flag m to the substitute command, which makes the regex work in multiline mode. In our case, this provides the following characteristics:
    • The dot (.) no longer matches newline characters (\n). This is useful in our case because we want to be explicit about the number of lines we match.
    • It makes $ match at the end of every line, so we use \' instead (a GNU extension), which always matches the end of the buffer. We need this to anchor our regex to the end of the file.
  • Also note the \n? after pattern. Because sed strips the trailing newline from each line it reads, this lets the regex match a "pattern" line that is either the last line or, as in the sample, followed by one more line.
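
The slurp loop described above can be tried in isolation. This is only a minimal sketch: the helper name slurp_join, the sample input, and the comma substitution are made up for illustration and are not part of the answer's script.

```shell
#!/bin/sh
# slurp_join is a hypothetical helper: it gathers every input line into the
# pattern space with the ":1 / $!{N;b1}" loop, then edits the whole buffer.
slurp_join() {
    sed -E ':1
$!{
N
b1
}
s/\n/,/g'
}

# The substitution runs once, on a buffer holding all three lines.
printf 'aaa\nbbb\nccc\n' | slurp_join    # prints aaa,bbb,ccc
```

Because the substitution only runs after the loop has reached the last line, it sees the embedded newlines of the whole file, which is what lets the answer's regex span several lines.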
0
Daweo On

I would use GNU AWK for this task in the following way. Let the content of file.txt be

aaa
bbb
ccc
pattern
eee
fff
ggg
pattern
hhh

then

awk '{arr[NR]=$0}/pattern/{ln=NR}END{for(i=1;i<=NR;i+=1){if(i+2!=ln&&i+1!=ln){print arr[i]}}}' file.txt

gives output

aaa
bbb
ccc
pattern
eee
pattern
hhh

Explanation: I store the lines of file.txt in the array arr, keyed by line number. Whenever a line matches pattern, I set ln to its line number, so after the last line ln holds the number of the last match. Once all lines are stored, I iterate over arr and print every line whose number is neither ln-1 nor ln-2.
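
For readability, the same logic can be spread over several lines. This is a sketch rather than a different method; the function name drop_two_before_last and the temporary file are assumptions added for the demonstration.

```shell
#!/bin/sh
# drop_two_before_last is a hypothetical wrapper around the same awk logic.
drop_two_before_last() {
    awk '
        { arr[NR] = $0 }          # remember every line under its line number
        /pattern/ { ln = NR }     # overwritten on each match, so it ends as the last match
        END {
            for (i = 1; i <= NR; i++) {
                # skip the two lines directly above the last match
                if (i == ln - 1 || i == ln - 2) continue
                print arr[i]
            }
        }
    ' "$1"
}

# Demonstration on the sample data from the question.
tmp=$(mktemp)
printf '%s\n' aaa bbb ccc pattern eee fff ggg pattern hhh > "$tmp"
drop_two_before_last "$tmp"
rm -f "$tmp"
```

If no line matches, ln stays unset and nothing is skipped, so the input is printed unchanged.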

(tested in GNU Awk 5.1.0)

0
potong On

This might work for you (GNU sed):

sed -En ':a
         N
         /(.*(pattern))\n?(.*\2)/{h;s//\1/p;x;s//\3/}
         ${s/([^\n]*\n){2}(pattern)/\2/;p}
         ba' file

Gather up the lines of the file.

If the collection contains two occurrences of the pattern, print up to and including the first pattern, then reduce the current collection by the same amount (minus an introduced leading newline).

At the end of the file, match on pattern again, this time removing the two lines before it and print the result.

Alternative:

sed -zE 's/(.*)(\n[^\n]*){2}(\npattern)/\1\3/' file
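
The alternative comes without explanation, so here is a hedged sketch of how it appears to work. GNU sed's -z option separates records at NUL bytes instead of newlines, so a text file containing no NUL bytes is handled as a single record, and the greedy (.*) pushes the match to the last line that starts with pattern. The wrapper name drop_with_z is hypothetical, and the input is the question's sample.

```shell
#!/bin/sh
# Assumes GNU sed: -z is a GNU extension. drop_with_z is a hypothetical name.
drop_with_z() {
    # (.*)            greedy: everything up to the two lines to be removed
    # (\n[^\n]*){2}   exactly two whole lines, each with its leading newline
    # (\npattern)     the start of the last line beginning with "pattern"
    sed -zE 's/(.*)(\n[^\n]*){2}(\npattern)/\1\3/'
}

printf '%s\n' aaa bbb ccc pattern eee fff ggg pattern hhh | drop_with_z
```

Since the whole file is one record, this variant holds the entire input in memory, like the other slurping solutions.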
0
jhnc On

tac + sed

tac infile | sed -n '
    p
    /pattern/ {
        n
        n
    :a
        n
        p
        ba
    }
' | tac >tmpfile &&
mv tmpfile infile

sed + shell

(
    n=$(sed -n '/pattern/=' infile | tail -n 1)
    sed -i "$((n-2)),$((n-1))d" infile
)
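
The sed + shell variant can be read as two steps: sed's = command prints the line number of every match, tail keeps the last one, and a second sed deletes the two lines above it. The sketch below wraps that in a hypothetical function, delete_before_last, which writes to stdout instead of editing in place. The guard for a missing match, or a match on line 1 or 2, is an addition that is not in the answer.

```shell
#!/bin/sh
# delete_before_last is a hypothetical helper; it prints the edited text
# rather than modifying the file.
delete_before_last() {
    file=$1
    # "=" prints the line number of each matching line; keep only the last one.
    n=$(sed -n '/pattern/=' "$file" | tail -n 1)
    if [ -z "$n" ] || [ "$n" -le 2 ]; then
        # Added guard: no match, or fewer than two lines above it.
        cat "$file"
    else
        sed "$((n-2)),$((n-1))d" "$file"
    fi
}

# Demonstration on the sample data from the question.
tmp=$(mktemp)
printf '%s\n' aaa bbb ccc pattern eee fff ggg pattern hhh > "$tmp"
delete_before_last "$tmp"
rm -f "$tmp"
```

Without such a guard, a match on line 1 or 2 would produce an address of 0 or below, which sed rejects as an invalid line address.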
0
sseLtaH On

Using GNU sed

$ sed -Ezi.bak 's/(.*\n)([^\n]*\n){2}(pattern)/\1\3/' input_file
$ cat input_file
aaa
bbb
ccc
pattern
eee
pattern
hhh
2
Ed Morton On

Using any awk with tac and only reading 1 line at a time into memory:

$ tac file | awk '!(c && c--); !f && /pattern/{f=c=2}' | tac
aaa
bbb
ccc
pattern
eee
pattern
hhh

Most of the other posted solutions are reading the whole input into memory and so will fail if the input is too large to fit in memory.
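
The awk condition is compact, so here is an expanded sketch of the same idea with explicit names. It is meant to behave like the one-liner above; the function name skip_two_before_last and the variable name found are assumptions introduced for readability.

```shell
#!/bin/sh
# skip_two_before_last is a hypothetical wrapper; it needs tac (GNU coreutils).
skip_two_before_last() {
    tac "$1" | awk '
        c > 0 { c--; next }    # still inside the two lines to drop
        { print }
        # The first match in the reversed input is the last match in the file.
        !found && /pattern/ { found = 1; c = 2 }
    ' | tac
}

# Demonstration on the sample data from the question.
tmp=$(mktemp)
printf '%s\n' aaa bbb ccc pattern eee fff ggg pattern hhh > "$tmp"
skip_two_before_last "$tmp"
rm -f "$tmp"
```

In the reversed stream the two lines to drop come right after the first match, so awk only needs a countdown and never has to look back.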

2
Kaz On

We can avoid using temporary files (e.g. via tac) or loading the file into RAM, or editing in-place, if we make two passes over it:

$ awk 'NR == FNR && /pattern/ { pos = NR }
       NR == FNR { next }
       FNR < pos - 2 || FNR >= pos' data data
aaa
bbb
ccc
pattern
eee
pattern
hhh

Here, I'm giving the data file twice as an argument on awk's command line. The condition NR == FNR is an idiom in awk which evaluates to true when we are processing the first file (thus, in our case, the first pass over the same file).

In the first pass, we record the line number of the last line that matches pattern: every match overwrites the same pos variable, so when the pass ends, pos holds the position of the last match.

In the second pass through the data, we print all lines which are not one of the two lines before pos.