How to filter data from a flat file with a multi-line pattern using awk or sed?

Question


Asked by Piotr Wójcik on 09 June 2015 at 15:27

This is my first post on this site. I have what is probably a not-very-easy problem with awk or sed. My file contains data like this:

A
B
C
[Start]D
E
F
[/End]
G
...
[Start]H
I
J
[/End]
...
K

And I need the following result:

A
B
C
[Open]D E F[/Close]
G
...
[Open]H I J[/Close]
...
K

So far I have this awk code, but it doesn't work:

BEGIN {
    step=0
}

/[\/End]/ {
    if(step==3) print "[/Close]"
    step=0
}

step==2 {
    print
    step=3
}

step==1{
    print
    step=2
}

/[Start]/ {
    print "[Begin]"
    step=1
}

step=0{
    print
}

Many thanks for your answers. I hope to stay here a little longer. Cheers! P.
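The problems in the code above can be pinned down. `/[\/End]/` and `/[Start]/` are bracket expressions, so each matches any line containing a single one of the listed characters (`/[Start]/` fires on any line with an `S`, `t`, `a` or `r`). `step=0` is an assignment, not a comparison, so that pattern always evaluates to 0 (false) and also resets `step` on every line. Finally, `print` always appends a newline, so the pieces can never end up on one line (and the code prints `[Begin]` rather than `[Open]`). Below is a minimal sketch of the same flag-based idea with those fixes, assuming each block starts with a line beginning `[Start]` and ends with a line beginning `[/End]`; the function name `open_close` is hypothetical.

```shell
#!/bin/sh
# Sketch: the question's flag/step idea with the regex and "=" bugs fixed.
# open_close is a hypothetical name; it reads stdin and writes stdout.
open_close() {
  awk '
    /^\[Start\]/ {                    # brackets escaped: literal "[Start]"
      sub(/^\[Start\]/, "[Open]")
      printf "%s", $0                 # no newline: the block continues
      step = 1
      next
    }
    step == 1 && /^\[\/End\]/ {       # literal "[/End]" closes an open block
      print "[/Close]"                # print ends the joined line with ORS
      step = 0
      next
    }
    step == 1 { printf " %s", $0; next }   # "==" compares; "=" would assign
    { print }                         # every other line passes through unchanged
  '
}

printf '%s\n' A B C '[Start]D' E F '[/End]' G | open_close
```

Writing each piece with `printf` as it arrives keeps the script streaming; the answers below instead either change `ORS` or buffer the block in a variable.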

Original Q&A

There are 3 answers

Ed Morton On 09 June 2015 at 16:21

$ cat tst.awk
sub(/^\[Start\]/,"[Open]")  { ors=ORS; ORS=OFS }
sub(/^\[\/End\]/,"[Close]") { ORS=ors }
{ print }

$ awk -f tst.awk file
A
B
C
[Open]D E F [Close]
G
...
[Open]H I J [Close]
...
K

If you care about the extra space before each "[Close]", we can do something different, but it will be a bit more complicated, e.g.:

$ cat tst.awk
sub(/^\[Start\]/,"[Open]")  { f=1; rec=$0; next }
sub(/^\[\/End\]/,"[Close]") { f=0; $0=rec $0 }
f { rec = rec OFS $0; next }
{ print }

$ awk -f tst.awk file
A
B
C
[Open]D E F[Close]
G
...
[Open]H I J[Close]
...
K
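Both scripts above write `[Close]`, whereas the question asks for `[/Close]`. A minimal variant of the second script that emits `[/Close]` is sketched below. It adds two assumptions that are not part of the answer: the `f &&` guard passes a stray `[/End]` outside any block through unchanged (in the original, such a line is glued onto whatever `rec` still holds), and an `END` rule prints a block whose `[/End]` never arrives instead of dropping it. The function name `open_close_ed` is hypothetical.

```shell
#!/bin/sh
# Sketch based on the second tst.awk above; open_close_ed is a hypothetical name.
open_close_ed() {
  awk '
    sub(/^\[Start\]/, "[Open]")        { f = 1; rec = $0; next }  # start buffering
    f && sub(/^\[\/End\]/, "[/Close]") { f = 0; $0 = rec $0 }     # glue block, fall through
    f { rec = rec OFS $0; next }                                  # inner line: append
    { print }
    END { if (f) print rec }           # assumption: flush a block with no [/End]
  '
}

printf '%s\n' A B C '[Start]D' E F '[/End]' G | open_close_ed
```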

Wintermute On 09 June 2015 at 15:48

With sed, you could write (GNU sed syntax; for BSD sed, see below):

sed '/\[Start\]/ { s//[Open]/; :a \,\[/End\],! { s/\n/ /; N; ba }; s,,[/Close],; s/\n// }' filename

This is to be read as follows:

/\[Start\]/ {        # If a line contains [Start]
  s//[Open]/         # replace it with [Open] (an empty regex reattempts the most
                     # recently used regex, which was \[Start\])
  :a                 # jump label for looping
  \,\[/End\],! {     # Until we find [/End]
    s/\n/ /          # replace newlines with spaces (this does nothing the first
                     # time around, but since we don't want to replace the last
                     # newline with a space but an empty string, we have to
                     # isolate it somehow; replacing before each N does that)
    N                # fetch next line, append it to what we already have
    ba               # go back to a
  }
  s,,[/Close],       # replace the [/End] we just found with [/Close]
  s/\n//             # and replace the last newline with nothing, to get the
                     # spaces right.
}

Note that to make this work with BSD sed, the call has to be amended slightly:

 sed -e '/\[Start\]/ { s//[Open]/; :a' -e '\,\[/End\],! { s/\n/ /; N; ba' -e '}; s,,[/Close],; s/\n// }' filename

This is because BSD sed doesn't terminate label names at semicolons the way GNU sed does. Apart from the -e options that split the code after the label names, it is the same code.
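Another way around the label issue, offered here as an assumption rather than something the answer claims, is to give every command its own line: POSIX sed ends a label at a newline, so a single script should then be accepted by both GNU and BSD sed without splitting it into `-e` pieces. The commands are unchanged from the one-liner; `open_close_sed` is a hypothetical function name.

```shell
#!/bin/sh
# Same commands as the one-liner above, one per line, so every label ends at a
# newline. open_close_sed is a hypothetical name; it reads stdin.
open_close_sed() {
  sed '
/\[Start\]/ {
  s//[Open]/
  :a
  \,\[/End\],! {
    s/\n/ /
    N
    ba
  }
  s,,[/Close],
  s/\n//
}
'
}

printf '%s\n' A B C '[Start]D' E F '[/End]' G | open_close_sed
```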

Further note that this will only work as long as the [Start] .. [/End] tags are not nested. If they are, you'll want to ditch sed and awk and use at least Perl (which supports recursion in regexes¹).

¹ Well, it calls them "regular expressions;" it's a bit of a misnomer because they're not limited to regular languages with all the stuff Perl crams into them. The point is: nested tags aren't a regular language anymore, so you need/want that stuff for it.


karakfa On 09 June 2015 at 16:14 BEST ANSWER

This awk will do most of it, but it will leave a space before the [/Close]:

awk '/Start/{ORS=FS} /End/{ORS=RS} sub(/Start/,"Open") sub(/End/,"Close") 1' file

It's easy to trim that in a second pass (pipe the previous output to this script):

awk '{ sub(/ \[/, "[") } 1'
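Put together, the two passes form a single pipeline. The sketch below wraps it in a hypothetical `open_close_karakfa` function that reads standard input, and writes the replacement in the trim as a plain `"["`. Both passes match loosely: `/Start/` and `/End/` fire on any line containing those words, and the trim removes the first ` [` on every line, so it assumes data shaped like the sample.

```shell
#!/bin/sh
# Both passes of the accepted answer joined by a pipe.
# open_close_karakfa is a hypothetical name; it reads stdin.
open_close_karakfa() {
  # Pass 1: inside a block ORS is a space (FS), so the lines are joined;
  # "[/End]" restores ORS to a newline (RS). The concatenated sub() results
  # followed by 1 form a non-empty string, so every line is printed.
  # Pass 2: drop the space left before "[/Close]" (first " [" on each line).
  awk '/Start/{ORS=FS} /End/{ORS=RS} sub(/Start/,"Open") sub(/End/,"Close") 1' |
    awk '{ sub(/ \[/, "[") } 1'
}

printf '%s\n' A B C '[Start]D' E F '[/End]' G | open_close_karakfa
```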
