How do I match all but the first match in a line with sed?


I'm doing my commit messages in Git with a certain pattern to ease creation of a changelog for new releases (https://stackoverflow.com/a/5151123/520162).

Every change that should be taken into my changelog gets prefixed with CHG, NEW or FIX.

When it comes to generation of my changelog, I print out the revisions I'm going to parse using the following command for each revision:

git show --quiet --date=short --pretty=format:"%cd %an %s%n%n%w(100,21,21)%b%n" $CURRENTREVISION

The subject (%s) holds the subject of the modification.

Next, I'm using sed to modify the generated output so that it fits the needs of my changelog file.

Now, it happens that in the subject line, there are multiple occurrences of CHG, NEW or FIX. My output of the subject looks like this:

DATE NAME FIX first change NEW second change CHG third change

I'd like to prefix all but the first occurrence of my keywords with a newline so that each CHG, NEW or FIX starts a new line:

DATE NAME FIX first change
          NEW second change
          CHG third change

What do I have to tell sed in order to achieve this?

There are 4 answers

1
Etan Reisner On BEST ANSWER

sed isn't the most appropriate tool for this.

With awk it would look like this:

awk '{n=0; for (i=1; i<=NF; i++) {if ($i ~ /(NEW|FIX|CHG)/) {$i=(n++?"\n          ":"")$i}}}7'
  • n=0 resets the match counter for each line
  • for (i=1; i<=NF; i++) loops over every field of the line
  • if ($i ~ /(NEW|FIX|CHG)/) checks whether the field contains one of the markers
    • $i=(n++?"\n          ":"")$i prefixes the field with a newline plus the indentation, except for the first match, which gets nothing
  • 7 is a truthy pattern that prints the current line
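As a quick check, here is a minimal sketch that feeds the sample subject line from the question through the awk program above. The variable names line and out are only illustrative.

```shell
# Sample subject line from the question (illustrative input).
line='DATE NAME FIX first change NEW second change CHG third change'

# Same awk program as above: every marker after the first one gets a
# newline and a 10-space indent in front of it.
out=$(printf '%s\n' "$line" |
  awk '{n=0; for (i=1; i<=NF; i++) {if ($i ~ /(NEW|FIX|CHG)/) {$i=(n++?"\n          ":"")$i}}}7')

printf '%s\n' "$out"
```

Because awk rebuilds the line with single spaces between fields, a trailing space is left in front of each inserted newline. If that matters, it can be stripped afterwards.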
0
Arjun Mathew Dan On
awk '{i=0; f=0; while(++i<=NF){if($i~/FIX|NEW|CHG/){if(f){$i="\n"$i}else{f=1}}}}1'

or, with the flag test folded into the increment:

awk '{i=0; f=0; while(++i<=NF){if($i~/FIX|NEW|CHG/){if(f++){$i="\n"$i}}}}1'

Example:

$ echo "DATE CH NAME FIX first change NEW second change CHG third change" | awk '{i=0; f=0; while(++i<=NF){if($i~/FIX|NEW|CHG/){if(f){$i="\n"$i}else{f=1}}}}1'

DATE CH NAME FIX first change 
NEW second change 
CHG third change

This walks from the first field to the last. For each field matching one of the three patterns, it checks f, which is still zero (false) at the first match. Because of the f++, f is non-zero for every later match, so "\n" is prepended to those fields.

0
NeronLeVelu On
sed '/^DATE NAME/ {
:cycle
   s/\(.\{1,\}\) \(FIX .*\)/\1\
\2/g
   t cycle
   s/\(.\{1,\}\) \(NEW .*\)/\1\
\2/g
   t cycle
   s/\(.\{1,\}\) \(CHG .*\)/\1\
\2/g
   t cycle

   s/\n/&          /g
   s/\n */ /
   }' YourFile

Something like this for the POSIX version (--posix on GNU sed).

A simple

   s/\(.\{1,\}\) \(\(CHG\|FIX\|NEW\) .*\)/\1\
\2/g
   t cycle

could replace the first three s/// commands with a GNU sed, which allows \| for alternation.

I added the leading /^DATE NAME/ address as a safeguard; if only lines of this kind are processed, it (and the associated { }) is not needed.
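If GNU sed can be assumed, a shorter sketch is possible, because GNU sed defines what happens when a number is combined with the g flag: 2g skips the first match and replaces the second and all later ones. POSIX leaves that combination unspecified, so this is not portable. The sketch also assumes that every keyword is surrounded by single spaces and that two keywords never sit directly next to each other.

```shell
# Sample subject line from the question (illustrative input).
line='DATE NAME FIX first change NEW second change CHG third change'

# GNU sed only: -E enables extended regexes, "\n" in the replacement is a
# newline, and the "2g" flag starts replacing at the second match.
out=$(printf '%s\n' "$line" |
  sed -E 's/ (FIX|NEW|CHG) /\n          \1 /2g')

printf '%s\n' "$out"
```

The space in front of each later keyword is replaced by the newline and the indentation, so no trailing spaces are left behind.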

0
Jason Hu On

sed doesn't sound like the right tool for this job. The state preserved in sed is very limited, and your goal needs a counter, which is fairly difficult in sed. I think you won't be happy maintaining your code afterwards.

Instead, I think Perl is a fantastic tool for it.

Something like this:

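while (my $line = <STDIN>) {
    chomp $line;
    # Split in front of every keyword; each keyword stays with its text.
    my @matches = split /\s+(?=(?:FIX|NEW|CHG)\b)/, $line;
    my $date_name = shift @matches; # only the FIX, NEW, CHG parts remain now
    my $first = @matches ? ' ' . shift(@matches) : '';
    print $date_name, $first, "\n";
    print "\t\t", $_, "\n" for @matches; # later entries go on their own lines
}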

Pipe in your original data and redirect the output to a file in the shell.