Awk one-liner to replace text of first matching regex occurrence only


I need this awk command to replace ss:Width="252" in the first XML tag in the text with ss:Width="140" and leave the rest of the tags alone:

cat <<- EOF > text
    <ss:Column ss:AutoFitWidth="1" ss:Width="252"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="252"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="189"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="189"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="252"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="252"/>
EOF

awk '{c=++count[$0]} c==1 {sub(/ss:Width=\"[0-9]{1,4}\"/,"ss:Width=\"140\"")} {print}' text > newf

cat newf

Instead, it replaces the expression in the first instances of each of the three unique matches (three total replacements, whereas I want only one.)

<ss:Column ss:AutoFitWidth="1" ss:Width="140"/>
<ss:Column ss:AutoFitWidth="1" ss:Width="140"/>
<ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
<ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
<ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
<ss:Column ss:AutoFitWidth="1" ss:Width="252"/>
<ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
<ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
<ss:Column ss:AutoFitWidth="1" ss:Width="140"/>
<ss:Column ss:AutoFitWidth="1" ss:Width="189"/>
<ss:Column ss:AutoFitWidth="1" ss:Width="252"/>
<ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
<ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
<ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
<ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
<ss:Column ss:AutoFitWidth="1" ss:Width="252"/>

Why does it behave this way? How is the incrementer behaving in my awk command? I expected it to increment after the first qualifying match of /ss:Width=\".*\"/ but it seems like it's not incrementing until all unique matches are found, then ignoring subsequent non-unique matches only. Is that right? I tried to force the counter to increment at the end of the c == 1 block like this:

awk '{c=++count[$0]} c==1 {sub(/ss:Width=\".*\"/,"ss:Width=\"140\"");c++} {print}' text > newf

But I get the same output. I didn't have any luck trying this task in sed & I'd rather do it in awk anyway. I'm specifically interested in understanding this awk syntax.

Edit: I tested this theory by changing one of the width attributes to another random number. It does also replace that one with 140. So, it is limiting to the first instance of all matching expressions, not the first matching expression itself.

Edit: As Cody pointed out, my regex is greedy. I changed .* to [0-9]{1,4}; however, the behavior is the same: it still replaces the first instance of every unique match. I also changed one of the XML tags' width attributes to a third unique number and updated the output to illustrate the behavior I'm trying to fix.

This is AIX/ksh.


There are 4 answers

7
shawnt00 On BEST ANSWER
awk 'found == 0 { found = sub(/ss:Width=\"[0-9]{1,4}\"/,"ss:Width=\"140\"")} //' text > newf

You might be able to shorten that a bit.
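One possible shortening, as a sketch rather than a definitive form: `!found` behaves like `found == 0` here, and a bare `1` pattern prints every line. The three-line input is abbreviated from the question, and `[0-9]+` is swapped in for `[0-9]{1,4}` on the assumption that not every awk supports interval expressions.

```shell
# Abbreviated sample input based on the question's data.
out=$(printf '%s\n' \
  '<ss:Column ss:AutoFitWidth="1" ss:Width="252"/>' \
  '<ss:Column ss:AutoFitWidth="1" ss:Width="126"/>' \
  '<ss:Column ss:AutoFitWidth="1" ss:Width="252"/>' |
  awk '!found { found = sub(/ss:Width="[0-9]+"/, "ss:Width=\"140\"") } 1')

# sub() returns 1 once it has replaced something, so the block runs
# only until the first successful substitution.
printf '%s\n' "$out"
```

Only the first line should come out with ss:Width="140"; the later 252 line is left alone.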

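Your old approach kept an array of counters indexed by the full text of each input line, so every distinct line has its own counter and c is 1 the first time each distinct line appears. That's why all three distinct lines were changed. Adding c++ inside the block makes no difference, because c is reassigned from count[$0] on the next line.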
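To see the per-line counters directly, here is a small sketch that prints c next to each line. The short w=... lines are made-up stand-ins for the XML tags, used only to keep the output readable.

```shell
# Made-up sample: three distinct lines, two of them repeated.
out=$(printf '%s\n' 'w=252' 'w=126' 'w=126' 'w=252' 'w=189' |
  awk '{ c = ++count[$0]; print c, $0 }')

printf '%s\n' "$out"
# 1 w=252
# 1 w=126
# 2 w=126
# 2 w=252
# 1 w=189
```

Every line whose counter is 1 would pass the c==1 test, which matches the three replacements seen in the question.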

Some of the other answers assume that every line matches the /ss:Width/ regex and/or that the width attribute is always at the end of the line. That's probably true in your case, but it's worth noting. I decided not to assume those things in the script above.

0
Nathan Wilson On

Try this:

awk '($0 ~ /ss:Width/) {if (once != 1) {sub("[0-9]+\"/>","140\"/>")}; once=1; print}' text

It looks for the first line containing ss:Width then replaces the last number before the closing tag with 140.
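Note that the command above prints only lines that contain ss:Width. If the real file also has other lines, a variant along these lines might keep them; this is a sketch, and the `<ss:Table>` line in the sample is an assumed example of such a line.

```shell
# Sample input with one line that does not contain ss:Width (assumed).
out=$(printf '%s\n' \
  '<ss:Table>' \
  '<ss:Column ss:Width="252"/>' \
  '<ss:Column ss:Width="126"/>' |
  awk '!once && /ss:Width/ { sub(/[0-9]+"\/>/, "140\"/>"); once = 1 } { print }')

# The substitution runs on the first ss:Width line only;
# the unconditional print keeps every line, matching or not.
printf '%s\n' "$out"
```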

1
Cody Stevens On

It looks like your regular expression is greedy.

sub(regexp, replacement [, target]) The sub function alters the value of target. It searches this value, which is treated as a string, for the leftmost, longest substring matched by the regular expression regexp.
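A small sketch of what "leftmost, longest" means in practice. The sample line, with a second attribute after the width, is an assumption made for illustration; the lines in the question end right after the width, which is why the greedy pattern happened to work there.

```shell
# Assumed sample line with another attribute after ss:Width.
line='<c ss:Width="252" ss:Hidden="1"/>'

# .* runs to the last double quote on the line and swallows ss:Hidden.
greedy=$(printf '%s\n' "$line" | awk '{ sub(/ss:Width=".*"/, "ss:Width=\"140\"") } 1')

# [0-9]+ stops at the closing quote of the width value.
narrow=$(printf '%s\n' "$line" | awk '{ sub(/ss:Width="[0-9]+"/, "ss:Width=\"140\"") } 1')

printf '%s\n' "$greedy" "$narrow"
# <c ss:Width="140"/>
# <c ss:Width="140" ss:Hidden="1"/>
```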

0
anubhava On

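It is actually pretty easy with custom field separators. OFS is set to the empty string because assigning to $1 rebuilds the line with OFS between the fields, and the default single space would otherwise appear before the closing />:

awk -F ' ss:Width="252"' -v OFS= -v r=' ss:Width="140"' '!p && NF>1{p=1; $1 = $1 r} 1' text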
    <ss:Column ss:AutoFitWidth="1" ss:Width="140"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="252"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="189"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="189"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="252"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="252"/>

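-F ' ss:Width="252"' sets the field separator to the text ss:Width="252" (including its leading space), so only lines containing that attribute split into more than one field.

!p && NF>1 is true only for the first such line: it sets the flag p and appends the replacement r to $1, where the separator used to be. The trailing 1 prints every line.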
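To make the splitting visible, this sketch prints NF and the first two fields for one matching and one non-matching line. The two sample lines are taken from the question without their indentation.

```shell
out=$(printf '%s\n' \
  '<ss:Column ss:AutoFitWidth="1" ss:Width="252"/>' \
  '<ss:Column ss:AutoFitWidth="1" ss:Width="126"/>' |
  awk -F ' ss:Width="252"' '{ print NF " [" $1 "] [" $2 "]" }')

# A line containing the separator splits in two; any other line stays whole.
printf '%s\n' "$out"
# 2 [<ss:Column ss:AutoFitWidth="1"] [/>]
# 1 [<ss:Column ss:AutoFitWidth="1" ss:Width="126"/>] []
```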