Awk one-liner to replace text of first matching regex occurrence only

Question

323 views Asked by Matt Dexter At 20 June 2015 at 03:51

I need this awk command to replace ss:Width="252" in the first XML tag in the text with ss:Width="140" and leave the rest of the tags alone:

cat <<- EOF > text
    <ss:Column ss:AutoFitWidth="1" ss:Width="252"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="252"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="189"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="189"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="252"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="252"/>
EOF

awk '{c=++count[$0]} c==1 {sub(/ss:Width=\"[0-9]{1,4}\"/,"ss:Width=\"140\"")} {print}' text > newf

cat newf

Instead, it replaces the expression in the first instances of each of the three unique matches (three total replacements, whereas I want only one.)

<ss:Column ss:AutoFitWidth="1" ss:Width="140"/>
<ss:Column ss:AutoFitWidth="1" ss:Width="140"/>
<ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
<ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
<ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
<ss:Column ss:AutoFitWidth="1" ss:Width="252"/>
<ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
<ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
<ss:Column ss:AutoFitWidth="1" ss:Width="140"/>
<ss:Column ss:AutoFitWidth="1" ss:Width="189"/>
<ss:Column ss:AutoFitWidth="1" ss:Width="252"/>
<ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
<ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
<ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
<ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
<ss:Column ss:AutoFitWidth="1" ss:Width="252"/>

Why does it behave this way? How is the incrementer behaving in my awk command? I expected it to increment after the first qualifying match of /ss:Width=\".*\"/ but it seems like it's not incrementing until all unique matches are found, then ignoring subsequent non-unique matches only. Is that right? I tried to force the counter to increment at the end of the c == 1 block like this:

awk '{c=++count[$0]} c==1 {sub(/ss:Width=\".*\"/,"ss:Width=\"140\"");c++} {print}' text > newf

But I get the same output. I didn't have any luck trying this task in sed & I'd rather do it in awk anyway. I'm specifically interested in understanding this awk syntax.

Edit: I tested this theory by changing one of the width attributes to another random number. It does also replace that one with 140. So, it is limiting to the first instance of all matching expressions, not the first matching expression itself.

Edit: As Cody pointed out my regex is greedy. I changed .* to be [0-9]{1,4} however the behavior is the same - it still replaces only the first instance of every unique match. I also changed one of the XML tags' width attributes to a 3rd unique number and updated the output to illustrate the behavior I'm trying to fix.

This is AIX/ksh.

There are 4 answers

Nathan Wilson On 20 June 2015 at 07:02

Try this:

awk '($0 ~ /ss:Width/) {if (once != 1) {sub("[0-9]+\"/>","140\"/>")}; once=1; print}' text

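It looks for the first line containing ss:Width, then replaces the number just before the closing "/> with 140. Because print sits inside the /ss:Width/ block, lines that do not contain ss:Width are not printed.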

Cody Stevens On 20 June 2015 at 04:25

It looks like your regular expression is greedy.

sub(regexp, replacement [, target]) The sub function alters the value of target. It searches this value, which is treated as a string, for the leftmost, longest substring matched by the regular expression regexp.
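
To make the "leftmost, longest" rule concrete, here is a small sketch. The input line is made up for illustration (the ss:Span attribute is not in the question's data); it only serves to put a second quoted value after the width.

```shell
# Hypothetical line: a second quoted attribute follows ss:Width.
line='<ss:Column ss:Width="252" ss:Span="3"/>'

# Greedy: .* extends to the LAST double quote on the line,
# so the ss:Span attribute is swallowed by the match.
greedy=$(printf '%s\n' "$line" |
  awk '{ sub(/ss:Width=".*"/, "ss:Width=\"140\""); print }')

# Bounded: [0-9]+ can only cover the digits of the width value.
bounded=$(printf '%s\n' "$line" |
  awk '{ sub(/ss:Width="[0-9]+"/, "ss:Width=\"140\""); print }')

echo "$greedy"    # <ss:Column ss:Width="140"/>
echo "$bounded"   # <ss:Column ss:Width="140" ss:Span="3"/>
```

With the question's data each line has nothing after the width, so the greedy pattern happens to give the same result there; the difference only shows up on lines like the one above.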

anubhava On 20 June 2015 at 07:48

It is actually pretty easy with custom field separators:

awk -F ' ss:Width="252"' -v OFS= -v r=' ss:Width="140"' '!p && NF>1{p=1; $1 = $1 r} 1' text
    <ss:Column ss:AutoFitWidth="1" ss:Width="140"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="252"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="189"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="189"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="252"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="126"/>
    <ss:Column ss:AutoFitWidth="1" ss:Width="252"/>

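-F ' ss:Width="252"' sets the field separator to the search text ss:Width="252" (with its leading space), so any line containing it splits into more than one field.

!p && NF>1 is true only for the first such line: p is still unset and NF is greater than 1. The action sets p=1 so that later lines are skipped, and appends the replacement r to $1. The trailing 1 prints every line. Assigning to $1 makes awk rebuild the line from its fields joined by OFS; with the default OFS (a single space) a space would appear before />, which an empty OFS avoids.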
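
As a rough illustration of what the field splitting does on a single line, the sketch below prints NF and then rebuilds the line with the default OFS and with an empty one. The variable names (nf, default_ofs, empty_ofs) are just for this example.

```shell
line='<ss:Column ss:AutoFitWidth="1" ss:Width="252"/>'

# The search text is the separator, so a line containing it has 2 fields.
nf=$(printf '%s\n' "$line" | awk -F ' ss:Width="252"' '{ print NF }')

# Assigning to $1 rebuilds the record with OFS (default: one space).
default_ofs=$(printf '%s\n' "$line" |
  awk -F ' ss:Width="252"' -v r=' ss:Width="140"' '{ $1 = $1 r } 1')

# An empty OFS rejoins the pieces with nothing in between.
empty_ofs=$(printf '%s\n' "$line" |
  awk -F ' ss:Width="252"' -v OFS= -v r=' ss:Width="140"' '{ $1 = $1 r } 1')

echo "$nf"            # 2
echo "$default_ofs"   # <ss:Column ss:AutoFitWidth="1" ss:Width="140" />
echo "$empty_ofs"     # <ss:Column ss:AutoFitWidth="1" ss:Width="140"/>
```

Both results are well-formed XML; the empty OFS only matters if you want the output to match the input byte for byte apart from the number.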

shawnt00 On 20 June 2015 at 04:02 (accepted answer)

awk 'found == 0 { found = sub(/ss:Width=\"[0-9]{1,4}\"/,"ss:Width=\"140\"")} //' text > newf

You might be able to shorten that a bit.

Your old approach was keeping an array of counters indexed by the line of input. That's why it was exhibiting the behavior you weren't expecting.
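
A short sketch of that difference, using a cut-down stand-in for your file (five lines with widths 252, 126, 126, 252, 189). It uses [0-9]+ rather than {1,4} only because some awk builds do not support interval expressions; that substitution is my assumption, not something your AIX awk requires.

```shell
input=$(printf '%s\n' \
  '<ss:Column ss:Width="252"/>' \
  '<ss:Column ss:Width="126"/>' \
  '<ss:Column ss:Width="126"/>' \
  '<ss:Column ss:Width="252"/>' \
  '<ss:Column ss:Width="189"/>')

# count[$0] keeps one counter per distinct line text, so the value is 1
# the first time EACH different line is seen, not just the first line.
counts=$(printf '%s\n' "$input" |
  awk '{ printf "%s%d", (NR > 1 ? " " : ""), ++count[$0] } END { print "" }')

# One scalar flag is shared by all lines; sub() returns 1 on success,
# so the block stops running after the first replacement.
fixed=$(printf '%s\n' "$input" |
  awk '!found { found = sub(/ss:Width="[0-9]+"/, "ss:Width=\"140\"") } { print }')

echo "$counts"   # 1 1 2 2 1
echo "$fixed"
```

The three 1s in the counter output correspond to the three replacements you saw: one for each distinct line.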

Some of the other answers assume that every line will match the /ss:Width/ regex and/or that the width attribute is always at the end of the line. That is probably true in your case, but it is worth noting. I decided not to assume those things in the script above.
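
To show what those assumptions mean in practice, here is a sketch on made-up input where neither holds: there are lines without ss:Width, and the width is not the last attribute. The first command follows the flag approach above (with [0-9]+ in place of {1,4}); the second is my paraphrase of the "print only matching lines, replace the number before the closing tag" style, not a verbatim copy of any answer.

```shell
# Hypothetical input: wrapper lines, and ss:Width is not the last attribute.
input=$(printf '%s\n' \
  '<ss:Table>' \
  '<ss:Column ss:Width="252" ss:AutoFitWidth="1"/>' \
  '<ss:Column ss:Width="126" ss:AutoFitWidth="1"/>' \
  '</ss:Table>')

# Flag approach: matches the attribute itself and prints every line.
general=$(printf '%s\n' "$input" |
  awk '!found { found = sub(/ss:Width="[0-9]+"/, "ss:Width=\"140\"") } { print }')

# Narrower approach: prints only ss:Width lines and edits the number
# that happens to sit right before "/>.
narrow=$(printf '%s\n' "$input" |
  awk '/ss:Width/ { if (!once) sub(/[0-9]+"\/>/, "140\"/>"); once = 1; print }')

echo "$general"   # 4 lines, second one has ss:Width="140"
echo "$narrow"    # 2 lines, first one has ss:AutoFitWidth="140"
```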
