How to urlencode spaces inside a sed capture group only?


I have some markdown files with links like this one:

[Royal Enfield Flying Flea](RE Flying Flea)

And I'm trying to urlencode the second part of the link to become

[Royal Enfield Flying Flea](RE%20Flying%20Flea)

I'm using sed, which I think is the tool best suited for the job, and I have already made some other transformations in the file with it. But I can use another tool, such as awk, if someone thinks it's better.

I also managed to capture the strings using sed -E 's/\[([A-Za-z0-9 ]*)\]\(([A-Za-z0-9 ]*)\)/\1\2/' but I couldn't find a way to make a substitution inside the captured part, only to print it as it is.

If someone can point me in the right direction, I'd appreciate it.

There are 3 answers

2
Gilles Quénot On

Using Perl and a proper module to percent-escape the URL:

perl -MURI::Escape -pe 's/(\[.*?\])\((.*?)\)/"$1(" . uri_escape($2) . ")"/e' file

Yields:

[Royal Enfield Flying Flea](RE%20Flying%20Flea)

If you'd like to edit the file in place, use the -i option:

perl -i -pe ......

The regular expression matches as follows:

Node      Explanation
(         group and capture to \1:
  \[        a literal [
  .*?       any character except \n (0 or more times, matching the least amount possible)
  \]        a literal ]
)         end of \1
\(        a literal (
(         group and capture to \2:
  .*?       any character except \n (0 or more times, matching the least amount possible)
)         end of \2
\)        a literal )

See URI::Escape

0
Wiktor Stribiżew On

You can use

sed -E ':a;s/(\[[^][]*]\([^()[:space:]]*)[[:space:]]([^()]*\))/\1%20\2/;ta' file

See this Bash demo:

#!/bin/bash
s='[Royal Enfield Flying Flea](RE Flying Flea)
[Some string here](another string there)'
sed -E ':a;s/(\[[^][]*]\([^()[:space:]]*)[[:space:]]([^()]*\))/\1%20\2/;ta' <<< "$s"
# => 
#    [Royal Enfield Flying Flea](RE%20Flying%20Flea)
#    [Some string here](another%20string%20there)

Details:

  • :a - an a label
  • s - substitute action
  • (\[[^][]*]\([^()[:space:]]*)[[:space:]]([^()]*\)) - search pattern:
    • (\[[^][]*]\([^()[:space:]]*) - Group 1: a [, any zero or more chars other than [ and ], then a ]( string, then any zero or more chars other than (, ) and whitespace
    • [[:space:]] - a single whitespace character
    • ([^()]*\)) - Group 2: any zero or more chars other than ( and ) and then a literal )
  • \1%20\2 - replaces the match with Group 1 + %20 + Group 2
  • ta - upon a successful replacement, jumps back to the a label position.
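
As a rough sketch of how the loop above could be reused, the same substitution can be wrapped in a small shell function. The name encode_link_spaces is hypothetical (it is not part of the answer), and the script is split into separate -e expressions on the assumption that this is more portable than the ':a;...;ta' one-liner on non-GNU sed. Because the pattern requires a [text]( prefix, parenthesized text that is not a link target should be left alone.

```shell
#!/bin/sh
# encode_link_spaces: hypothetical helper name, not from the answer above.
# Reads Markdown on stdin and writes it to stdout, replacing each whitespace
# character inside a [text](target) link target with %20.
encode_link_spaces() {
  sed -E \
    -e ':a' \
    -e 's/(\[[^][]*]\([^()[:space:]]*)[[:space:]]([^()]*\))/\1%20\2/' \
    -e 'ta'
}

# Each pass replaces one whitespace character; "ta" repeats until none is left.
printf '%s\n' 'See (some note) and [A B](C D)' | encode_link_spaces
# => See (some note) and [A B](C%20D)
```

Note that [[:space:]] also matches tabs, so a tab inside a link target would become %20 as well.
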
0
sseLtaH On

Using sed

$ sed -E ':a;s/(\([^)]*) /\1%20/;ta' input_file
[Royal Enfield Flying Flea](RE%20Flying%20Flea)
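
One thing that may be worth checking before running this on real files: the pattern only looks for an opening parenthesis, so it is not limited to link targets. A small sketch with a made-up input line (the sample text is an assumption, not from the question) suggests that ordinary parenthesized prose gets encoded too:

```shell
#!/bin/sh
# Made-up sample line: a plain parenthetical followed by a Markdown link.
line='See (some note) and [A B](C D)'

# Same substitution as above, written with separate -e expressions.
result=$(printf '%s\n' "$line" | sed -E -e ':a' -e 's/(\([^)]*) /\1%20/' -e 'ta')

printf '%s\n' "$result"
# => See (some%20note) and [A B](C%20D)
```

If the files contain parentheses outside of links, a pattern anchored on the preceding [text] part is probably the safer choice.
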