Trying to remove duplicate words in a string using sed on macOS

I'm trying to remove duplicate words in a string using sed on macOS with the following command:

sed -r 's/^([A-Za-z0-9_]+) \1$/\1/' <<< 'The best of of The United Kingdom'

But it only returns

The best of of United Kingdom

What am I missing? Could you give me a hand, please?

There are 4 answers

dan answered:

You are unnecessarily anchoring the regex at the start and end of the line, so it can only match a line that consists of exactly one word repeated twice. Remove ^ and $. Change -r to the POSIX -E and it will work on BSD/Mac sed. You also need the g flag to replace every repeated word pair on the line, not just the first.

sed -E 's/([A-Za-z0-9_]+) \1/\1/g'
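To see what this command does, here is a small sketch that wraps it in a shell function (the name dedupe_adjacent is made up for illustration) and feeds it the sample string from the question. The second call shows a caveat: since the pattern has no word boundaries, the back-reference can also match the beginning of a longer word.

```shell
#!/bin/sh
# Hypothetical helper: collapse a word that is immediately repeated
# after a single space. -E selects extended regular expressions.
dedupe_adjacent() {
  sed -E 's/([A-Za-z0-9_]+) \1/\1/g'
}

printf '%s\n' 'The best of of The United Kingdom' | dedupe_adjacent
# prints: The best of The United Kingdom

# Caveat: "the the" also matches inside "the theme",
# because \1 only has to match a prefix of the next word.
printf '%s\n' 'the theme' | dedupe_adjacent
# prints: theme
```

The two occurrences of "The" stay in place because they are not adjacent; the pattern only handles a word followed directly by itself.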

sseLtaH answered:

You can try this sed command:

$ sed 's/\([^ ]* \)\1\+/\1/' input_file
The best of The United Kingdom

Your original code had unneeded anchors (^ and $). Here is a fixed version:

$ sed -r 's/([A-Za-z0-9_]* )\1+/\1/' <<< 'The best of of The United Kingdom'
The best of The United Kingdom
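One portability note on the first command: \+ is a GNU extension to basic regular expressions, so BSD/macOS sed may not recognize it. A possible workaround, sketched below under that assumption, is the POSIX interval \{1,\}, which means the same thing. The function name dedupe_bre is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: same idea as the command above, but written
# with the POSIX interval \{1,\} instead of the GNU-only \+.
dedupe_bre() {
  sed 's/\([^ ]* \)\1\{1,\}/\1/'
}

printf '%s\n' 'The best of of The United Kingdom' | dedupe_bre
# prints: The best of The United Kingdom

# Caveat: the captured group includes the trailing space, so a
# repeated word at the very end of the line is left untouched.
printf '%s\n' 'x foo foo' | dedupe_bre
# prints: x foo foo
```

Without the g flag only the first run of repeated words on each line is collapsed.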

Ryszard Czech answered:

Use

sed -E 's/[[:<:]]([[:alnum:]_]+)([[:space:]]+\1)+[[:>:]]/\1/'

EXPLANATION

--------------------------------------------------------------------------------
  [[:<:]]                  the boundary between a non-word char or
                           start of string and a word char
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [[:alnum:]_]+            any character of: letters and digits,
                             '_' (1 or more times (matching the most
                             amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (                        group and capture to \2 (1 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    [[:space:]]+             any character of: whitespace characters
                             (like \s) (1 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
    \1                       what was matched by capture \1
--------------------------------------------------------------------------------
  )+                       end of \2
--------------------------------------------------------------------------------
  [[:>:]]                  the boundary between a word char (\w) and
                           something that is not a word char
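The [[:<:]] and [[:>:]] boundaries are BSD/macOS syntax; GNU sed spells them \< and \> instead. As a rough sketch, a script that has to run with either flavor could pick the pattern at run time. The function name dedupe_words is made up, the g flag is added here so every run of repeats on a line is collapsed, and the check relies on the assumption that only GNU sed accepts --version.

```shell
#!/bin/sh
# Hypothetical helper: collapse runs of a repeated whole word,
# choosing the word-boundary syntax that the local sed understands.
dedupe_words() {
  if sed --version >/dev/null 2>&1; then
    # GNU sed: \< and \> mark the start and end of a word.
    sed -E 's/\<([[:alnum:]_]+)([[:space:]]+\1)+\>/\1/g'
  else
    # BSD/macOS sed: [[:<:]] and [[:>:]] mark the same boundaries.
    sed -E 's/[[:<:]]([[:alnum:]_]+)([[:space:]]+\1)+[[:>:]]/\1/g'
  fi
}

printf '%s\n' 'The best of of The United Kingdom' | dedupe_words
# prints: The best of The United Kingdom

# The boundaries keep "theme" intact, unlike a pattern without them.
printf '%s\n' 'the the theme' | dedupe_words
# prints: the theme
```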

Rafa_izu answered:

I installed gnu-sed. Problem solved.