Like many people, I am struggling with what seems to be a "trivial" regex issue. In a given text, whenever I encounter a word within {} brackets, I need to extract it. At first I used

"\\{-?(\\w{3,})\\}"

and it worked fine, as long as the word didn't contain any whitespace or a special character such as '. For example, {Project} returns Project, but {Project Test} or {Project D'arce} don't return anything. I know that for whitespace I need to use \s, but it is not at all clear to me how to add that to the pattern above. I tried:

"%\\{-?(\\w(\\s{3,})\\)\\}"))

but it is not working. Also, what if I want to match words containing special characters such as '? It's really frustrating.

2 Answers

The fourth bird

You could use a character class [\w\s'] and add to it any other characters you want to allow:

\{-?([\w\s']{3,})}

In Java

String regex = "\\{-?([\\w\\s']{3,})}";
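The following is a minimal sketch of how the first pattern might be used to pull every braced word out of a text. It assumes Java 8+ and `java.util.regex`. The class name `BraceExtractor` and the helper `extract` are hypothetical and exist only for illustration. The word itself is read from capture group 1, so the braces are not part of the result:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BraceExtractor {
    // Pattern from the answer: {, optional hyphen, 3+ word chars / whitespace / ', then }
    private static final Pattern BRACED = Pattern.compile("\\{-?([\\w\\s']{3,})}");

    // Hypothetical helper: collects group 1 of every match found in the text
    static List<String> extract(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = BRACED.matcher(text);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "See {Project}, {Project Test} and {Project D'arce}.";
        System.out.println(extract(text)); // [Project, Project Test, Project D'arce]
    }
}
```

This pattern also accepts `{   }` (three spaces) and returns the spaces as the captured text. The repeating-group variant is meant to address that case.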


If you want to prevent matching content that consists only of whitespace (for example, 3 spaces), you could use a repeating group:

\{-?\h*([\w']{3,}(?:\h+[\w']+)*)\h*}

About the pattern

  • \{ Match { char
  • -? Optional hyphen
  • \h* Match 0+ times a horizontal whitespace char
  • ( Start capture group 1
  • [\w']{3,} Match 3 or more times either a word char or '
  • (?:\h+[\w']+)* Repeat 0+ times matching 1+ horizontal whitespace chars followed by 1+ of the chars listed in the character class
  • ) Close group 1
  • \h* Match 0+ times a horizontal whitespace char
  • } Match }

In Java

String regex = "\\{-?\\h*([\\w']{3,}(?:\\h+[\\w']+)*)\\h*}";
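Here is a similar sketch for the repeating-group pattern. Again, the names `TrimmedBraceExtractor` and `firstMatch` are hypothetical, and `\h` requires Java 8 or later. Spaces just inside the braces end up outside group 1, and content that is only whitespace is rejected:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TrimmedBraceExtractor {
    // Pattern from the answer: first word of 3+ chars, then optional further words
    private static final Pattern BRACED =
            Pattern.compile("\\{-?\\h*([\\w']{3,}(?:\\h+[\\w']+)*)\\h*}");

    // Hypothetical helper: group 1 of the first match, or null if nothing matches
    static String firstMatch(String text) {
        Matcher m = BRACED.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("{  Project Test  }")); // Project Test
        System.out.println(firstMatch("{Project D'arce}"));   // Project D'arce
        System.out.println(firstMatch("{   }"));              // null
    }
}
```

One consequence of putting `{3,}` on the first word is that content whose first word is shorter than 3 characters, such as `{A Team}`, is not matched at all.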


Pshemo

How about matching any character inside {..} which is not }?

To do so, you can use a negated character class [^..] such as [^}]. Your regex could then look like

"\\{[^}]{3,}\\}"

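This is a sketch of the negated-class approach, assuming you want the text without the braces. The parentheses around `[^}]{3,}` are an addition to the pattern above so that the content can be read from group 1. The names `NegatedClassExtractor` and `firstMatch` are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegatedClassExtractor {
    // {, then 3+ chars that are not } (captured, group added for extraction), then }
    private static final Pattern BRACED = Pattern.compile("\\{([^}]{3,})\\}");

    // Hypothetical helper: group 1 of the first match, or null if nothing matches
    static String firstMatch(String text) {
        Matcher m = BRACED.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("Open {Project & Co. (v2)} now")); // Project & Co. (v2)
        System.out.println(firstMatch("{ab}"));                          // null
    }
}
```

Because `[^}]` accepts anything except `}`, it will also accept characters such as `{`, punctuation or line breaks inside the braces.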
But if you want to limit your regex to a specific set of characters, you can use a character class that combines individual characters and even predefined shorthand classes such as \w, \s, \d and so on.

So if you want to accept any word character \w, any whitespace \s or ', your regex could look like

"\\{[\\w\\s']{3,}\\}"
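To illustrate the trade-off between the two styles, the following hypothetical comparison runs both patterns on a word containing a hyphen, which is not in `[\w\s']`. The class and method names are assumptions, and capture groups are added so the content can be read without the braces:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WhitelistVsNegated {
    // Allow-list: only word chars, whitespace and ' inside the braces
    static final Pattern WHITELIST = Pattern.compile("\\{([\\w\\s']{3,})\\}");
    // Negated class: anything except } inside the braces
    static final Pattern NEGATED = Pattern.compile("\\{([^}]{3,})\\}");

    // Hypothetical helper: group 1 of the first match, or null if nothing matches
    static String firstGroup(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "{Project-Test}";
        System.out.println(firstGroup(WHITELIST, text)); // null
        System.out.println(firstGroup(NEGATED, text));   // Project-Test
    }
}
```

The allow-list rejects anything you did not explicitly permit, while the negated class accepts everything up to the closing brace. Which one fits depends on how strict the extraction needs to be.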