I'm trying to alter some XML with Find&Replace in Notepad++ using regex.

This is the specific XML I'm trying to capture:

<category name="Content Server Categories:FOLDER:test category">
    <attribute name="test attribuut"><![CDATA[test]]></attribute>
    <attribute name="test attribuut1"><![CDATA[test1]]></attribute>
</category>

The following 'FIND' regex does the job (for now):

<(category) name="Content Server Categories:(.+?)">(.+)</(category)>

Now I need the XML to be replaced with this:

<category-FOLDER:testcategory name="Content Server Categories:FOLDER:test category">
    <attribute name="test attribuut"><![CDATA[test]]></attribute>
    <attribute name="test attribuut1"><![CDATA[test1]]></attribute>
</category-FOLDER:testcategory>

Currently I tried using this 'REPLACE BY' regex:

<($1-$2) name="Content Server Categories:($2)">($3)</($1-$2)>

But that gives the following output:

<category-FOLDER:test category name="Content Server Categories:FOLDER:test category">
    <attribute name="test attribuut"><![CDATA[test]]></attribute>
    <attribute name="test attribuut1"><![CDATA[test1]]></attribute>
</category-FOLDER:test category>

As you can see, I get category-FOLDER:test category instead of category-FOLDER:testcategory.

The space(s) need to be removed.
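A rough reproduction in Python shows why the spaces survive: group 2 is copied into the element name verbatim. This is a sketch under assumptions: Python's `re` stands in for Notepad++'s regex engine, `re.DOTALL` for '. matches newline', and `\g<N>` for `$N`; the parentheses of the original replacement are left out.

```python
import re

# Sample from the question; re.DOTALL stands in for ". matches newline" = ON.
xml_text = '''<category name="Content Server Categories:FOLDER:test category">
    <attribute name="test attribuut"><![CDATA[test]]></attribute>
    <attribute name="test attribuut1"><![CDATA[test1]]></attribute>
</category>'''

find = r'<(category) name="Content Server Categories:(.+?)">(.+)</(category)>'
# Python writes group references as \g<N> instead of Notepad++'s $N.
replace = r'<\g<1>-\g<2> name="Content Server Categories:\g<2>">\g<3></\g<1>-\g<2>>'

result = re.sub(find, replace, xml_text, flags=re.DOTALL)
print(result.splitlines()[0])
# <category-FOLDER:test category name="Content Server Categories:FOLDER:test category">
```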

The problem is that the input can vary. Currently it is:

<category name="Content Server Categories:FOLDER:test category">

But it could look like these examples as well:

<category name="Content Server Categories:FOLDER1:FOLDER2:test category">

<category name="Content Server Categories:FOLDER NAME:test category">

<category name="Content Server Categories:FOLDER NAME: FOLDER NAME1:test category">

<category name="Content Server Categories:FOLDER:test category name">

...

How do I catch all of these correctly and remove the spaces?
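Put differently, the target element name appears to be `category-` plus the text after `Content Server Categories:` with every space removed. The hypothetical helper `target_name` below only spells out that reading of the examples above (for instance, `FOLDER NAME: FOLDER NAME1` becomes `FOLDERNAME:FOLDERNAME1`); it is a specification of the expected result, not a Notepad++ solution.

```python
# Hypothetical helper: "category-" + the path after "Content Server Categories:",
# with every whitespace character removed (str.split() splits on any whitespace).
def target_name(path: str) -> str:
    return "category-" + "".join(path.split())

examples = [
    "FOLDER:test category",
    "FOLDER1:FOLDER2:test category",
    "FOLDER NAME:test category",
    "FOLDER NAME: FOLDER NAME1:test category",
    "FOLDER:test category name",
]
for path in examples:
    print(target_name(path))
# category-FOLDER:testcategory
# category-FOLDER1:FOLDER2:testcategory
# category-FOLDERNAME:testcategory
# category-FOLDERNAME:FOLDERNAME1:testcategory
# category-FOLDER:testcategoryname
```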

EDIT: Almost forgot:

'. matches newline' is ON.
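With '. matches newline' on, a greedy `(.+)` can run across element boundaries when a file holds more than one `<category>`. A small Python check (assuming `re.DOTALL` behaves like that Notepad++ option; the second element is made up for the demo) compares the greedy pattern with a lazy `(.+?)` variant:

```python
import re

# Two category elements in one file; the second one is hypothetical.
text = '''<category name="Content Server Categories:FOLDER:test category">
    <attribute name="a"><![CDATA[1]]></attribute>
</category>
<category name="Content Server Categories:FOLDER NAME:other">
    <attribute name="b"><![CDATA[2]]></attribute>
</category>'''

greedy = r'<(category) name="Content Server Categories:(.+?)">(.+)</(category)>'
lazy = r'<(category) name="Content Server Categories:(.+?)">(.+?)</(category)>'

# re.DOTALL plays the role of ". matches newline".
print(len(re.findall(greedy, text, flags=re.DOTALL)))  # 1: one match spans both elements
print(len(re.findall(lazy, text, flags=re.DOTALL)))    # 2: one match per element
```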

Answer by The fourth bird:

One approach is to do it in two steps, because the spaces have to be removed in a separate pass once the structure is in place.

First, get the required structure (note the use of the non-greedy .+? to prevent overmatching):

<(category) name="Content Server Categories:(.+?)">(.+?)</(category)>

In the replacement, use your replacement without the parentheses; they are not needed there:

<$1-$2 name="Content Server Categories:$2">$3</$1-$2>
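Step 1 can be tried outside Notepad++ with a Python sketch. Assumptions: `re.DOTALL` stands in for '. matches newline', `$N` is written as `\g<N>`, and the second element uses one of the input variants from the question. After this step the spaces are still in the element names:

```python
import re

text = '''<category name="Content Server Categories:FOLDER:test category">
    <attribute name="test attribuut"><![CDATA[test]]></attribute>
</category>
<category name="Content Server Categories:FOLDER NAME: FOLDER NAME1:test category">
    <attribute name="test attribuut1"><![CDATA[test1]]></attribute>
</category>'''

# Step 1: the answer's find pattern (lazy group 3), with $N rewritten as \g<N>.
find = r'<(category) name="Content Server Categories:(.+?)">(.+?)</(category)>'
replace = r'<\g<1>-\g<2> name="Content Server Categories:\g<2>">\g<3></\g<1>-\g<2>>'

step1 = re.sub(find, replace, text, flags=re.DOTALL)
print(step1)
```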

Then, in a second Replace All, match the spaces, using \G so that each match continues where the previous one ended:

(?:</?category-|\G(?!^))\K\s*([\w:]+) (?!name=)

In the replacement, use capturing group 1: $1. This keeps the captured text and drops the matched whitespace.
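Python's `re` module supports neither `\G` nor `\K`, so this sketch swaps in a different technique for step 2: match the whole element name after `<category-` or `</category-` in one go and strip its whitespace in a callback. The function name `strip_spaces` is hypothetical, and the input is step 1's output pasted in so the block runs on its own.

```python
import re

# Text as it looks after step 1 (copied here so the sketch is self-contained).
step1 = '''<category-FOLDER NAME: FOLDER NAME1:test category name="Content Server Categories:FOLDER NAME: FOLDER NAME1:test category">
    <attribute name="test attribuut"><![CDATA[test]]></attribute>
</category-FOLDER NAME: FOLDER NAME1:test category>'''

def strip_spaces(m: re.Match) -> str:
    # group(1) is "<category-" or "</category-"; group(2) is the rest of the
    # element name, up to ' name="' (opening tag) or '>' (closing tag).
    return m.group(1) + re.sub(r"\s+", "", m.group(2))

# Without re.DOTALL, .*? stays on one line; the lookahead is not consumed.
step2 = re.sub(r'(</?category-)(.*?)(?= name="|>)', strip_spaces, step1)
print(step2)
```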

Explanation

  • (?: Non-capturing group
    • </?category- Match <category- or </category- (the / is optional)
    • | Or
    • \G(?!^) Assert position at the end of the previous match; (?!^) stops \G from succeeding at the very start, before any match has been made
  • ) Close non-capturing group
  • \K\s* Forget what was matched so far, then match 0+ whitespace chars
  • ([\w:]+) Capture in group 1 one or more word chars or :, then match a single space
  • (?!name=) Assert that what follows is not name=
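To make the \G chain concrete, the sketch below walks through it by hand. It assumes that Python's `Pattern.match(string, pos)`, which only matches exactly at `pos`, is a fair stand-in for `\G`: each step starts where the previous one ended, keeps group 1 and drops the whitespace, and the chain stops at the word that is followed by `name=`. `remove_name_spaces` is a hypothetical name.

```python
import re

START = re.compile(r'</?category-')
STEP = re.compile(r'\s*([\w:]+) (?!name=)')

def remove_name_spaces(s: str) -> str:
    out, last = [], 0
    for start in START.finditer(s):
        pos = start.end()                  # like \K: the prefix is kept as-is
        while (m := STEP.match(s, pos)):   # anchored at pos, like \G
            out.append(s[last:m.start()])  # copy untouched text
            out.append(m.group(1))         # replacement $1: whitespace dropped
            last = pos = m.end()           # next step must start here
    out.append(s[last:])
    return "".join(out)

line = '<category-FOLDER NAME: FOLDER NAME1:test category name="Content Server Categories:FOLDER NAME: FOLDER NAME1:test category">'
print(remove_name_spaces(line))
```

The chain stops at `category` because the space after it is followed by `name=`, so the attribute itself is never touched.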
