Notepad++ or UltraEdit: regex remove special duplicates


I need to remove duplicate lines of the form

key = anything

but NOT lines of the form

key=anything

The key can be anything, too.

E.g. edit_home=home must stay in place,

while edit_home = home (or edit_home = any other string) must be removed IF edit_home is a duplicate,

for all the lines of the document.

Thank you.

P.S. A clearer example:

one=you are
two=we are
three_why=8908908
one = good
two = fine
three_4 = best
three_why = win

From that list I only need to keep:

one=you are
two=we are
three_why=8908908
three_4 = best // because three_4 doesn't have a duplicate
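
The rule in this example can be sketched in plain JavaScript, outside the editor. This is only a sketch under two assumptions that the question does not spell out: a key is the text before the first equal sign, and a line counts as "spaced" when whitespace sits directly before that equal sign. The function name filterSpacedDuplicates is made up for illustration, and the code targets a modern engine such as Node.js rather than the UltraEdit script engine.

```javascript
// Hypothetical helper: keeps every "key=value" line and drops a
// "key = value" line when the same key also exists in compact form.
function filterSpacedDuplicates(lines) {
  // Assumption: the key is everything before the first "=", trimmed.
  const keyOf = (line) => line.slice(0, line.indexOf("=")).trim();
  // Assumption: "spaced" means whitespace directly before the first "=".
  const isSpaced = (line) => /^[^=]*\s=/.test(line);

  // First pass: collect the keys of all compact "key=value" lines.
  const compactKeys = new Set();
  for (const line of lines) {
    if (line.includes("=") && !isSpaced(line)) compactKeys.add(keyOf(line));
  }

  // Second pass: drop spaced lines whose key was seen in compact form.
  return lines.filter(
    (line) => !(line.includes("=") && isSpaced(line) && compactKeys.has(keyOf(line)))
  );
}
```

Applied to the seven lines above, this keeps the three compact lines plus three_4 = best, because three_4 never appears in compact form.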

I found a method to do it, but I would need better support for searching a list of keys, either through regex or a plugin, or else a direct regex (which I don't know).

That is: I have two files to compare.

One has the full set of keys, the other is incomplete.

I merge all the keys from the first file with those of the second into a new file, in groups (because the keys come in groups, e.g. many keys titled one, many titled two, and so on). Then I regex-replace all the keys in the new file with

find (.*)(\s\=\s) replace with \1\=

So they all become key=anything

Then I replace everything after = with nothing to isolate the keys.

Then remove the duplicates.

At this point I have trouble doing something like

^.*(^keyone\b|^keytwo\b|^keythree\b).*$

to find all the keys I need in the document, so that I can select them all and replace them with the correct keys.

Why? Because in this example there are only 3 keys, BUT in reality there are many, and the find field breaks at a certain point.
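
For comparison, outside the editor the long alternation can be avoided entirely by looking each key up in a set, so the number of keys no longer matters. This swaps in a different technique (a set lookup instead of one big regex). The name linesWithKeys is hypothetical, and the key is assumed to be the trimmed text before the first equal sign.

```javascript
// Hypothetical helper: returns the lines whose key is in wantedKeys.
function linesWithKeys(lines, wantedKeys) {
  const wanted = new Set(wantedKeys);
  return lines.filter((line) => {
    const eq = line.indexOf("=");
    // Assumption: the key is the text before the first "=", trimmed.
    return eq > 0 && wanted.has(line.slice(0, eq).trim());
  });
}
```

Because the whole key is compared, keyone does not accidentally match a longer key such as keyonex.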

How to do it right?

Update: I found the Toolbucket plugin, which allows searching for many strings, but another issue is that in addition to the duplicate, I also have to remove the original.

That is, if I find the same key "one" twice, I have to remove all the lines containing one.
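
One possible reading of this update, sketched in JavaScript: count how often each key occurs, then drop every line whose key occurs more than once. The name dropAllRepeatedKeys is hypothetical, and treating the trimmed text before the first equal sign as the key is an assumption.

```javascript
// Hypothetical helper: removes ALL lines whose key occurs more than once.
function dropAllRepeatedKeys(lines) {
  // Assumption: the key is the text before the first "=", trimmed.
  const keyOf = (line) => {
    const eq = line.indexOf("=");
    return eq > 0 ? line.slice(0, eq).trim() : null;
  };

  // Count the occurrences of every key.
  const counts = new Map();
  for (const line of lines) {
    const key = keyOf(line);
    if (key !== null) counts.set(key, (counts.get(key) || 0) + 1);
  }

  // Keep lines without a key, and lines whose key is unique.
  return lines.filter((line) => {
    const key = keyOf(line);
    return key === null || counts.get(key) === 1;
  });
}
```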

There are 3 answers

kenwarr On

OK, after all that I wrote, one solution could be the following (once I have the merged keys):

(?m)^(.*)$(?=\r?\n^(?!\1).*(?s).*?\1)

With this I can mark/highlight all the duplicated keys :-) so then I can handle only those, removing them from the first list and adding what remains to the second file...

If someone has a solution with a direct regex, it would be really appreciated.
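
What the marking step is meant to find (keys that appear more than once in the merged key list) can also be sketched without a regex. This is an illustrative alternative, not the regex above translated. The name findDuplicatedLines is hypothetical, and the input is assumed to be the list of isolated keys, one per line.

```javascript
// Hypothetical helper: returns each line that occurs more than once,
// listed once, in the order its second occurrence is found.
function findDuplicatedLines(lines) {
  const seen = new Set();
  const duplicated = new Set();
  for (const line of lines) {
    if (seen.has(line)) duplicated.add(line);
    else seen.add(line);
  }
  return [...duplicated];
}
```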

dimitrisli On

Ctrl + F

Find tab

Find what: ^.*\S=\S.*$

Find All in Current Document

Copy result from result window to a new window (the list of Line 1: Line 2: Line 3: ...)

Ctrl + F

Replace tab

(the following will remove the leading "Line number:" from every line)

Find what: ^.*?\d:\s

Replace with: (leave empty)
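
The effect of these steps can be sketched in JavaScript: keep only the lines in which a non-whitespace character sits directly on both sides of an equal sign. The name compactLinesOnly is hypothetical, and the code assumes a modern engine such as Node.js.

```javascript
// Hypothetical helper: mirrors "Find what: ^.*\S=\S.*$" by keeping
// only lines that contain "=" with no whitespace on either side.
function compactLinesOnly(text) {
  return text.split(/\r?\n/).filter((line) => /\S=\S/.test(line));
}
```

Note that under this pattern a line such as three_4 = best is not extracted either, so it would have to be handled separately.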

Mofi On

Here is a commented UltraEdit script for this task.

// Note: This script does not work for large files because, for fast
// processing, it loads the entire file content into scripting memory,
// which is very limited even with multiple GB of RAM installed.

if (UltraEdit.document.length > 0)  // Is any file opened?
{
   // Define environment for this script and select entire file content.
   UltraEdit.insertMode();
   UltraEdit.columnModeOff();
   UltraEdit.activeDocument.selectAll();

   // Determine line termination used currently in active file.
   var sLineTerm = "\r\n";
   if (typeof(UltraEdit.activeDocument.lineTerminator) == "number")
   {
      // The two lines below require UE v16.00 or UES v10.00 or later.
      if (UltraEdit.activeDocument.lineTerminator == 1) sLineTerm = "\n";
      else if (UltraEdit.activeDocument.lineTerminator == 2) sLineTerm = "\r";
   }
   else  // This version of UE/UES does not offer line terminator property.
   {
      if (UltraEdit.activeDocument.selection.indexOf(sLineTerm) < 0)
      {
         sLineTerm = "\n";          // Not DOS, perhaps UNIX.
         if (UltraEdit.activeDocument.selection.indexOf(sLineTerm) < 0)
         {
            sLineTerm = "\r";       // Also not UNIX, perhaps MAC.
            if (UltraEdit.activeDocument.selection.indexOf(sLineTerm) < 0)
            {
               sLineTerm = "\r\n";  // No line terminator, use DOS.
            }
         }
      }
   }

   // Get all lines of active file into an array of strings
   // with each string being one line from active file.
   var asLines = UltraEdit.activeDocument.selection.split(sLineTerm);
   var nTotalLines = asLines.length;

   // Process each line in the array.
   for(var nCurrentLine = 0; nCurrentLine < asLines.length; nCurrentLine++)
   {
      // Skip all lines that contain no equal sign or start with one.
      if (asLines[nCurrentLine].indexOf('=') < 1) continue;

      // Get the string left of the equal sign with tabs/spaces trimmed.
      var sKey = asLines[nCurrentLine].replace(/^[\t ]*([^\t =]+).*$/,"$1");

      // Skip lines with just tabs/spaces left of the equal sign.
      if (sKey.length == asLines[nCurrentLine].length) continue;

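      // Build the regular expression for the search in all other lines.
      // Regular expression metacharacters in the key are escaped first.
      var sKeyEscaped = sKey.replace(/[.*+?^${}()|[\]\\]/g,"\\$&");
      var rRegSearch = new RegExp("^[\\t ]*"+sKeyEscaped+"[\\t ]*=","g");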

      // Check all remaining lines for a line also starting with
      // this key string (case-sensitive) left of an equal sign.
      var nLineCompare = nCurrentLine + 1;
      while(nLineCompare < asLines.length)
      {
         // Does this line also have this key left of the equal
         // sign, with or without surrounding spaces/tabs?
         if (asLines[nLineCompare].search(rRegSearch) < 0)
         {
            nLineCompare++;   // No, continue on next line.
         }
         else  // Yes, remove this line from array.
         {
            asLines.splice(nLineCompare,1);
         }
      }
   }
   // Was any line removed from the array?
   if (nTotalLines == asLines.length)
   {
      UltraEdit.activeDocument.top();  // Cancel the selection.
      UltraEdit.messageBox("Nothing found to remove!");
   }
   else
   {
      // If this version of UE/UES supports direct write to the clipboard,
      // use user clipboard 9 to paste the lines into the file, overwriting
      // everything, as this is much faster than the write command needed
      // in older versions of UE/UES.
      if (typeof(UltraEdit.clipboardContent) == "string")
      {
         var nActiveClipboard = UltraEdit.clipboardIdx;
         UltraEdit.selectClipboard(9);
         UltraEdit.clipboardContent = asLines.join(sLineTerm);
         UltraEdit.activeDocument.paste();
         UltraEdit.clearClipboard();
         UltraEdit.selectClipboard(nActiveClipboard);
      }
      else UltraEdit.activeDocument.write(asLines.join(sLineTerm));

      var nRemoved = nTotalLines - asLines.length;
      UltraEdit.activeDocument.top();
      UltraEdit.messageBox("Removed " + nRemoved + " line" + ((nRemoved != 1) ? "s" : "") + " on updated file.");
   }
}
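
The nested loops above compare every key line with all lines after it. As a sketch only, the same "keep the first line of each key" rule can be done in a single pass by remembering the keys already seen. This is a swapped-in technique, not part of the script above. The function name removeLaterDuplicateKeys is hypothetical, and it has been written in old-style JavaScript on the assumption that this is what the UltraEdit script engine accepts.

```javascript
// Hypothetical single-pass variant: keeps the first line seen for each
// key and drops later lines with the same key (case-sensitive).
function removeLaterDuplicateKeys(asLines) {
   // Keys already encountered. The "k:" prefix is an illustrative way
   // to avoid clashes with built-in property names of objects.
   var oSeenKeys = {};
   var asResult = [];
   for (var nLine = 0; nLine < asLines.length; nLine++) {
      // Key = first run of characters other than tab, space and "=",
      // followed by optional tabs/spaces and an equal sign.
      var asMatch = /^[\t ]*([^\t =]+)[\t ]*=/.exec(asLines[nLine]);
      if (asMatch === null) {
         asResult.push(asLines[nLine]);   // No key, keep line as is.
         continue;
      }
      var sProperty = "k:" + asMatch[1];
      if (oSeenKeys.hasOwnProperty(sProperty)) continue;  // Duplicate.
      oSeenKeys[sProperty] = true;
      asResult.push(asLines[nLine]);
   }
   return asResult;
}
```

Like the script, this keeps whichever line of a key comes first, so it matches the example in the question only when the key=anything lines precede the key = anything lines.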

Copy this code and paste it into a new ASCII file using DOS line terminators in UltraEdit.

Next, use the command File - Save As to save the script file, for example with the name RemoveDuplicateKeys.js, into %AppData%\IDMComp\UltraEdit\MyScripts or wherever you keep your UltraEdit scripts.

Open Scripting - Scripts and add the just saved UltraEdit script to the list of scripts. You can enter a description for this script, too.

Open the file with the list, or make this file active if it is already opened in UltraEdit.

Run the script by clicking on it in menu Scripting, or by opening Views - Views/Lists - Script List and double clicking on the script.