Couchbase XDCR regex - How do I exclude keys using regex?


I am trying to exclude certain documents from being replicated to ES using XDCR. I have the following regex that filters out keys containing ABCD or IJ:

https://regex101.com/r/gI6sN8/11

Now I want to use this regex in the XDCR filter:

^(?!.*(ABCD|IJ)).*$
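To see what the negative-lookahead pattern ^(?!.*(ABCD|IJ)).*$ is meant to do, here is a minimal Java sketch that evaluates it with java.util.regex, which does support lookahead. The class and method names are illustrative, and it assumes a document is replicated when its whole key matches. It says nothing about whether the XDCR filter itself accepts lookahead; the answers work around it, which suggests it may not.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Illustrative only: java.util.regex supports negative lookahead, which may not
// be true of the XDCR filter. Assumes a key is replicated when it matches.
class LookaheadExclusion {
    // Whole-key match that fails if ABCD or IJ appears anywhere in the key.
    static final Pattern EXCLUDE = Pattern.compile("^(?!.*(ABCD|IJ)).*$");

    static boolean isReplicated(String key) {
        return EXCLUDE.matcher(key).matches();
    }

    public static void main(String[] args) {
        for (String key : Arrays.asList("user::1", "xxABCDxx", "IJ::42", "ABC::7")) {
            System.out.println(key + " -> " + (isReplicated(key) ? "replicated" : "filtered out"));
        }
    }
}
```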


How do I exclude keys using regex?

EDIT:
What if I want to select everything that doesn't contain ABCDE or ABCHIJ?
I tried:

https://regex101.com/r/zT7dI4/1

Answer by AudioBubble

Edit:

Sorry, after looking at it further, this method is invalid. For instance, [^B] lets an A through, so AABCD slips by: A(?:[^B]|$) first matches AA, and then [^AI]+ matches BCD. Please disregard this post.

A demo confirms that the method below is invalid.


(disregard this)
You could use a POSIX-style trick to exclude words. The pattern below excludes ABCD and IJ, and it shows the general shape: put the first letter of every word into a negated character class as the first branch of the alternation, then give each word its own branch that accepts a partial match of that word only when the next character breaks the word or the string ends.

^(?:[^AI]+|(?:A(?:[^B]|$)|AB(?:[^C]|$)|ABC(?:[^D]|$))|(?:I(?:[^J]|$)))+$


Expanded

 ^ 
 (?:
      [^AI]+ 
   |  
      (?:                     # Handle 'ABCD'
           A
           (?: [^B] | $ )
        |  AB
           (?: [^C] | $ )
        |  ABC
           (?: [^D] | $ )
      )
   |  
      (?:                     # Handle 'IJ'
           I
           (?: [^J] | $ )
      )
 )+
 $
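The flaw described in the edit can be reproduced with a small Java check (class and method names are illustrative). It compiles the pattern above unchanged with java.util.regex and tests whole keys with matches(): a key like xABCD is rejected as intended, but AABCD is accepted because A(?:[^B]|$) consumes AA and [^AI]+ then consumes BCD.

```java
import java.util.regex.Pattern;

// Reproduces the flaw from the edit: the pattern should reject any key containing
// ABCD or IJ, but A[^B] can eat a stray "A", after which [^AI]+ swallows "BCD".
class PosixTrickCheck {
    static final Pattern POSIX_TRICK = Pattern.compile(
        "^(?:[^AI]+|(?:A(?:[^B]|$)|AB(?:[^C]|$)|ABC(?:[^D]|$))|(?:I(?:[^J]|$)))+$");

    // True if the whole key is accepted by the pattern (i.e. NOT excluded).
    static boolean accepted(String key) {
        return POSIX_TRICK.matcher(key).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepted("hello")); // no forbidden word
        System.out.println(accepted("xABCD")); // ABCD rejected as intended
        System.out.println(accepted("AABCD")); // ABCD slips through: the bug
    }
}
```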
Answer by dnault

Hopefully one day there will be built-in support for inverting the match expression. In the meantime, here's a Java 8 program that generates regular expressions for inverted prefix matching, using only basic regex features supported by the Couchbase XDCR filter.

This should work as long as each key prefix is separated from the remainder of the key by a delimiter, such as the colon in the sample below. Make sure to include the delimiter in the input when modifying this code.

For the prefixes red:, reef: and green:, the sample output is:

^([^rg]|r[^e]|g[^r]|re[^de]|gr[^e]|red[^:]|ree[^f]|gre[^e]|reef[^:]|gree[^n]|green[^:])
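To sanity-check the sample output, the sketch below runs it against a few made-up keys with java.util.regex. It assumes, without confirmation from the Couchbase docs, that XDCR replicates a document when the expression is found in its key; the class name and keys are hypothetical.

```java
import java.util.regex.Pattern;

// Sample output from the generator, checked against hypothetical keys.
// Assumption: a document is replicated when the regex is found in its key.
class InvertedPrefixCheck {
    static final Pattern FILTER = Pattern.compile(
        "^([^rg]|r[^e]|g[^r]|re[^de]|gr[^e]|red[^:]|ree[^f]|gre[^e]|reef[^:]|gree[^n]|green[^:])");

    static boolean replicated(String key) {
        return FILTER.matcher(key).find();
    }

    public static void main(String[] args) {
        String[] keys = {"blue:1", "red:1", "reef:2", "green:3", "reed:4", "gr"};
        for (String key : keys) {
            System.out.println(key + " -> " + replicated(key));
        }
    }
}
```

The last key shows a limitation: a key such as gr, which is a proper prefix of an excluded prefix, is filtered out too, because every alternative needs at least one more character than the key has. Whether that matters depends on your key scheme.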

File: NegativeLookaheadCheater.java

import java.util.*;
import java.util.stream.Collectors;

public class NegativeLookaheadCheater {

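    public static void main(String[] args) {
        // Excluded key prefixes, each including the ":" delimiter.
        List<String> input = Arrays.asList("red:", "reef:", "green:");
        System.out.println("^" + invertMatch(input));
    }

    // Builds an alternation that matches keys which do NOT start with any of the literals.
    // Keys that are a proper prefix of a literal (e.g. "gr" for "green:") do not match either.
    // Note: literals are inserted verbatim, so they must not contain regex metacharacters.
    private static String invertMatch(Collection<String> literals) {
        int maxLength = literals.stream().mapToInt(String::length).max().orElse(0);

        // Collect the alternatives for every character position, shortest prefix first.
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < maxLength; i++) {
            terms.addAll(terms(literals, i));
        }

        return "(" + String.join("|", terms) + ")";
    }

    // For position `index`, groups the literals by their first `index` characters and emits
    // "prefix[^xyz]", where xyz are the characters that would continue an excluded literal.
    private static List<String> terms(Collection<String> words, int index) {
        List<String> result = new ArrayList<>();
        Map<String, Set<Character>> prefixToNextLetter = new LinkedHashMap<>();

        for (String word : words) {
            if (word.length() > index) {
                String prefix = word.substring(0, index);
                prefixToNextLetter.computeIfAbsent(prefix, key -> new LinkedHashSet<>()).add(word.charAt(index));
            }
        }

        prefixToNextLetter.forEach((literalPrefix, charsToNegate) -> {
            result.add(literalPrefix + "[^" + join(charsToNegate) + "]");
        });

        return result;
    }

    // Concatenates characters for use inside a negated character class, e.g. [d, e] -> "de".
    private static String join(Collection<Character> collection) {
        return collection.stream().map(c -> Character.toString(c)).collect(Collectors.joining());
    }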
}
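The generator above inserts the prefixes into the regex verbatim, so a prefix containing a metacharacter such as . would be misread, and a key that is a proper prefix of an excluded prefix (for example gr) never matches. The following is a hypothetical variant, not part of the original answer, that escapes metacharacters and can optionally add prefix$ alternatives for such short keys. The names EscapingInvertedPrefix, escapeChar, allowShortKeys and matches are illustrative. Like the original, it assumes no excluded literal is a prefix of another, which the trailing delimiter guarantees. Backslash-escaped punctuation is accepted by java.util.regex and by most regex engines, but test the output against the XDCR filter before relying on it.

```java
import java.util.*;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical variant of NegativeLookaheadCheater, not part of the original answer.
// Assumes no excluded literal is a prefix of another (a trailing delimiter ensures this).
class EscapingInvertedPrefix {
    // Punctuation that can be special inside or outside a character class.
    private static final String META = "\\.[]{}()*+?^$|-&";

    static String escapeChar(char c) {
        return META.indexOf(c) >= 0 ? "\\" + c : String.valueOf(c);
    }

    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(escapeChar(c));
        }
        return sb.toString();
    }

    // Returns an anchored regex matching keys that do NOT start with any literal.
    // If allowShortKeys is true, keys that are a proper, non-empty prefix of a
    // literal (e.g. "gr" for "green:") match as well.
    static String invertMatch(Collection<String> literals, boolean allowShortKeys) {
        int maxLength = literals.stream().mapToInt(String::length).max().orElse(0);
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < maxLength; i++) {
            // Group the literals longer than i by their first i characters.
            Map<String, Set<Character>> nextChars = new LinkedHashMap<>();
            for (String word : literals) {
                if (word.length() > i) {
                    nextChars.computeIfAbsent(word.substring(0, i), k -> new LinkedHashSet<>())
                             .add(word.charAt(i));
                }
            }
            nextChars.forEach((prefix, chars) -> {
                String escapedPrefix = escape(prefix);
                String negated = chars.stream()
                        .map(c -> escapeChar(c))
                        .collect(Collectors.joining());
                // The key shares this prefix, then diverges from every excluded literal.
                terms.add(escapedPrefix + "[^" + negated + "]");
                // The key ends right after the shared prefix.
                if (allowShortKeys && !prefix.isEmpty()) {
                    terms.add(escapedPrefix + "$");
                }
            });
        }
        return "^(" + String.join("|", terms) + ")";
    }

    // Assumption: the filter replicates a document when the regex is found in its key.
    static boolean matches(String regex, String key) {
        return Pattern.compile(regex).matcher(key).find();
    }

    public static void main(String[] args) {
        String regex = invertMatch(Arrays.asList("a.b:", "red:"), true);
        System.out.println(regex);
        for (String key : new String[] {"a.b:1", "axb:1", "a", "red:9", "re"}) {
            System.out.println(key + " -> " + matches(regex, key));
        }
    }
}
```

With allowShortKeys set to false and the original red:, reef:, green: input, this variant produces the same expression as the original program (with the leading ^ included), since none of those prefixes contain metacharacters.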