Need Help Implementing Multi-Segment Matching With PCRE2 in C++


I am trying to implement multi-segment matching with pcre2_match in accordance with the documentation.

// #define PCRE2_CODE_UNIT_WIDTH 0
// #include <pcre2.h>

// Variable initialisation
char const *string_one = "one ";
char const *string_two = "one hundred";
int        error_number; // <- These are unused in
PCRE2_SIZE error_offset; //    our example.
int return_code;
auto *code = pcre2_compile_8((PCRE2_SPTR8) "one hundred",
                             PCRE2_ZERO_TERMINATED,
                             0,
                             &error_number,
                             &error_offset,
                             NULL);
if (code == NULL) { fprintf(stderr, "pcre2_compile failed\n"); exit(1); }
auto *match_data = pcre2_match_data_create_from_pattern_8(code, NULL);
auto *ovector    = pcre2_get_ovector_pointer_8(match_data);

// Matching
return_code = pcre2_match_8(code,
                            (PCRE2_SPTR8) string_one,
                            PCRE2_ZERO_TERMINATED,
                            0,
                            PCRE2_PARTIAL_HARD,
                            match_data,
                            NULL);
if (return_code == PCRE2_ERROR_PARTIAL) {
  PCRE2_SIZE offset = ovector[1]; // Positions [0] and [1] designate the
                                  // bounds of the first match.
  return_code = pcre2_match_8(code,
                             (PCRE2_SPTR8) string_two,
                             PCRE2_ZERO_TERMINATED,
                             offset,
                             PCRE2_PARTIAL_HARD | PCRE2_NOTBOL,
                             match_data,
                             NULL);
  printf("[rc]: %d\n", return_code);
}

We compile the regular expression /one hundred/ and declare two strings to match against (string_one and string_two). According to the documentation, once a partial match has been produced, you append the next segment to the first one and call pcre2_match with an offset, so we prepare string_two as though it were the first string followed by the next set of data.

  1. We match the compiled regex code against string_one using the option PCRE2_PARTIAL_HARD, expecting a partial match. This does set return_code to -2 (PCRE2_ERROR_PARTIAL), telling us the match is incomplete (or rather, as I understand it, might be incomplete, which is definitely the case here).

  2. Knowing the match was only partial, we retrieve the end offset of the partial match from the ovector (verified to be 4) and match against a string that begins identically and continues with the characters required for a complete match. PCRE2_NOTBOL (not beginning of line) is also specified here, in accordance with the documentation.

I expect return_code following the second call to pcre2_match to take on a non-negative value (or possibly -2 if I have misunderstood something). Instead, -1 is returned, denoting PCRE2_ERROR_NOMATCH.

Why is that?
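
The two calls described above can be modelled without PCRE2 by a small stand-in, which may help narrow down where the second call goes wrong. This is only a sketch under stated assumptions: `Result`, `Outcome` and `literal_match` are hypothetical names, not part of the PCRE2 API, and the function handles literal patterns only while imitating hard partial matching (reaching the end of the subject in the middle of a candidate match reports a partial match). It suggests that starting the second attempt at the end of the partial match (offset 4) leaves only "hundred" to scan, whereas starting at the beginning of the partial match (offset 0) finds the full match. As I read the PCRE2 partial-matching documentation, the match is meant to be re-run from the point where the partial match began, but treat that as an assumption to verify against the real library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in types; these are NOT PCRE2 names.
enum class Result { NoMatch, Partial, Full };

struct Outcome {
  Result      result;
  std::size_t begin; // start of the (partial) match, like ovector[0]
  std::size_t end;   // end of the (partial) match, like ovector[1]
};

// Scans `subject` from `start` for the literal `pattern`.
// Imitates hard partial matching: if the subject runs out while a
// candidate match is still in progress, a partial match is reported.
Outcome literal_match(const std::string &pattern,
                      const std::string &subject,
                      std::size_t start) {
  for (std::size_t i = start; i < subject.size(); ++i) {
    std::size_t k = 0; // number of pattern characters matched at position i
    while (k < pattern.size() && i + k < subject.size() &&
           subject[i + k] == pattern[k]) {
      ++k;
    }
    if (k == pattern.size()) {
      return {Result::Full, i, i + k};
    }
    if (k > 0 && i + k == subject.size()) {
      return {Result::Partial, i, subject.size()};
    }
  }
  return {Result::NoMatch, 0, 0};
}
```

In this model, `literal_match("one hundred", "one ", 0)` reports a partial match spanning offsets 0 to 4. Passing the combined subject "one hundred" with a start of 4 then reports no match, while passing it with a start of 0 reports a full match.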
