Regex extraction from Right to Left


I have some data from which I would like to extract fields from right to left. Sample data:

1,4,34
5,15
22

Expected output:

One=34  Two=4  Three=1
One=15  Two=5
One=22

This is as far as I have got with my regex experience.

(?:(?<three>\d+),)?(?:(?<two>\d+),)?(?<one>\d+)$

But this gives:

One=34  Two=4  Three=1
One=15  Three=5
One=22

So it fails when there are only two values to extract. Any good ideas? PS: I do not have any reverse tools available.

5

There are 5 answers

3
The fourth bird On BEST ANSWER

You can make the first 2 groups optional as a whole:

^(?:(?:(?<three>\d+),)?(?<two>\d+),)?(?<one>\d+)$

The pattern matches:

  • ^ Start of string
  • (?: Non capture group
    • (?:(?<three>\d+),)? Optionally capture 1+ digits in group "three" and match a comma
    • (?<two>\d+), Capture 1+ digits in group "two" and match a comma
  • )? Close the non capture group
  • (?<one>\d+) Capture 1+ digits in group "one"
  • $ End of string
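As a quick check of the nesting described above, here is a minimal Python sketch. It assumes Python's re module, which spells named groups as (?P<name>...) instead of (?<name>...); the helper name extract is only for illustration and is not part of the answer.

```python
import re

# Same structure as the pattern above, in Python's named-group syntax.
# "three" can only match if "two" also matches, because it sits inside
# the same optional outer group.
PATTERN = re.compile(r"^(?:(?:(?P<three>\d+),)?(?P<two>\d+),)?(?P<one>\d+)$")


def extract(line):
    """Return only the named groups that took part in the match."""
    m = PATTERN.match(line)
    if m is None:
        return {}
    return {name: value for name, value in m.groupdict().items() if value is not None}


for sample in ("1,4,34", "5,15", "22"):
    print(sample, "->", extract(sample))
```

With the sample data from the question, "5,15" should now fill two and one rather than three and one.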


1
Vivick On

^((?:(?<three>\d+),)(?:(?<two>\d+),)|(?:(?<two2>\d+),)?)(?<one>\d+)$ is the only potential solution I can think of, but since capture groups must all have different names, you end up with two groups for "two" under different names (two and two2).
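If the duplicated group is acceptable, the two names can be merged after matching. The following is a rough Python sketch of that idea, not the answerer's code: it uses Python's (?P<name>...) syntax, makes the outer group non-capturing, and the function name extract_alt is hypothetical.

```python
import re

# Alternation: either "three,two," or an optional "two2," before "one".
ALT = re.compile(
    r"^(?:(?P<three>\d+),(?P<two>\d+),|(?:(?P<two2>\d+),)?)(?P<one>\d+)$"
)


def extract_alt(line):
    """Match with the alternation pattern and fold two2 back into two."""
    m = ALT.match(line)
    if m is None:
        return {}
    g = m.groupdict()
    # Only one of two / two2 can be set, depending on which branch matched.
    two = g["two"] if g["two"] is not None else g["two2"]
    fields = {"one": g["one"], "two": two, "three": g["three"]}
    return {name: value for name, value in fields.items() if value is not None}


for sample in ("1,4,34", "5,15", "22"):
    print(sample, "->", extract_alt(sample))
```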

2
sln On

Naming groups in the reverse order is ok.
If you're looking for matching in the reverse order, this is a direct way.

This is a template regex that can be expanded as needed. It scans the string left to
right (LTR) but fills the capture groups from the last field to the first, so group 1 holds the last field, group 2 the one before it, and so on.

This removes the need for post-processing steps.

For example, these strings produce these match arrays:

1,4,34 => [34,4,1]
5,15 => [15,5]
22 => [22]

https://regex101.com/r/uo04VM/1

^(?=(?&D_n){0,2}(\d+)$)(?=(?:(?&D_n){0,1}(\d+)(?&n_D)$)?)(?=(?:(\d+)(?&n_D){2}$)?).+$(?(DEFINE)(?<D_n>\d+[^\d\r\n]+)(?<n_D>[^\d\r\n]+\d+))

Expanded

^
(?=
   (?&D_n){0,2}
   ( \d+ )                       # (1)
   $
)
(?=
   (?:
      (?&D_n){0,1}
      ( \d+ )                       # (2)
      (?&n_D) $
   )?
)
(?=
   (?:
      ( \d+ )                       # (3)
      (?&n_D){2} $
   )?
)
.+ $
(?(DEFINE)
   (?<D_n> \d+ [^\d\r\n]+ )      # (4)
   (?<n_D> [^\d\r\n]+ \d+ )      # (5)
)
3
warren On

You want a variable-length list of fields extracted from delimited data in reverse order?

How many entries could you possibly have? Three? Five? Two hundred seventy four?

Are you trying to do this at search time (i.e., in the SPL you are writing/running), or in props.conf?

If you are trying to do this at search time, I would not try to use a regular expression at all - use split() (or makemv) and mvindex() (with negative indexing) to find the items you want:

...
| eval mvlist=split(delimited_field,",")
...
| eval three=mvindex(mvlist,-3)
...
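
For readers outside Splunk, the same split-and-index-from-the-end idea can be sketched in Python. This is only an analogy to split() and mvindex() with negative indexes, not SPL; the function name fields_from_right and the default field names are assumptions taken from the question's expected output.

```python
def fields_from_right(line, names=("one", "two", "three"), sep=","):
    """Name the fields of a delimited line, counting from the right."""
    parts = line.split(sep)
    # zip stops at the shorter input, so fields that are missing on the
    # left are simply absent from the result.
    return dict(zip(names, reversed(parts)))


for sample in ("1,4,34", "5,15", "22"):
    print(sample, "->", fields_from_right(sample))
```

Supporting more fields only means extending names; no pattern has to change.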
0
Jotne On

To avoid using regex from right to left, I found a way to reverse the string.

Sed by itself seems to be limited to 9 numbered back-references.

echo "AbCdEfG" | sed  -r 's/(.)(.)?(.)?(.)?(.)?(.)?(.)?/\7\6\5\4\3\2\1/'
GfEdCbA

But sed mode in Splunk's rex does not have this limit (not that I need so many), so:

| makeresults 
| eval test="abcdefghijkl"
| rex mode=sed field=test "s/(.)(.)?(.)?(.)?(.)?(.)?(.)?(.)?(.)?(.)?(.)?(.)?/\12\11\10\9\8\7\6\5\4\3\2\1/"

gives: test=lkjihgfedcba

Then using regex from left to right works fine.
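
One caveat with reversing character by character: multi-digit values come out reversed too ("1,4,34" becomes "43,4,1"), so each capture has to be flipped back. A small Python sketch of the whole round trip, assuming the question's comma-separated digits; the name extract_via_reverse is made up for the example.

```python
import re

# Plain left-to-right pattern, applied to the reversed string.
LTR = re.compile(r"^(?P<one>\d+)(?:,(?P<two>\d+))?(?:,(?P<three>\d+))?$")


def extract_via_reverse(line):
    """Reverse the line, match left to right, then un-reverse each value."""
    m = LTR.match(line[::-1])
    if m is None:
        return {}
    # Each captured value is character-reversed, so flip it back.
    return {
        name: value[::-1]
        for name, value in m.groupdict().items()
        if value is not None
    }


for sample in ("1,4,34", "5,15", "22"):
    print(sample, "->", extract_via_reverse(sample))
```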