In certain flavours of regular expression, inside a square bracket expression, the =
symbol is a special character that is used as a delimiter for enclosing any one the elements within an equivalence class. The documentation says the following:
An equivalence class expression shall represent the set of collating elements belonging to an equivalence class, as described in Collation Order. Only primary equivalence classes shall be recognized. The class shall be expressed by enclosing any one of the collating elements in the equivalence class within bracket-equal ( "[=" and "=]" ) delimiters. For example, if 'a', 'à', and 'â' belong to the same equivalence class, then "[[=a=]b]", "[[=à=]b]", and "[[=â=]b]" are each equivalent to "[aàâb]". If the collating element does not belong to an equivalence class, the equivalence class expression shall be treated as a collating symbol.
I am not quite sure what this means. If a
, à
and â
belong to the same equivalence class, does that mean we wish to specify that the regular expressions "[ab]"
, "[àb]"
and "[âb]"
are equivalent? Then what is the purpose of using the [=
=]
delimiter, since we might as well write "[aàâb]"
?
I understand what an "equivalence class" means in its general definition, but I am unable to grasp its meaning in this context.
Essentially, for example,
[=a=]
means "all the characters that belong to the equivalence class whicha
belongs to. Ifa
andà
form an equivalence class which only contains those two, then[=a=]
and[=à=]
are both the same asaà
(obviously written inside a[…]
). But if the equivalence class also containsâ
, then all of[=a=]
,[=à=]
, and[=â=]
meanaàâ
(again, written in[…]
).What characters an equivalence class contains depends on the locale, but if the locale doesn't define a given equivalence class, let's say
[=a=]
, then the collating sequence with the same name will be used, i.e.[.a.]
, which normally is just the same asa
, as the locales usually include normal characters as collating sequences.Reference: Mastering Regular Expressions, 3rd Edition, page 128, which is an excellent book on regular expression, written by someone who knows regex down to the smallest bit.