I have to find some special characters using a Python regular expression. For example, for the character 'à', I found some hex codes to build a regex pattern:
r'\x61\x300|\xe0|\x61\x300'
But I am afraid that I may miss some other hex codes. How can I find all possible hex codes for a character?
This is a bit of an XY problem. You want something like
If your regex needs to contain more than just a static literal string, probably compose it out of normalized Unicode strings and other regex constructs (though in simple cases individual regex primitives like
^and.and*should be robust under Unicode normalization).You can use another normalization if you prefer; the crucial requirement is to use the same normalization for both inputs. But NFC is recommended for this type of scenario.
See also https://www.unicode.org/faq/normalization.html and Normalizing Unicode
To really solve the stated problem, you have to understand how normalization works. If there are multiple combining diacritics, you have to generate all possible orderings of them, for example. See also How does Zalgo text work?