Python3:
import re
k = "X"
s = "X测试一Q测试二XQ测试三"
print(re.split((r"\b" + k + r"\b"), s))
Output:
['X测试一Q测试二XQ测试三']
Expected:
['', '测试一Q测试二XQ测试三']
The 测 is a letter belonging to the \p{Lo} class, so there is no word boundary between X and 测.
The \b word boundary construct is Unicode-aware by default in Python 3.x re patterns, so you can switch this behavior off with the re.ASCII / re.A option or the inline (?a) flag.
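A minimal sketch of the flag-based fix, reusing the k and s values from the question. The variable names and the re.escape call are additions for illustration (escaping only matters if k could contain regex metacharacters):

```python
import re

k = "X"
s = "X测试一Q测试二XQ测试三"
pattern = r"\b" + re.escape(k) + r"\b"  # re.escape is an added safeguard, not in the question

# Default Unicode-aware matching: 测 counts as a word character,
# so there is no boundary after the leading X and nothing is split.
default_parts = re.split(pattern, s)

# re.ASCII (alias re.A) limits \w, and therefore \b, to ASCII characters.
ascii_parts = re.split(pattern, s, flags=re.ASCII)

# The inline (?a) flag at the start of the pattern has the same effect.
inline_parts = re.split(r"(?a)" + pattern, s)

print(default_parts)  # ['X测试一Q测试二XQ测试三']
print(ascii_parts)    # ['', '测试一Q测试二XQ测试三']
print(inline_parts)   # ['', '测试一Q测试二XQ测试三']
```

The second X is still not matched in ASCII mode because it is followed by Q, an ASCII word character, so there is no boundary after it.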
If you need to make sure there is no ASCII letter before and after X, use (?<![a-zA-Z])X(?![a-zA-Z]). Or, including digits, (?<![a-zA-Z0-9])X(?![a-zA-Z0-9]).
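The lookaround variants can be compared with a short sketch. The second test string (sample) and the variable names are made up for illustration:

```python
import re

s = "X测试一Q测试二XQ测试三"
sample = "X1 aX Xb 测X测"  # hypothetical extra input, not from the question

letters_only = r"(?<![a-zA-Z])X(?![a-zA-Z])"
letters_digits = r"(?<![a-zA-Z0-9])X(?![a-zA-Z0-9])"

# On the original string both patterns split only at the leading X.
question_parts = re.split(letters_only, s)

# With letters only, the X in "X1" matches because a digit may follow it.
sample_letters = re.split(letters_only, sample)

# With digits included, the X in "X1" no longer matches.
sample_digits = re.split(letters_digits, sample)

print(question_parts)  # ['', '测试一Q测试二XQ测试三']
print(sample_letters)  # ['', '1 aX Xb 测', '测']
print(sample_digits)   # ['X1 aX Xb 测', '测']
```

Unlike \b under re.ASCII, these character classes do not treat the underscore as a word character, so an X next to _ would still match unless _ is added to the classes.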