I would like to use Python and regex to clean some input that I logged from my keyboard, especially where backspace was used to fix a mistake.
Example 1:
[in]: 'Helloo<BckSp> world'
[out]: 'Hello world'
This can be done with
re.sub(r'.<BckSp>', '', 'Helloo<BckSp> world')
Example 2:
However, when there are several backspaces in a row, I don't know how to delete exactly the same number of characters before them:
[in]: 'Helllo<BckSp><BckSp>o world'
[out]: 'Hello world'
(Here I want to remove the 'l' and the 'o' before the two backspaces.)
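To make the problem concrete, here is a small sketch that applies the Example 1 pattern once, and then a second time, to the Example 2 input. Regex matches are leftmost and non-overlapping, so the first pass consumes the '>' in front of the second <BckSp>, and that tag survives until the next pass:

```python
import re

logged = 'Helllo<BckSp><BckSp>o world'

# First pass: only 'o<BckSp>' matches; the '>' that precedes the second
# <BckSp> belongs to that first match, so the second tag is left behind.
once = re.sub(r'.<BckSp>', '', logged)
print(once)   # Helll<BckSp>o world

# A second pass removes 'l<BckSp>'.
twice = re.sub(r'.<BckSp>', '', once)
print(twice)  # Hello world
```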
I could simply apply re.sub(r'[^>]<BckSp>', '', line) several times until no <BckSp> is left, but I would like to find a more elegant / faster solution.
Does anyone know how to do this?
Since Python's re module has no support for recursion or subroutine calls (and, before Python 3.11, no atomic groups or possessive quantifiers either), you may remove the characters followed by backspaces in a loop. See the Python demo.
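A minimal sketch of such a loop, assuming the log stores each backspace as the actual backspace character (\x08). The function name apply_backspaces is hypothetical, and using re.subn to stop once a pass makes no substitutions is one possible way to write the loop, not necessarily what the original demo did:

```python
import re

# Non-raw string on purpose: "\b" here is the backspace character (\x08),
# not the regex word-boundary assertion.
#   ^\b+     one or more backspaces at the start (nothing left to delete)
#   [^\b]\b  any non-backspace character followed by a backspace
BACKSPACE_RX = re.compile("^\b+|[^\b]\b")


def apply_backspaces(text):
    """Delete each character that is immediately followed by a backspace,
    repeating until a pass makes no substitutions."""
    while True:
        text, count = BACKSPACE_RX.subn("", text)
        if count == 0:
            return text


print(apply_backspaces("Helllo\b\bo world"))  # Hello world
```

Each pass peels off one layer of backspaces, so a run of n consecutive backspaces needs roughly n passes; every pass that substitutes something shortens the string, so the loop always terminates.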
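The "^\b+|[^\b]\b" pattern finds one or more backspace characters at the start of the string (with ^\b+), and [^\b]\b finds all non-overlapping occurrences of any character other than a backspace followed by a backspace. The pattern must be written as a normal (non-raw) string literal, so that Python turns each \b into the backspace character \x08 before the regex engine sees it; in a raw string, the \b outside the character class would mean a word boundary instead.
The same approach works when a backspace is logged as an entity or tag, such as a literal <BckSp>. See another Python demo.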
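A minimal sketch of the tag variant, assuming the log contains the literal text <BckSp> for each backspace. The function name apply_bcksp_tags is hypothetical, and the negative lookbehind (?<!<BckSp) is an adaptation for this sketch rather than the original demo's pattern: it stops the closing '>' of one <BckSp> tag from being treated as a typed character that the next tag deletes. The re.DOTALL flag is also an assumption, so that a backspace can delete a logged newline:

```python
import re

# ^(?:<BckSp>)+        tags at the very start have nothing left to delete
# (?<!<BckSp).<BckSp>  one character followed by a tag, unless that character
#                      is the closing '>' of a previous <BckSp> tag
# re.DOTALL (an assumption) lets '.' match a newline as well.
TAG_RX = re.compile(r"^(?:<BckSp>)+|(?<!<BckSp).<BckSp>", re.DOTALL)


def apply_bcksp_tags(text):
    """Remove <BckSp> tags together with the characters they delete,
    repeating until a pass makes no substitutions."""
    while True:
        text, count = TAG_RX.subn("", text)
        if count == 0:
            return text


print(apply_bcksp_tags("Helllo<BckSp><BckSp>o world"))  # Hello world
```

Unlike [^>]<BckSp> from the question, this still deletes a typed '>' when it is followed by a backspace.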