How do I remove the leading and trailing non-alphanumeric characters in a given string, before and after a certain substring? See the example below:

input_string = m#12$my#tr!#$g%

output_string = m12my#tr!g

The substring, in this case, is my#tr!

How can I get the output_string given the input_string?

My attempt below removes all the leading characters, including the alphanumeric ones (see the code snippet below). I tried using \W+ instead of .+, but that did not work either.

import re
input_string = "m#12$my#tr!#$g%"
# Removes everything before "my#tr!", alphanumerics included -> "my#tr!#$g%"
output_string = re.sub(r'.+?(?=my#tr!)', '', input_string)
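A small illustration of why swapping in \W+ does not help: a run of non-word characters followed by a lookahead only matches when that run sits directly in front of the substring, and nothing in the pattern touches the text after it. The variable names here are just for this sketch.

```python
import re

input_string = "m#12$my#tr!#$g%"

# \W+ plus the lookahead only matches non-word characters that are
# immediately followed by "my#tr!", so only the "$" is removed; the "#"
# after "m" is separated from the substring by "12", and the trailing
# "#$" and "%" are never considered at all.
attempt = re.sub(r'\W+(?=my#tr!)', '', input_string)
print(attempt)  # m#12my#tr!#$g%
```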

I'd appreciate any thoughts on how I could use a regex pattern for this purpose.

Accepted answer by Nick:

One way to do this is to split the string around the desired substring, remove the non-alphanumeric characters from the first and last parts, and then reassemble the string:

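import re

input_string = "m#12$my#tr!#$g%"
mid = 'my#tr!'
# Split once around the substring (maxsplit=1 tolerates later repeats of mid)
first, last = input_string.split(mid, 1)
# Drop every character that is not an ASCII letter or digit
first = re.sub(r'[^a-zA-Z0-9]', '', first)
last = re.sub(r'[^a-zA-Z0-9]', '', last)

output_string = first + mid + last
print(output_string)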

Output:

m12my#tr!g

If you use the regex module from PyPI, you can take advantage of its variable-length lookbehinds and remove any non-alphanumeric character that comes before or after the target string:

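import regex

input_string = "m#12$my#tr!#$g%"
mid = 'my#tr!'
# Remove a non-alphanumeric character if mid occurs somewhere after it,
# or if mid occurs somewhere before it (variable-length lookbehind)
output_string = regex.sub(rf'[^a-zA-Z0-9](?=.*{mid})|(?<={mid}.*)[^a-zA-Z0-9]', '', input_string)
# 'm12my#tr!g'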
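For comparison, Python's built-in re module only accepts fixed-width lookbehinds, so a pattern like the one above cannot be compiled with it. A sketch of a single-pass alternative that stays within the standard library, assuming only the first occurrence of the substring matters: match the whole string, capture the parts before and after the substring, and clean them in a replacement function. The helper name clean_around is hypothetical, and this is a different technique from the lookaround pattern above.

```python
import re

input_string = "m#12$my#tr!#$g%"
mid = 'my#tr!'

# The standard re module rejects variable-width lookbehinds such as (?<=...*).
try:
    re.compile(rf'(?<={re.escape(mid)}.*)[^a-zA-Z0-9]')
    variable_lookbehind_ok = True
except re.error as exc:
    variable_lookbehind_ok = False
    print('re.error:', exc)

def clean_around(m):
    # Hypothetical helper: strip non-alphanumerics from the text captured
    # before and after the substring; keep the substring itself unchanged.
    before = re.sub(r'[^a-zA-Z0-9]', '', m.group(1))
    after = re.sub(r'[^a-zA-Z0-9]', '', m.group(3))
    return before + m.group(2) + after

# (.*?) is non-greedy, so group 2 is the first occurrence of mid;
# re.DOTALL lets "." match newlines too. No match leaves the string as-is.
pattern = rf'^(.*?)({re.escape(mid)})(.*)$'
output_string = re.sub(pattern, clean_around, input_string, flags=re.DOTALL)
print(output_string)  # m12my#tr!g
```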

Note that if mid contains characters that are special in regular expressions (e.g. . [ { $ ^ etc.), you should escape it before use, i.e.:

mid = 'my#tr!'
mid = regex.escape(mid)
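To see why escaping matters, here is a small illustration using the standard re module, whose re.escape plays the same role as regex.escape here. The substring 'a.b' is a made-up example, not one from the question.

```python
import re

mid = 'a.b'   # made-up substring containing the metacharacter "."
text = 'axb'  # contains no literal 'a.b'

# Unescaped, "." matches any character, so 'a.b' wrongly matches 'axb'.
unescaped_hit = re.search(mid, text) is not None
# Escaped, the dot has to match a literal ".", so there is no match.
escaped_hit = re.search(re.escape(mid), text) is not None
print(unescaped_hit, escaped_hit)  # True False
```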

If you don't want to use regular expressions at all, you could strip the non-alphanumeric characters out of the first and last parts manually. For example:

import string

input_string = "m#12$my#tr!#$g%"
mid = 'my#tr!'
allowed = set(string.ascii_letters + string.digits)  # build the lookup set once
first, last = input_string.split(mid, 1)
first = ''.join(c for c in first if c in allowed)
last = ''.join(c for c in last if c in allowed)
output_string = first + mid + last
print(output_string)  # m12my#tr!g
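One caveat with unpacking the result of split into two names: it raises ValueError whenever the split does not yield exactly two parts, e.g. when the substring is missing. A sketch of a more defensive variant using str.partition, which always returns three parts and splits only at the first occurrence. The function name strip_around, and the choice to return the input unchanged when the substring is absent, are assumptions rather than part of the answer above; note that any later repeat of the substring ends up in the trailing part and is cleaned like the rest of it.

```python
import string

ALLOWED = set(string.ascii_letters + string.digits)

def strip_around(text, mid):
    """Hypothetical helper: remove characters other than ASCII letters and
    digits before and after the first occurrence of mid, keeping mid as-is."""
    first, sep, last = text.partition(mid)
    if not sep:
        # mid not found: return the input unchanged (an assumed policy).
        return text
    first = ''.join(c for c in first if c in ALLOWED)
    last = ''.join(c for c in last if c in ALLOWED)
    return first + mid + last

print(strip_around("m#12$my#tr!#$g%", "my#tr!"))  # m12my#tr!g
print(strip_around("no match here!", "my#tr!"))   # no match here!
```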