How to remove # from hashtag using Python RegEx


I need to remove the leading "#" symbol from hashtags in a text. For example, the sentence "I'm feeling #blessed." should become "I'm feeling blessed."

I have written this function, but I'm sure the same result can be achieved more simply with a regex.

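def clean_hashtags(sentence):  # function name assumed; the def line was missing from the post
  clean_sentence = ""
  space = " "
  for token in sentence.split():
    if token[0] == '#':  # compare by value with ==; `is` tests object identity
      token = token[1:]
    clean_sentence += token + space
  return clean_sentence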

Need help here!!

There are 2 answers

Onno Rouast On BEST ANSWER

The regex provided by @Tim, #(\S+), only requires that the # be followed by one or more non-whitespace characters (\S+). It therefore also matches a # in the middle of a token, e.g. in so#blessed, which is not a hashtag.

We can prevent this by adding a negative lookbehind (?<!\S) before the hash, so that the # can't be preceded by a non-whitespace character. It must either follow whitespace or sit at the very start of the string.

import re

inp = "#I'm #feeling #blessed so#blessed .#here#."
# (?<!\S) ensures the # is at the start of the string or preceded by whitespace
output = re.sub(r'(?<!\S)#(\S+)', r'\1', inp)
print(output)

output:

I'm feeling blessed so#blessed .#here#.
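
If this is needed in more than one place, the substitution can be wrapped in a small function. This is only a minimal sketch: the names HASHTAG_PREFIX and strip_hashtags are made up for illustration, and the pattern is the same one used above.

```python
import re

# Hypothetical names; the pattern itself is the lookbehind regex from this answer.
HASHTAG_PREFIX = re.compile(r'(?<!\S)#(\S+)')

def strip_hashtags(sentence):
    """Remove the leading '#' from whitespace-delimited hashtags."""
    return HASHTAG_PREFIX.sub(r'\1', sentence)

print(strip_hashtags("I'm feeling #blessed."))
print(strip_hashtags("so#blessed  #twice"))
```

Unlike the split-and-rebuild loop in the question, this leaves the original whitespace untouched: runs of spaces are not collapsed and no trailing space is appended.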
Tim Biegeleisen On

You may use re.sub as follows:

import re

inp = "I'm feeling #blessed."
# Drop each '#' and keep the run of non-whitespace characters that follows it
output = re.sub(r'#(\S+)', r'\1', inp)
print(output)  # I'm feeling blessed.
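
To see where this simple pattern and the lookbehind variant from the other answer differ, here is a rough side-by-side check. The sample strings and variable names are just illustrative choices.

```python
import re

samples = ["I'm feeling #blessed.", "so#blessed", ".#here#."]

results = []
for text in samples:
    simple = re.sub(r'#(\S+)', r'\1', text)         # any '#' followed by non-whitespace
    strict = re.sub(r'(?<!\S)#(\S+)', r'\1', text)  # only a '#' that starts a token
    results.append((text, simple, strict))
    print(f"{text!r}: simple={simple!r} strict={strict!r}")
```

The two agree on the first sample. On the other two, the simple pattern also strips a # that sits inside a token, while the lookbehind version leaves those strings unchanged. Which one you want depends on whether so#blessed should count as a hashtag in your data.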