This is a simple piece of code that uses regex to identify a pattern and termcolor to replace each match with a highlighted version of itself; basically, it is used to highlight a required piece of text. The code seems to work fine with almost all patterns. But when I try to match dots (.), it seems to run indefinitely and crashes the Jupyter kernel. It would be really helpful if someone could help me with this. Thank you in advance.
import re
from termcolor import colored,cprint
text = "This is a sample text......"
pattern = re.compile(r"\.")
patternlist = pattern.findall(text)
# print(patternlist)
replacelist = [colored(i,"black", "on_yellow", attrs=["bold"]) for i in patternlist]
print(replacelist)
patterns = [i for i in zip(patternlist,replacelist)]
print(patterns)
for pattern, replacement in patterns:
    text = re.sub(pattern, replacement, text)
print(text)
The pattern I used was: pattern = re.compile(r"\."). The findall function seems to work fine, as I get the expected result: ['.', '.', '.', '.', '.', '.']. While I expect to get the sample text with its dots highlighted (This is a sample text......), I do not get any result; the Jupyter notebook seems to run indefinitely and then crashes.
I verified the pattern using an online regex engine (https://regex101.com/), and it seems to work fine there.
You are creating too large a replacement. A small modification to your code, one that substitutes only the literal ".", makes the "." in the sample text highlighted yellow.
Elaborating, your patterns list holds the same ('.', replacement) pair six times, once per match. Keep in mind that inside the loop your new regex pattern is the plain string ".", which matches, and therefore replaces, every character. Because the characters used to form the colours make the replacement longer than what it replaces, each pass of the loop replaces every one of those characters with a new set of characters, and you end up with an incredibly long string. After the first iteration you have 27 "."s, after the second you have 513, and it multiplies each time.
Edit for the new information:
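A minimal sketch of what such a change might look like, not necessarily the exact code originally posted here. The names highlight and highlight_matches are hypothetical, and highlight is a stand-in that writes the ANSI codes directly so the example runs without termcolor; swapping in colored(s, "black", "on_yellow", attrs=["bold"]) should behave the same way.

```python
import re


def highlight(s):
    # Hypothetical stand-in for termcolor.colored(s, "black", "on_yellow", attrs=["bold"]):
    # bold (1), black foreground (30), yellow background (43), then reset (0).
    return f"\x1b[1m\x1b[30m\x1b[43m{s}\x1b[0m"


def highlight_matches(text, pattern):
    # set() keeps one copy of each distinct match, so each one is substituted only once.
    matches = set(re.findall(pattern, text))
    for match in matches:
        # re.escape turns the matched text into a literal pattern,
        # so "." no longer means "any character".
        # A function replacement is used so the replacement text is inserted as-is.
        text = re.sub(re.escape(match), lambda m: highlight(m.group(0)), text)
    return text


text = "This is a sample text......"
result = highlight_matches(text, r"\.")
print(result)

# With a pattern such as "is", the "is" inside "This" is highlighted as well.
print(highlight_matches("This is", "is"))
```

If the pattern can produce several distinct matches, a later pass could also hit characters of the escape codes inserted by an earlier pass (digits, "[" or "m"). Calling re.sub once with the original pattern and a function replacement avoids that.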
Things to note are the use of set, which shouldn't affect the output, just the runtime and the principle of not having multiple duplicates, and the use of re.escape to make sure any escapable items in the "pattern" that you get from re.findall get escaped properly. Keep in mind that since you used re.findall, the "is" in "This" also gets matched and thus highlighted.
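The growth in the original loop can be checked safely by running only the first two passes and recording the sizes. This is a rough sketch: simulate is a hypothetical helper, and the replacement string is an assumed 19-character stand-in for what termcolor produces for a bold, black-on-yellow ".".

```python
import re

text = "This is a sample text......"  # 27 characters
# Assumed stand-in for the coloured ".": escape codes around a single dot (19 characters).
replacement = "\x1b[1m\x1b[30m\x1b[43m.\x1b[0m"


def simulate(text, replacement, passes):
    # Repeat the original loop body: the unescaped "." matches every character.
    sizes = []
    for _ in range(passes):
        text = re.sub(".", replacement, text)
        sizes.append((text.count("."), len(text)))
    return sizes


for i, (dots, length) in enumerate(simulate(text, replacement, 2), start=1):
    print(f"pass {i}: {dots} dots, {length} characters")

# Each pass multiplies the length by len(replacement), so six passes would need:
final_length = len(text) * len(replacement) ** 6
print(final_length)
```

With these assumptions the six passes in the original loop would build a string of more than a billion characters, which is consistent with the kernel crashing.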