Using regular expressions in Python to find a specific word


I have the following lines (their order can vary, and there can be other similar lines as well). I would like to replace "sid" with "tempvalue", taking into account that "sid" can be surrounded by any symbol except letters and digits. How can I do that in Python using a regular expression?

lines = [
 "VAR0=sid_host1; -",
 "VAR1=sid; -",
 "VAR2=psid; -",
 "VAR3=sid_host1; -",
 "VAR4=psid_host2; -",
 "VAR5 = (file=/dir1/sid_host1/sid/trace/alert_sid.log)(database=sid)"
]

For line 0 the desired result is: "VAR0=tempvalue_host1; -"

for line 1: "VAR1=tempvalue; -"

for line 3: "VAR3=tempvalue_host1; -"

for line 5: "VAR5 = (file=/dir1/tempvalue_host1/tempvalue/trace/alert_tempvalue.log)(database=tempvalue)"

Other lines must remain untouched.

Best answer, by Tim Biegeleisen

I think we can just do a regex replace-all with the pattern (?<![^\W_])sid(?![^\W_]) here:

import re

lines = [
    "VAR0=sid_host1; -",
    "VAR1=sid; -",
    "VAR2=psid; -",
    "VAR3=sid_host1; -",
    "VAR4=psid_host2; -",
    "VAR5 = (file=/dir1/sid_host1/sid/trace/alert_sid.log)(database=sid)"
]

lines = [re.sub(r'(?<![^\W_])sid(?![^\W_])', 'tempvalue', x) for x in lines]
print(lines)

['VAR0=tempvalue_host1; -',
 'VAR1=tempvalue; -',
 'VAR2=psid; -',
 'VAR3=tempvalue_host1; -',
 'VAR4=psid_host2; -',
 'VAR5 = (file=/dir1/tempvalue_host1/tempvalue/trace/alert_tempvalue.log)(database=tempvalue)']
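If the same replacement is needed for names other than sid, the substitution above could be wrapped in a small helper. This is only a sketch: the function name replace_token and its parameters are hypothetical and not part of the original answer. It uses re.escape so that a token containing regex metacharacters is matched literally, and a function as the replacement so that backslashes in the new value are not interpreted.

```python
import re

def replace_token(text, token, replacement):
    """Replace `token` wherever it is not adjacent to a letter or digit.

    Hypothetical helper: underscores and punctuation count as boundaries,
    exactly as in the pattern (?<![^\W_])sid(?![^\W_]).
    """
    # re.escape makes the token literal, e.g. "a.b" will not match "aXb".
    pattern = r'(?<![^\W_])' + re.escape(token) + r'(?![^\W_])'
    # A callable replacement is inserted as-is, with no backslash processing.
    return re.sub(pattern, lambda match: replacement, text)

line = "VAR5 = (file=/dir1/sid_host1/sid/trace/alert_sid.log)(database=sid)"
print(replace_token(line, "sid", "tempvalue"))
print(replace_token("VAR2=psid; -", "sid", "tempvalue"))
```

The first call rewrites every standalone sid in the path, while the second leaves psid untouched because the preceding p is a letter.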

Explanation of regex:

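  • (?<![^\W_]) asserts that the preceding character is not a letter or digit, so it is a non-word character, an underscore, or the start of the string
  • sid matches the literal text sid
  • (?![^\W_]) asserts that the following character is not a letter or digit, so it is a non-word character, an underscore, or the end of the string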

Note that we are basically building our own custom word boundaries, which treat both \W characters and underscore as boundaries (normally underscore counts as a word character, so a plain \b would not match next to it).
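To see why the built-in \b is not enough here, the following sketch compares \bsid\b with the custom boundaries on a few of the inputs from the question. The variable names are illustrative only.

```python
import re

samples = [
    "VAR0=sid_host1; -",
    "VAR1=sid; -",
    "VAR2=psid; -",
    "alert_sid.log",
]

# \b treats "_" as a word character, so "sid_" and "_sid" contain no boundary.
word_boundary = re.compile(r'\bsid\b')
# The lookarounds only reject letters and digits next to "sid".
custom_boundary = re.compile(r'(?<![^\W_])sid(?![^\W_])')

with_word_boundary = [word_boundary.sub('tempvalue', s) for s in samples]
with_custom_boundary = [custom_boundary.sub('tempvalue', s) for s in samples]

for original, plain, custom in zip(samples, with_word_boundary, with_custom_boundary):
    print(original, plain, custom, sep=' | ')
```

With \b only "VAR1=sid; -" changes, because every other sid touches an underscore or a letter. The custom pattern also rewrites sid_host1 and alert_sid.log while still skipping psid.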