Extract string with date in Python


I have a list of strings in Python 2.7 like this:

lst = [u'Name1_Cap23_o2_A_20160830_20170831_test.tif', 
    u'Name0_Cap44_o6_B_20150907_20170707.tif',
    u'Name99_Vlog_o88_A_20180101_20180305_exten.tif']

What I would like to do is extract the part of each string up to and including the two dates, so that I get a list like this:

lst = [u'Name1_Cap23_o2_A_20160830_20170831', 
    u'Name0_Cap44_o6_B_20150907_20170707',
    u'Name99_Vlog_o88_A_20180101_20180305']

I know how to extract the two dates with the re package, but how can I get the list in the example above using the datetime and re packages? Does anyone have an idea how I could get the part of the string before the dates as well?

from datetime import datetime
import re

# capture the two 8-digit dates separated by an underscore
pattern = re.compile(r'(\d{8})_(\d{8})')
dates = pattern.search(lst[0])
startdate = datetime.strptime(dates.group(1), '%Y%m%d')
enddate = datetime.strptime(dates.group(2), '%Y%m%d')
datestring = format(startdate, '%Y%m%d') + "_" + format(enddate, '%Y%m%d')

2 Answers

2
The fourth bird On Best Solutions

If you only want to match the string from its start up to and including the 2 dates, you don't need a capturing group.

You could anchor the match at the start of the string, match one or more word characters with \w+ (which also matches an underscore), and then match an underscore followed by 8 digits twice.

^\w+_\d{8}_\d{8}


For example:

lst = [u'Name1_Cap23_o2_A_20160830_20170831_test.tif',
       u'Name0_Cap44_o6_B_20150907_20170707.tif',
       u'Name99_Vlog_o88_A_20180101_20180305_exten.tif']

import re

pattern = re.compile(r'^\w+_\d{8}_\d{8}')
# a list comprehension returns a list in both Python 2 and Python 3
pattern_list = [pattern.search(x).group() for x in lst]
print(pattern_list)

Result

[u'Name1_Cap23_o2_A_20160830_20170831', u'Name0_Cap44_o6_B_20150907_20170707', u'Name99_Vlog_o88_A_20180101_20180305']
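
If some file names might not contain the two dates, pattern.search returns None and calling .group() on it raises an AttributeError. A minimal sketch that skips such names could look like this; the helper name extract_prefix and the non-matching file name are made up for illustration:

```python
import re

# Same pattern as above: from the start of the name up to and including both dates.
PATTERN = re.compile(r'^\w+_\d{8}_\d{8}')

def extract_prefix(name):
    """Return the part of name up to the second date, or None if nothing matches."""
    match = PATTERN.search(name)
    return match.group() if match else None

names = ['Name1_Cap23_o2_A_20160830_20170831_test.tif',
         'no_dates_here.tif']  # hypothetical name without dates

# Keep only the names that actually matched.
prefixes = [p for p in map(extract_prefix, names) if p is not None]
print(prefixes)
```

Whether to skip non-matching names or raise an error depends on how strict the naming convention of the files is.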
1
Kamuffel On

Your regular expression was almost correct. I've updated it from (\d{8})_(\d{8}) to (.+\d{8})_(\d{8}). The added .+ matches any character one or more times, so the match also covers the part of the name before the dates.
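
To see what each part of this pattern captures, here is a small check on the first file name from the question. It only uses re.compile, search and group; note that group(1) ends after the first date, while group(0) is the whole match including both dates:

```python
import re

pattern = re.compile(r'(.+\d{8})_(\d{8})')
match = pattern.search('Name1_Cap23_o2_A_20160830_20170831_test.tif')

print(match.group(0))  # whole match: Name1_Cap23_o2_A_20160830_20170831
print(match.group(1))  # first group: Name1_Cap23_o2_A_20160830
print(match.group(2))  # second group: 20170831
```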

from datetime import datetime
import re

lst = [u'Name1_Cap23_o2_A_20160830_20170831_test.tif',
       u'Name0_Cap44_o6_B_20150907_20170707.tif',
       u'Name99_Vlog_o88_A_20180101_20180305_exten.tif']

# compile the pattern once, outside the loop
new_name_pattern = re.compile(r'(.+\d{8})_(\d{8})')

# modify list
for i in range(len(lst)):
  # retrieve full name with both dates
  new_name = new_name_pattern.search(lst[i])

  # group(1) ends after the first date, so use the whole match (group 0)
  lst[i] = new_name.group(0)

# print new list
for name in lst:
  print(name)

An example can be found here: https://repl.it/repls/InternalOrchidVisitors
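
Since the question also mentions datetime, here is a rough sketch that returns the name up to the second date together with both dates parsed as datetime objects. The function name split_name and the group names start and end are hypothetical. datetime.strptime raises a ValueError if the 8 digits are not a valid date, which may or may not be what you want:

```python
import re
from datetime import datetime

# Named groups (the names are chosen for illustration) for the two dates.
PATTERN = re.compile(r'^\w+_(?P<start>\d{8})_(?P<end>\d{8})')

def split_name(name):
    """Return (prefix, start, end) for a matching name, or None otherwise."""
    match = PATTERN.search(name)
    if match is None:
        return None
    # strptime raises ValueError for 8-digit strings that are not real dates
    start = datetime.strptime(match.group('start'), '%Y%m%d')
    end = datetime.strptime(match.group('end'), '%Y%m%d')
    return match.group(0), start, end

prefix, start, end = split_name('Name99_Vlog_o88_A_20180101_20180305_exten.tif')
print(prefix)
print(start.date())
print(end.date())
```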