I'm sure I'm missing something trivial here. I have the code below:
date1 = re.findall('0000(.*)', date1.encode('utf-8'))
str1 = '-'.join(date1)
print str1
print type(str1)
dt = datetime.strptime(str1,"%B %d, %Y ")
and I get this error:
ValueError: time data '' does not match format '%B %d, %Y '
It seems as if str1 is empty, so I checked it with:
print str1
print type(str1)
and got the following results:
October 24, 2014
<type 'str'>
I can't work out why it thinks str1 is empty. Any ideas?
Full code appended:
from bs4 import BeautifulSoup
import wikipedia
import re
from datetime import datetime
acq = wikipedia.page('List_of_mergers_and_acquisitions_by_Google')
test = acq.html()
#print test
##html = acq.html()
soup = BeautifulSoup(test)
table = soup.find('table', {'class' : 'wikitable sortable'})
company = ""
date1 = ""
for row in table.findAll('tr'):
    cells = row.findAll('td')
    if len(cells) == 8:
        date1 = cells[1].get_text()
        company = cells[2].get_text()
        ##print date
    date1 = re.findall('0000(.*)', date1)
    str1 = ''.join(date1)
    print str1
    print type(str1)
    dt = datetime.strptime(str1,"%B %d, %Y ")
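Since the parsing runs once per table row, one thing worth checking is whether str1 is empty on some passes and a valid date on others: re.findall returns an empty list when '0000' is not found, and joining an empty list gives ''. Below is a minimal sketch of that check, not a confirmed diagnosis. The helper name parse_cell_date and the sample strings are hypothetical stand-ins for the real cell text, and the sketch assumes the date text can be stripped and parsed without the trailing space in the format.

```python
import re
from datetime import datetime


def parse_cell_date(cell_text):
    """Return a datetime for the text after '0000', or None if there is none.

    Hypothetical helper for illustration; not part of the original script.
    """
    matches = re.findall('0000(.*)', cell_text)
    date_text = ''.join(matches).strip()
    if not date_text:
        # Either '0000' was not found (findall returned []) or nothing
        # followed it. Passing '' to strptime would raise ValueError.
        return None
    # Assumption: text is stripped, so the format has no trailing space.
    return datetime.strptime(date_text, "%B %d, %Y")


# Hypothetical inputs: an empty string (no date cell was read for the row)
# and a string shaped like a sort key followed by the visible date.
samples = ["", "2014-10-24-0000October 24, 2014"]
for text in samples:
    # repr() makes an empty string visible, unlike a bare print.
    print("%r -> %r" % (text, parse_cell_date(text)))
```

Printing with repr() shows '' explicitly, which a plain print would display as a blank line that is easy to miss among the other output.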