I have an issue about to insert the tag pubDate in my table mysql, actually I am trying to put into my table noticias the tags (title, link and pubDate) and the last tag (PubDate) have problem.
I explain the code:
the first step read a page rss and write a xml file
the second step generate a csv file with only 3 tag (title, link and pubDate) note: In this code i need to use:
item.findtext('pubDate')
because if i useitem.find('pubDate').text
this generate a error although the file is generated correctly using both cases.and the last step to stored the information the file csv into my table in mysql.
In this step I received a next error:
Connected to pydev debugger (build 171.4694.38)
Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\pydevd.py", line 1591, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\pydevd.py", line 1018, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull", line 78, in <module>
main()
File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull", line 72, in main
testdb()
File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull", line 56, in testdb
(r[1:] for r in csv_data.itertuples()))
File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\cursor.py", line 654, in executemany
return self.execute(stmt)
File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\cursor.py", line 551, in execute
self._handle_result(self._connection.cmd_query(stmt))
File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\connection.py", line 490, in cmd_query
result = self._handle_result(self._send_cmd(ServerCmd.QUERY, query))
File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\connection.py", line 395, in _handle_result
raise errors.get_exception(packet)
mysql.connector.errors.ProgrammingError: 1054 (42S22): Unknown column 'nan' in 'field list'
Process finished with exit code 1
I think that problem to be on pubDate because if I run the program in two parts:
first part:
Create xml and CSV but change the parameter to pubDate by: item.find('pubDate').text
generate the file xml and csv successfully but the code show a error about pubdate.
second part:
insert into mysql from the csv file created in the first step. the program run successfully and without error. check my database and the information is loaded.
but in this option, i can´t to run both program in the same file because the error dont permit continue and it dont permit execute the part about insert into the database.
then the error actually it´s about this code:
# Codigo Python que crea un XML CSV e inserta a una BD MYSQL.
# Llamamos los modulos que necesitamos para ejecutar este script
import csv
import MySQLdb
import requests
import xml.etree.ElementTree as ET
import mysql.connector
import pandas as pd
def loadRSS():
# Configuramos la URL del rss de CNN
url = 'http://rss.cnn.com/rss/edition.xml'
# Creamos un objeto con el que vamos a obtener la url de la variable declarada hace un momento
resp = requests.get(url)
# Procedemos a guardar la informacion en un archivo llamado cnn.XML
with open('cnn.xml', 'wb') as f:
f.write(resp.content)
def loadcsv():
tree = ET.parse("cnn.xml")
root = tree.getroot()
d = open('cnn.csv', 'w')
csvwriter = csv.writer(d)
count = 0
head = ['title', 'link', 'pubDate']
csvwriter.writerow(head)
for item in root.findall('./channel/item'):
row = []
title_name = item.find('title').text
row.append(title_name)
link_name = item.find('link').text
row.append(link_name)
pubDate_name = item.findtext('pubDate')
row.append(pubDate_name)
csvwriter.writerow(row)
d.close()
def testdb():
cnx = mysql.connector.connect(user='root', password='password', host='localhost', database='cnn')
cursor = cnx.cursor()
csv_data = pd.read_csv('cnn.csv')
for row in csv_data.iterrows():
cursor.executemany(
"INSERT INTO noticias(title, link, pubDate) VALUES(%s, %s, %s)",
(r[1:] for r in csv_data.itertuples()))
cnx.commit()
cursor.close()
cnx.close()
#connection = MySQLdb.Connect(host='localhost', user='root', passwd='password', db='cnn')
#cursor = connection.cursor()
#query = "LOAD DATA INFILE 'cnn.csv' INTO TABLE noticias(title, link, pubdate)"
#cursor.execute(query)
#connection.commit()
def main():
# Inicializamos los modulos definidos en el programa.
loadRSS()
loadcsv()
testdb()
if __name__ == "__main__":
# llamamos el metodo main
main()
somebody does have an idea about this error.
Update: i add the line:
print(csv_data.head())
Add the output you comment and the result of the debugger is:
Connected to pydev debugger (build 171.4694.38)
title \
0 Bloodied and broken: The battle against ISIS i...
1 The human cost of ISIS
2 $1B deal to prop up UK government
3 Netanyahu freezes Western Wall plans
4 Only a 'couple of hundred' ISIS fighters left ...
link \
0 http://cnn.it/2sbE6fp
1 http://www.cnn.com/videos/world/2017/06/25/phi...
2 http://www.cnn.com/2017/06/26/europe/theresa-m...
3 http://www.cnn.com/2017/06/26/middleeast/weste...
4 http://www.cnn.com/2017/06/26/middleeast/coupl...
date
0 NaN
1 Mon, 26 Jun 2017 08:49:00 GMT
2 Mon, 26 Jun 2017 11:59:24 GMT
3 Mon, 26 Jun 2017 13:09:30 GMT
4 Mon, 26 Jun 2017 13:16:21 GMT
Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\pydevd.py", line 1591, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\pydevd.py", line 1018, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull.py", line 80, in <module>
main()
File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull.py", line 74, in main
testdb()
File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull.py", line 58, in testdb
(r[1:] for r in csv_data.itertuples()))
File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\cursor.py", line 654, in executemany
return self.execute(stmt)
File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\cursor.py", line 551, in execute
self._handle_result(self._connection.cmd_query(stmt))
File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\connection.py", line 490, in cmd_query
result = self._handle_result(self._send_cmd(ServerCmd.QUERY, query))
File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\connection.py", line 395, in _handle_result
raise errors.get_exception(packet)
mysql.connector.errors.ProgrammingError: 1054 (42S22): Unknown column 'nan' in 'field list'
Process finished with exit code 1
UPDATE 27/06/2017:
I add the part of the testdb and now is this:
def testdb():
cnx = mysql.connector.connect(user='root', password='password', host='localhost', database='cnn')
cursor = cnx.cursor()
with open('cnn.csv') as fh:
cursor.executemany(
"INSERT INTO noticias(title, link, pubDate) VALUES(%s, %s, %s)",
[tuple(row) for row in csv.reader(fh)]
)
cnx.commit()
cursor.close()
cnx.close()
when i debbug the program the error is:
Connected to pydev debugger (build 171.4694.38)
Traceback (most recent call last):
File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\cursor.py", line 75, in __call__
return bytes(self.params[index])
IndexError: tuple index out of range
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\pydevd.py", line 1591, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\pydevd.py", line 1018, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull.py", line 79, in <module>
main()
File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull.py", line 73, in main
testdb()
File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull.py", line 56, in testdb
[tuple(row) for row in csv.reader(fh)]
File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\cursor.py", line 652, in executemany
stmt = self._batch_insert(operation, seq_params)
File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\cursor.py", line 594, in _batch_insert
tmp = RE_PY_PARAM.sub(psub, tmp)
File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\cursor.py", line 78, in __call__
"Not enough parameters for the SQL statement")
mysql.connector.errors.ProgrammingError: Not enough parameters for the SQL statement
Process finished with exit code 1
I don't know if i forgot add something.
Only the first Error is relevant:
IndexError: tuple index out of range
The CSV Data must be wrong, check before passing to MySQL:
Therfore you need no
for
Loop to iterate CSV Rows Data, you have to pass the whole CSV Data as a List of Tuples. Second usecsv module
instead ofpandas
. Change to:Tested with Python: 3.4.2
This part of your code is wrong. You are iterating
csv_data
twice.Can't tell if this lead to the above Error but you should change to the following and retry to verify if the Error persists: