Error to import rss feeds tag pubDate into MySQL database using python

261 views Asked by At

I have an issue about to insert the tag pubDate in my table mysql, actually I am trying to put into my table noticias the tags (title, link and pubDate) and the last tag (PubDate) have problem.

I explain the code:

  1. the first step read a page rss and write a xml file

  2. the second step generate a csv file with only 3 tag (title, link and pubDate) note: In this code i need to use: item.findtext('pubDate') because if i use item.find('pubDate').text this generate a error although the file is generated correctly using both cases.

  3. and the last step to stored the information the file csv into my table in mysql.

In this step I received a next error:

Connected to pydev debugger (build 171.4694.38)
Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\pydevd.py", line 1591, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\pydevd.py", line 1018, in run
pydev_imports.execfile(file, globals, locals)  # execute the script
File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull", line 78, in <module>
main()
File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull", line 72, in main
testdb()
File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull", line 56, in testdb
(r[1:] for r in csv_data.itertuples()))
File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\cursor.py", line 654, in executemany
return self.execute(stmt)
File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\cursor.py", line 551, in execute
self._handle_result(self._connection.cmd_query(stmt))
File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\connection.py", line 490, in cmd_query
result = self._handle_result(self._send_cmd(ServerCmd.QUERY, query))
File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\connection.py", line 395, in _handle_result
raise errors.get_exception(packet)
mysql.connector.errors.ProgrammingError: 1054 (42S22): Unknown column 'nan' in 'field list'

Process finished with exit code 1

I think that problem to be on pubDate because if I run the program in two parts:

first part:

Create xml and CSV but change the parameter to pubDate by: item.find('pubDate').text generate the file xml and csv successfully but the code show a error about pubdate.

second part:

insert into mysql from the csv file created in the first step. the program run successfully and without error. check my database and the information is loaded.

but in this option, i can´t to run both program in the same file because the error dont permit continue and it dont permit execute the part about insert into the database.

then the error actually it´s about this code:

# Codigo Python que crea un XML CSV e inserta a una BD MYSQL.
# Llamamos los modulos que necesitamos para ejecutar este script
import csv
import MySQLdb
import requests
import xml.etree.ElementTree as ET
import mysql.connector
import pandas as pd


def loadRSS():
    # Configuramos la URL del rss de CNN
    url = 'http://rss.cnn.com/rss/edition.xml'

    # Creamos un objeto con el que vamos a obtener la url de la variable declarada hace un momento
    resp = requests.get(url)

    # Procedemos a guardar la informacion en un archivo llamado cnn.XML
    with open('cnn.xml', 'wb') as f:
        f.write(resp.content)


def loadcsv():
    tree = ET.parse("cnn.xml")
    root = tree.getroot()

    d = open('cnn.csv', 'w')

    csvwriter = csv.writer(d)

    count = 0

    head = ['title', 'link', 'pubDate']

    csvwriter.writerow(head)

    for item in root.findall('./channel/item'):
        row = []
        title_name = item.find('title').text
        row.append(title_name)
        link_name = item.find('link').text
        row.append(link_name)
        pubDate_name = item.findtext('pubDate')
        row.append(pubDate_name)
        csvwriter.writerow(row)
    d.close()

def testdb():
    cnx = mysql.connector.connect(user='root', password='password', host='localhost', database='cnn')
    cursor = cnx.cursor()
    csv_data = pd.read_csv('cnn.csv')

    for row in csv_data.iterrows():
        cursor.executemany(
            "INSERT INTO noticias(title, link, pubDate) VALUES(%s, %s, %s)",
            (r[1:] for r in csv_data.itertuples()))

    cnx.commit()
    cursor.close()
    cnx.close()

    #connection = MySQLdb.Connect(host='localhost', user='root', passwd='password', db='cnn')
    #cursor = connection.cursor()
    #query = "LOAD DATA INFILE 'cnn.csv' INTO TABLE noticias(title, link, pubdate)"
    #cursor.execute(query)
    #connection.commit()

def main():
    # Inicializamos los modulos definidos en el programa.
    loadRSS()
    loadcsv()
    testdb()



if __name__ == "__main__":
    # llamamos el metodo main
    main()

somebody does have an idea about this error.

Update: i add the line:

print(csv_data.head())

Add the output you comment and the result of the debugger is:

Connected to pydev debugger (build 171.4694.38)
                                               title  \
0  Bloodied and broken: The battle against ISIS i...   
1                            The human cost of ISIS    
2                  $1B deal to prop up UK government   
3               Netanyahu freezes Western Wall plans   
4  Only a 'couple of hundred' ISIS fighters left ...   

                                                link  \
0                              http://cnn.it/2sbE6fp   
1  http://www.cnn.com/videos/world/2017/06/25/phi...   
2  http://www.cnn.com/2017/06/26/europe/theresa-m...   
3  http://www.cnn.com/2017/06/26/middleeast/weste...   
4  http://www.cnn.com/2017/06/26/middleeast/coupl...   

                            date  
0                            NaN  
1  Mon, 26 Jun 2017 08:49:00 GMT  
2  Mon, 26 Jun 2017 11:59:24 GMT  
3  Mon, 26 Jun 2017 13:09:30 GMT  
4  Mon, 26 Jun 2017 13:16:21 GMT  
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\pydevd.py", line 1591, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\pydevd.py", line 1018, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull.py", line 80, in <module>
    main()
  File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull.py", line 74, in main
    testdb()
  File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull.py", line 58, in testdb
    (r[1:] for r in csv_data.itertuples()))
  File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\cursor.py", line 654, in executemany
    return self.execute(stmt)
  File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\cursor.py", line 551, in execute
    self._handle_result(self._connection.cmd_query(stmt))
  File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\connection.py", line 490, in cmd_query
    result = self._handle_result(self._send_cmd(ServerCmd.QUERY, query))
  File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\connection.py", line 395, in _handle_result
    raise errors.get_exception(packet)
mysql.connector.errors.ProgrammingError: 1054 (42S22): Unknown column 'nan' in 'field list'

Process finished with exit code 1

UPDATE 27/06/2017:

I add the part of the testdb and now is this:

def testdb():
    cnx = mysql.connector.connect(user='root', password='password', host='localhost', database='cnn')
    cursor = cnx.cursor()

    with open('cnn.csv') as fh:
        cursor.executemany(
            "INSERT INTO noticias(title, link, pubDate) VALUES(%s, %s, %s)",
            [tuple(row) for row in csv.reader(fh)]
        )

    cnx.commit()
    cursor.close()
    cnx.close()

when i debbug the program the error is:

Connected to pydev debugger (build 171.4694.38)
Traceback (most recent call last):
  File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\cursor.py", line 75, in __call__
    return bytes(self.params[index])
IndexError: tuple index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\pydevd.py", line 1591, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\pydevd.py", line 1018, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull.py", line 79, in <module>
    main()
  File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull.py", line 73, in main
    testdb()
  File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull.py", line 56, in testdb
    [tuple(row) for row in csv.reader(fh)]
  File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\cursor.py", line 652, in executemany
    stmt = self._batch_insert(operation, seq_params)
  File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\cursor.py", line 594, in _batch_insert
    tmp = RE_PY_PARAM.sub(psub, tmp)
  File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\cursor.py", line 78, in __call__
    "Not enough parameters for the SQL statement")
mysql.connector.errors.ProgrammingError: Not enough parameters for the SQL statement

Process finished with exit code 1

I don't know if i forgot add something.

1

There are 1 answers

2
stovfl On BEST ANSWER

Comment: ... but the error now is

Only the first Error is relevant: IndexError: tuple index out of range
The CSV Data must be wrong, check before passing to MySQL:

import csv
records = []
with open('test/cnn.csv') as fh:
    for row in csv.reader(fh):
        _tuple = tuple(row)
        if len(_tuple) == 3:
            records.append(_tuple)
        else:
            print('[FAIL]: Tupel Length not 3, found {} in {}'.format(len(_tuple), _tuple))

cursor.executemany("INSERT INTO noticias(title, link, pubDate) VALUES(%s, %s, %s)", records)

Comment: Error: Not all parameters were used in the SQL statement
According to MySQL Connector/Python Developer Guide: 10.5.5 - MySQLCursor.executemany() Method:

data = [
  ('Jane', date(2005, 2, 12)),
  ('Joe', date(2006, 5, 23)),
  ('John', date(2010, 10, 3)),
]
stmt = "INSERT INTO employees (first_name, hire_date) VALUES (%s, %s)"
cursor.executemany(operation, seq_of_params)

seq_of_params have to be a List of Tuples

Therfore you need no for Loop to iterate CSV Rows Data, you have to pass the whole CSV Data as a List of Tuples. Second use csv module instead of pandas. Change to:

import csv
with open('cnn.csv') as fh:
    cursor.executemany(
        "INSERT INTO noticias(title, link, pubDate) VALUES(%s, %s, %s)",
        [tuple(row) for row in csv.reader(fh)]
    )

Tested with Python: 3.4.2


Question: somebody does have an idea about this error.

Unknown column 'nan' in 'field list'

This part of your code is wrong. You are iterating csv_data twice.

for row in csv_data.iterrows():
    cursor.executemany(
        "INSERT INTO noticias(title, link, pubDate) VALUES(%s, %s, %s)",
        (r[1:] for r in csv_data.itertuples()))

Can't tell if this lead to the above Error but you should change to the following and retry to verify if the Error persists:

for row in csv_data.iterrows():
    cursor.executemany(
        "INSERT INTO noticias(title, link, pubDate) VALUES(%s, %s, %s)",
        ((value for value in row[1]))