How can I make pandas.read_sql_query ignore the datetime datatype of a column in the source database?

488 views Asked by At

The error that I get is "ValueError: month must be in 1..12"

I know that there is "weird" data in that column and I want pandas to treat it as text or ignore the rows with errors and populate the df with the remainder.

(Not a big deal but would be good to avoid having to code an exception for this column in this table when hundreds of others work just fine)

query = 'SELECT troubledDateCol FROM table'
data = pd.read_sql_query(query, cnxn, dtype={'troubledDateCol': pd.StringDtype})

How can I make pandas.read_sql_query ignore the datetime datatype of a column in the source database?

I have tried all sorts to work around this:

  • parse_dates={"troubledDateCol": {"errors": "ignore", "format": "various%formats%that&might&work"}}
  • parse_dates={"troubledDateCol": {"errors": "coerce", "format": "various%formats%that&might%work"}}
  • parse_dates={"troubledDateCol": {"errors": "coerce", "date_parser": myCustomParser"}}
    • Debugging and putting a breakpoint here shows that the error occurs before the custom parser is invoked
  • CASTing and CONVERTing troubledDateCol to VARCHAR in the query
    • This doesn't work because the database is a pernickety, proprietary, flat file DB developed by Sage accounts, based on Pervasive SQL and these fail on execution of the query before pandas tried to load the data
  • Trying to use SQLalchemy and pyOdbc
    • I wasn't able to successfully connect using these despite extensive efforts and found a fair amount of other people having had the same issues
0

There are 0 answers