TypeError: float() argument must be a string or a number, not 'datetime.datetime'?

I am new to Python and am learning LSTM forecasting with pandas and Keras, using a sample project from GitHub that I've modified to work with my own data. I am running it on Kaggle.

For reference, the project is found here: https://github.com/abaranovskis-redsamurai/automation-repo/blob/master/forecast-lstm/forecast_lstm_shampoo_sales.ipynb

My data is simply a csv with dates and sales. Here's what the first few lines look like, with the date being YYYY-MM:

"date","num"
"1995-12",700
"1996-1",500
"1996-2",1300
"1996-3",2800
"1996-4",3500

The error I am getting is "TypeError: float() argument must be a string or a number, not 'datetime.datetime'".

The code is here:

import pandas as pd
from sklearn.preprocessing import MinMaxScaler
import tensorflow as tf
from tensorflow import keras

from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Dropout
import warnings
warnings.filterwarnings("ignore")


def parser(x):
    return pd.datetime.strptime(x, '%Y-%m')

df = pd.read_csv('../input/smalltestb/smalltest1b.csv', parse_dates=[0], date_parser=parser)
df.tail()

train = df

scaler = MinMaxScaler()
scaler.fit(train)
train = scaler.transform(train)
n_input = 12
n_features = 1
generator = TimeseriesGenerator(train, train, length=n_input, batch_size=6)
model = Sequential()
model.add(LSTM(200, activation='relu', input_shape=(n_input, n_features)))
model.add(Dropout(0.15))

Finally, the error message:

TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_35/785266029.py in <module>
     25 
     26 scaler = MinMaxScaler()
---> 27 scaler.fit(train)
     28 train = scaler.transform(train)
     29 n_input = 12

/opt/conda/lib/python3.7/site-packages/sklearn/preprocessing/_data.py in fit(self, X, y)
    334         # Reset internal state before fitting
    335         self._reset()
--> 336         return self.partial_fit(X, y)
    337 
    338     def partial_fit(self, X, y=None):

/opt/conda/lib/python3.7/site-packages/sklearn/preprocessing/_data.py in partial_fit(self, X, y)
    369         X = self._validate_data(X, reset=first_pass,
    370                                 estimator=self, dtype=FLOAT_DTYPES,
--> 371                                 force_all_finite="allow-nan")
    372 
    373         data_min = np.nanmin(X, axis=0)

/opt/conda/lib/python3.7/site-packages/sklearn/base.py in _validate_data(self, X, y, reset, validate_separately, **check_params)
    418                     f"requires y to be passed, but the target y is None."
    419                 )
--> 420             X = check_array(X, **check_params)
    421             out = X
    422         else:

/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
     70                           FutureWarning)
     71         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72         return f(**kwargs)
     73     return inner_f
     74 

/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    596                     array = array.astype(dtype, casting="unsafe", copy=False)
    597                 else:
--> 598                     array = np.asarray(array, order=order, dtype=dtype)
    599             except ComplexWarning:
    600                 raise ValueError("Complex data not supported\n"

/opt/conda/lib/python3.7/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     81 
     82     """
---> 83     return array(a, dtype, copy=False, order=order)
     84 
     85 

/opt/conda/lib/python3.7/site-packages/pandas/core/generic.py in __array__(self, dtype)
   1991 
   1992     def __array__(self, dtype: NpDtype | None = None) -> np.ndarray:
-> 1993         return np.asarray(self._values, dtype=dtype)
   1994 
   1995     def __array_wrap__(

/opt/conda/lib/python3.7/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     81 
     82     """
---> 83     return array(a, dtype, copy=False, order=order)
     84 
     85 

TypeError: float() argument must be a string or a number, not 'datetime.datetime'

So I ran just the import part in another notebook and looked at the head. The dates are not formatted the way I expected:

    date    num
0   1995-12-01 00:00:00 700
1   1996-01-01 00:00:00 500
2   1996-02-01 00:00:00 1300
3   1996-03-01 00:00:00 2800
4   1996-04-01 00:00:00 3500

This is definitely not what I wanted (I wanted YYYY-MM), and I know the file stores the dates that way. The parser must be the cause: it isn't storing the dates in the dataframe the way I expected.
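
For what it's worth, the timestamps above may just be how pandas displays a datetime column: a parsed date always carries a day and a time, so 1995-12 shows up as 1995-12-01 00:00:00. Here is a minimal sketch of two ways to get YYYY-MM back for display. It assumes a three-row inline copy of my data; the `csv_text` string and the `labels` and `periods` names are made up for illustration.

```python
from datetime import datetime
import io

import pandas as pd

# Inline stand-in for the real CSV file (assumption, for illustration only).
csv_text = '"date","num"\n"1995-12",700\n"1996-1",500\n"1996-2",1300\n'

df = pd.read_csv(io.StringIO(csv_text))

# Parse the YYYY-MM strings into real datetimes (day defaults to the 1st).
df["date"] = pd.to_datetime([datetime.strptime(s, "%Y-%m") for s in df["date"]])

# Option 1: format the datetimes back into YYYY-MM strings for display.
labels = df["date"].dt.strftime("%Y-%m")

# Option 2: convert to monthly periods, which print as YYYY-MM.
periods = df["date"].dt.to_period("M")

print(labels.tolist())
print(periods.astype(str).tolist())
```

Either way the underlying values are still dates, so the display format on its own would not explain the TypeError.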

How do I address this? As a note, the guy on GitHub had this for his parser, but it choked when I tried it:

def parser(x):
    return pd.datetime.strptime('190'+x, '%Y-%m')

df = pd.read_csv('shampoo.csv', parse_dates=[0], index_col=0, date_parser=parser)

(He prepended '190' because his dates are a single-digit year, a dash, and a month number, whereas mine are a full year, a dash, and a month number.)

Any suggestions? Thanks for having a look!
