I am new to Python and am learning to build an LSTM forecast from a sample project on GitHub, which I've modified to load my own data with pandas. I am running it on Kaggle.
For reference, the project is found here: https://github.com/abaranovskis-redsamurai/automation-repo/blob/master/forecast-lstm/forecast_lstm_shampoo_sales.ipynb
My data is simply a CSV with dates and sales. Here's what the first few lines look like, with the date stored as YYYY-MM (the month is not zero-padded):
"date","num"
"1995-12",700
"1996-1",500
"1996-2",1300
"1996-3",2800
"1996-4",3500
The error I am getting is "TypeError: float() argument must be a string or a number, not 'datetime.datetime'".
The code is here:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Dropout
import warnings
warnings.filterwarnings("ignore")
def parser(x):
    return pd.datetime.strptime(x, '%Y-%m')
df = pd.read_csv('../input/smalltestb/smalltest1b.csv', parse_dates=[0], date_parser=parser)
df.tail()
train = df
scaler = MinMaxScaler()
scaler.fit(train)
train = scaler.transform(train)
n_input = 12
n_features = 1
generator = TimeseriesGenerator(train, train, length=n_input, batch_size=6)
model = Sequential()
model.add(LSTM(200, activation='relu', input_shape=(n_input, n_features)))
model.add(Dropout(0.15))
Finally, the error message:
TypeError Traceback (most recent call last)
/tmp/ipykernel_35/785266029.py in <module>
25
26 scaler = MinMaxScaler()
---> 27 scaler.fit(train)
28 train = scaler.transform(train)
29 n_input = 12
/opt/conda/lib/python3.7/site-packages/sklearn/preprocessing/_data.py in fit(self, X, y)
334 # Reset internal state before fitting
335 self._reset()
--> 336 return self.partial_fit(X, y)
337
338 def partial_fit(self, X, y=None):
/opt/conda/lib/python3.7/site-packages/sklearn/preprocessing/_data.py in partial_fit(self, X, y)
369 X = self._validate_data(X, reset=first_pass,
370 estimator=self, dtype=FLOAT_DTYPES,
--> 371 force_all_finite="allow-nan")
372
373 data_min = np.nanmin(X, axis=0)
/opt/conda/lib/python3.7/site-packages/sklearn/base.py in _validate_data(self, X, y, reset, validate_separately, **check_params)
418 f"requires y to be passed, but the target y is None."
419 )
--> 420 X = check_array(X, **check_params)
421 out = X
422 else:
/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
70 FutureWarning)
71 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72 return f(**kwargs)
73 return inner_f
74
/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
596 array = array.astype(dtype, casting="unsafe", copy=False)
597 else:
--> 598 array = np.asarray(array, order=order, dtype=dtype)
599 except ComplexWarning:
600 raise ValueError("Complex data not supported\n"
/opt/conda/lib/python3.7/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
81
82 """
---> 83 return array(a, dtype, copy=False, order=order)
84
85
/opt/conda/lib/python3.7/site-packages/pandas/core/generic.py in __array__(self, dtype)
1991
1992 def __array__(self, dtype: NpDtype | None = None) -> np.ndarray:
-> 1993 return np.asarray(self._values, dtype=dtype)
1994
1995 def __array_wrap__(
/opt/conda/lib/python3.7/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
81
82 """
---> 83 return array(a, dtype, copy=False, order=order)
84
85
TypeError: float() argument must be a string or a number, not 'datetime.datetime'
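For reference, the same kind of failure can be reproduced without the CSV or scikit-learn. The sketch below is a plain-Python stand-in, not the library's actual code: the rows are hypothetical values shaped like the parsed data, and `to_float_matrix` and `min_max_scale` are illustrative helpers. It assumes the scaler needs every column to be convertible to float, which is what the traceback suggests.

```python
from datetime import datetime

# Hypothetical rows mirroring the parsed CSV: (date, num).
rows = [
    (datetime(1995, 12, 1), 700),
    (datetime(1996, 1, 1), 500),
    (datetime(1996, 2, 1), 1300),
]


def to_float_matrix(table):
    """Convert every cell to float, as a float-only scaler would need to."""
    return [[float(cell) for cell in row] for row in table]


def min_max_scale(values):
    """Stand-in for min-max scaling a single column to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Converting the whole table fails on the datetime column.
try:
    to_float_matrix(rows)
    whole_table_error = None
except TypeError as exc:
    whole_table_error = exc

# Scaling only the numeric column succeeds.
scaled_num = min_max_scale([float(num) for _, num in rows])

print(type(whole_table_error).__name__)  # TypeError
print(scaled_num)                        # [0.25, 0.0, 1.0]
```

If that assumption holds, the date column has to be kept out of whatever is passed to the scaler, for example by making it the index, as the `index_col=0` argument in the GitHub version does.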
So I ran just the import part in another notebook and looked at the head. The dates didn't come out in the format I expected:
date num
0 1995-12-01 00:00:00 700
1 1996-01-01 00:00:00 500
2 1996-02-01 00:00:00 1300
3 1996-03-01 00:00:00 2800
4 1996-04-01 00:00:00 3500
This is definitely not what I wanted. I wanted YYYY-MM, which is how the dates are stored in the CSV. I assume the parser is responsible and that it is not storing the dates in the DataFrame the way I expect.
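A small standard-library check of what the parser returns may help here. It is only a sketch using `datetime.strptime` directly (which is what `pd.datetime.strptime` resolves to), with one sample value from the CSV. It suggests the extra day and time are defaults filled in by parsing, and that YYYY-MM is a display format rather than something a datetime value stores.

```python
from datetime import datetime

# Parse a year-month string with the same format the parser uses.
parsed = datetime.strptime("1996-1", "%Y-%m")

# The missing day and time are filled with defaults: day 1, midnight.
full_text = str(parsed)

# Formatting back to year-month is a display step only.
year_month = parsed.strftime("%Y-%m")

print(full_text)   # 1996-01-01 00:00:00
print(year_month)  # 1996-01
```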
How do I address this? As a note, the guy on GitHub had this for his parser, but it choked when I tried it on my data:
def parser(x):
    return pd.datetime.strptime('190'+x, '%Y-%m')
df = pd.read_csv('shampoo.csv', parse_dates=[0], index_col=0, date_parser=parser)
(His dates consist of the last digit of the year, a dash, and a month number, so he prepends '190' to build a four-digit year. Mine already have a four-digit year, a dash, and a month number.)
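The difference between the two parsers can be checked in isolation. In this sketch, `prefixed_parser` copies the GitHub logic and `plain_parser` copies mine; the input `'1-01'` is an assumed example of the single-digit-year format, not a value taken from his file.

```python
from datetime import datetime


def prefixed_parser(x):
    # Same logic as the GitHub parser: prepend '190' before parsing.
    return datetime.strptime('190' + x, '%Y-%m')


def plain_parser(x):
    # Same logic as my parser: the year already has four digits.
    return datetime.strptime(x, '%Y-%m')


# Assumed single-digit-year input: '190' + '1-01' becomes '1901-01'.
short_result = prefixed_parser('1-01')

# With a four-digit year, the prefix produces '1901996-1', which no
# longer matches the '%Y-%m' format.
try:
    prefixed_parser('1996-1')
    prefixed_error = None
except ValueError as exc:
    prefixed_error = exc

full_result = plain_parser('1996-1')

print(short_result)                   # 1901-01-01 00:00:00
print(type(prefixed_error).__name__)  # ValueError
print(full_result)                    # 1996-01-01 00:00:00
```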
Any suggestions? Thanks for having a look!