Pandas : TypeError: float() argument must be a string or a number

134k views Asked by Gingerbread At 21 December 2016 at 06:41

I have a dataframe that contains the following columns (one sample row shown):

user_id    date       browser  conversion  test  sex  age  country
   1    2015-12-03       IE        1         0    M   32.0   US

Here is my code:

import pandas as pd
from sklearn import tree

data['date'] = pd.to_datetime(data.date)
columns = [c for c in data.columns.tolist() if c not in ["test"]]
clf = tree.DecisionTreeClassifier()
clf = clf.fit(data[columns], data["test"])

I am getting this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-560-95a8a54aa939> in <module>()
      4 from sklearn import tree
      5 clf = tree.DecisionTreeClassifier(max_depth=2, min_samples_leaf = (len(data)/100) )
----> 6 clf = clf.fit(data[columns],data["test"])

C:\Users\SnehaPriya\Anaconda2\lib\site-packages\sklearn\tree\tree.pyc in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    152         random_state = check_random_state(self.random_state)
    153         if check_input:
--> 154             X = check_array(X, dtype=DTYPE, accept_sparse="csc")
    155             if issparse(X):
    156                 X.sort_indices()

C:\Users\SnehaPriya\Anaconda2\lib\site-packages\sklearn\utils\validation.pyc in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    371                                       force_all_finite)
    372     else:
--> 373         array = np.array(array, dtype=dtype, order=order, copy=copy)
    374 
    375         if ensure_2d:

TypeError: float() argument must be a string or a number

How do I overcome this error?

There are 3 answers

niowniow On 11 February 2021 at 12:52

A solution which keeps the date(time) column:

data['date'] = pd.to_numeric(pd.to_datetime(data['date']))
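As a rough illustration of what this one-liner produces, here is a minimal sketch on a small hypothetical dataframe (the sample rows and the `age` column are assumptions, not the asker's data). The datetime values become plain integers counting time units since the Unix epoch, which a float conversion can handle.

```python
import pandas as pd

# Hypothetical sample data; only the 'date' column matters here.
data = pd.DataFrame({"date": ["2015-12-03", "2015-12-04"], "age": [32.0, 41.0]})

# Parse the strings, then convert the datetimes to integers
# (time units since 1970-01-01; nanoseconds for datetime64[ns]).
data["date"] = pd.to_numeric(pd.to_datetime(data["date"]))

print(data.dtypes)
```

The resulting integers are large, so rescaling them may be worth considering for models that are sensitive to feature scale; decision trees generally are not.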

cottontail On 15 February 2023 at 19:41

Ideas to preserve datetime as features in the model

If the dates matter only in terms of how much time has passed since each observation, one way to keep the datetime column as a model feature is to convert it into the time difference between now and each datetime.

data['date'] = (pd.Timestamp('now') - pd.to_datetime(data['date'])).dt.total_seconds()
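One thing to keep in mind with `pd.Timestamp('now')` is that the feature values change every time the code runs. The sketch below swaps in a fixed reference timestamp instead; the `reference` value and the sample rows are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical sample data.
data = pd.DataFrame({"date": ["2015-12-03", "2015-12-04"]})

# A fixed reference time (an assumption for this sketch) keeps the
# feature reproducible between training and prediction runs.
reference = pd.Timestamp("2016-01-01")

# Elapsed time between the reference and each observation, in seconds.
data["date"] = (reference - pd.to_datetime(data["date"])).dt.total_seconds()

print(data)
```

Whether a fixed or a moving reference is appropriate depends on how the model will be used, so treat this as one option rather than a recommendation.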

Or you can convert the datetimes into integers straight up.

data['date'] = pd.to_datetime(data['date']).astype('int64')

N.B. When converting strings to datetime, passing format= makes the conversion run much faster (about 25 times faster, according to the benchmark linked in the original answer). The original answer also links a post with ideas for passing format= when the datetime column doesn't have a uniform format.
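For the date layout shown in the question (`2015-12-03`), passing `format=` could look like the sketch below. The sample values are hypothetical, and the format string assumes every entry follows the year-month-day layout.

```python
import pandas as pd

# Hypothetical date strings in the same layout as the question's data.
raw = pd.Series(["2015-12-03", "2015-12-04", "2016-01-15"])

# An explicit format lets pandas skip per-value format inference.
parsed = pd.to_datetime(raw, format="%Y-%m-%d")

print(parsed.dtype)
```

With an explicit format, a value that does not match raises an error by default, which can also help surface malformed rows early.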

jezrael On 21 December 2016 at 07:06 BEST ANSWER

IIUC, you also need to exclude the date column:

columns = [c for c in columns if c not in ["test", 'date']]

The error is raised because the date column contains Timestamp values, which cannot be converted to float:

TypeError: float() argument must be a string or a number, not 'Timestamp'
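To check that the remaining columns convert cleanly, the sketch below applies the same kind of float conversion that `check_array` performs in the traceback (`np.array(array, dtype=dtype)`). The dataframe is a hypothetical, numeric-only stand-in for the asker's data, and scikit-learn itself is deliberately left out so the snippet depends only on pandas and NumPy.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's dataframe (numeric columns plus 'date').
data = pd.DataFrame({
    "user_id": [1, 2],
    "date": pd.to_datetime(["2015-12-03", "2015-12-04"]),
    "conversion": [1, 0],
    "test": [0, 1],
    "age": [32.0, 41.0],
})

# Drop the target ('test') and the Timestamp column ('date') from the features.
columns = [c for c in data.columns if c not in ["test", "date"]]

# Mimic the float32 conversion from the traceback; this succeeds once
# the Timestamp column is gone.
X = np.asarray(data[columns], dtype=np.float32)

print(columns, X.shape)
```

Note that the question's dataframe also has string columns (browser, sex, country); those would likely need to be encoded or excluded as well before fitting.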
