TypeError in cudf.pandas


I am using cudf.pandas, the RAPIDS accelerator for pandas. At the top of my code I have:

import cudf.pandas

cudf.pandas.install()
import pandas as pd

I'm using clickhouse-driver (https://clickhouse-driver.readthedocs.io/en/latest/) and its client.insert_dataframe method. When I try to insert data, I get this error:

TypeError: Unsupported column type: <class 'cudf.pandas._wrappers.numpy.ndarray'>. list or tuple is expected.

I don't know why this is happening. I checked the data types of the columns: they are floats and datetimes.

Please tell me how to convert the DataFrame into a form that can be inserted with clickhouse-driver.


There are 2 answers

TaureanDyerNV On

According to the docs, the proper way to enable cudf.pandas for your code is:

Run %load_ext cudf.pandas in Jupyter, or pass -m cudf.pandas on the command line (for example, python -m cudf.pandas script.py).

Please see https://docs.rapids.ai/api/cudf/stable/cudf_pandas/usage for more detailed usage examples.

Ashwin Srinath On

(I'm a maintainer for cuDF)

The issue is in cudf.pandas: when it is enabled, Series.values returns a proxy type rather than a true numpy ndarray:

import cudf.pandas
cudf.pandas.install()

import pandas as pd

print(type(pd.Series([1, 2, 3]).values))
# prints: <class 'cudf.pandas._wrappers.numpy.ndarray'>

My guess is that somewhere in the clickhouse-driver codebase there is an isinstance check against np.ndarray, which the proxy type fails.

We're thinking about solutions. Ideally, though, more projects would avoid a hard isinstance() check for np.ndarray and instead accept array-like objects, using something like:

arr = np.asarray(obj)
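
To make the difference concrete, here is a minimal sketch. ProxyArray, strict_sum, and lenient_sum are hypothetical names made up for illustration; ProxyArray only imitates the idea of an array proxy through the __array__ protocol and is not how cudf.pandas or clickhouse-driver are actually implemented.

```python
import numpy as np


class ProxyArray:
    """Hypothetical array-like wrapper; deliberately NOT a np.ndarray subclass."""

    def __init__(self, data):
        self._data = np.array(data)

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this hook when converting the object via np.asarray().
        return self._data if dtype is None else self._data.astype(dtype)


def strict_sum(obj):
    # A hard type check rejects anything that is not literally an ndarray.
    if not isinstance(obj, np.ndarray):
        raise TypeError(f"Unsupported column type: {type(obj)}")
    return obj.sum()


def lenient_sum(obj):
    # Accept any array-like object by converting it first.
    arr = np.asarray(obj)
    return arr.sum()


proxy = ProxyArray([1, 2, 3])
print(lenient_sum(proxy))  # works: the proxy is converted to an ndarray

try:
    strict_sum(proxy)
except TypeError as exc:
    print("strict check failed:", exc)
```

The strict version fails for the wrapper even though it holds perfectly valid array data, which is the same shape of failure as the TypeError in the question.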

I've opened an issue against cudf where you can track progress: https://github.com/rapidsai/cudf/issues/14537
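
In the meantime, one possible workaround is to bypass insert_dataframe and hand clickhouse-driver plain Python lists, which is what the error message says it expects. The sketch below is an assumption, not a verified fix: dataframe_to_columns is a hypothetical helper, my_table is a placeholder table name, and I have not confirmed that Series.tolist() returns ordinary lists under every cudf.pandas version. The commented-out call uses Client.execute with columnar=True, which takes the data as a list of columns.

```python
import pandas as pd


def dataframe_to_columns(df):
    """Hypothetical helper: return column names and each column as a plain list."""
    names = list(df.columns)
    # Series.tolist() yields Python scalars (datetimes become pandas Timestamps,
    # which are datetime subclasses).
    columns = [df[name].tolist() for name in names]
    return names, columns


df = pd.DataFrame(
    {
        "value": [1.5, 2.5],
        "ts": pd.to_datetime(["2023-01-01", "2023-01-02"]),
    }
)
names, columns = dataframe_to_columns(df)

# Assumed usage with an existing clickhouse_driver.Client named `client`
# and a placeholder table `my_table`:
# client.execute(
#     f"INSERT INTO my_table ({', '.join(names)}) VALUES",
#     columns,
#     columnar=True,
# )
```

Converting to lists copies the data to host memory, so this trades some speed for compatibility.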