data_2['col1'] = np.where((df1.year.astype(int) == 2021) & (df1.col1_y.notna()), df1.col1_y, data_2.col1)

This is my original code, which works in Gen1, but it gives the following error in Gen2.

PandasNotImplementedError: The method `pd.Series.__iter__()` is not implemented. If you want to collect your data as an NumPy array, use 'to_numpy()' instead.

I tried adding `.to_numpy()`, but then I got another error, shown below.

data_2['col1'] = np.where((df1.year.to_numpy().astype(int) == 2021) & (df1.col1_y.notna()), df1.col1_y, data_2.col1)

AttributeError: 'numpy.ndarray' object has no attribute '_internal'

I could not understand why it is looking for `_internal`. Could someone help me resolve this error?

I ran the first snippet, then the `.to_numpy()` variant; both fail with the errors shown above. The expected result is that `col1` is filled according to the condition in the code, but instead I get an error.
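For reference, this is what the line is meant to do, shown on toy stand-in data with plain pandas (column names match the question; the values are made up, and both frames are assumed to be row-aligned):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for df1 and data_2, assumed to have the same row order
df1 = pd.DataFrame({"year": ["2020", "2021", "2021"],
                    "col1_y": [10.0, np.nan, 30.0]})
data_2 = pd.DataFrame({"col1": [1.0, 2.0, 3.0]})

# Take col1_y where year == 2021 AND col1_y is not null; otherwise keep col1
cond = (df1.year.astype(int) == 2021) & df1.col1_y.notna()
data_2["col1"] = np.where(cond, df1.col1_y, data_2.col1)
print(data_2["col1"].tolist())  # [1.0, 2.0, 30.0]
```

With pandas-on-Spark objects this fails because `np.where` needs to iterate over the Series, and `pd.Series.__iter__()` is deliberately not implemented there.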

1 answer

JayashankarGS (accepted answer):

Convert the pandas-on-Spark (pyspark.pandas) DataFrames to plain pandas DataFrames first, then run your code; it will work.

df1 = df1.to_pandas()        # collect the pandas-on-Spark DataFrame to the driver as pandas
data_2 = data_2.to_pandas()
data_2['col1'] = np.where((df1.year.astype(int) == 2021) & (df1.col1_y.notna()), df1.col1_y, data_2.col1)
data_2
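Note that `to_pandas()` collects all the data onto the driver, which only works for data that fits in memory there. If the frames are large, one alternative is to stay in the pandas-on-Spark API and use `Series.mask` instead of `np.where` (pandas-on-Spark implements `mask`/`where`; combining Series from different frames there also needs `ps.set_option("compute.ops_on_diff_frames", True)`). A sketch of the same logic, demonstrated with plain pandas since the API is shared:

```python
import pandas as pd
import numpy as np

# Toy stand-ins for df1 and data_2 (hypothetical values, row-aligned frames)
df1 = pd.DataFrame({"year": ["2020", "2021"], "col1_y": [np.nan, 99.0]})
data_2 = pd.DataFrame({"col1": [5.0, 6.0]})

cond = (df1.year.astype(int) == 2021) & df1.col1_y.notna()
# mask(cond, other): replace values where cond is True, keep them otherwise
data_2["col1"] = data_2["col1"].mask(cond, df1["col1_y"])
print(data_2["col1"].tolist())  # [5.0, 99.0]
```

This keeps the computation expressed in DataFrame operations rather than materializing NumPy arrays, which is the step that triggers the `_internal` error on pandas-on-Spark objects.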
