Pyspark.pandas PandasNotImplementedError: The method `pd.Series.__iter__()` is not implemented. If you want to collect your data as an NumPy array

65 views. Asked by mitchorek on 24 October 2023 at 00:09

I am new to PySpark. I am using pyspark.pandas and want to test how it works with the SciPy library. I have some very basic code that uses scipy and pandas:

import pandas as pd
from scipy.stats import pearsonr

# Example data
data = {
    'Age': [23, 45, 34, 65, 34, 29, 40],
    'Day_of_Week': ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
}

df = pd.DataFrame(data)

# Assigning numerical values for the days of the week
days_of_week = {'Monday': 1, 'Tuesday': 2, 'Wednesday': 3, 'Thursday': 4, 'Friday': 5, 'Saturday': 6, 'Sunday': 7}
df['Day_of_Week_num'] = df['Day_of_Week'].map(days_of_week)

# Calculating Pearson correlation
corr, _ = pearsonr(df['Age'], df['Day_of_Week_num'])
print('Pearson correlation coefficient:', corr)

When I changed "pandas" to "pyspark.pandas" in the import, I received this error:

PandasNotImplementedError: The method pd.Series.__iter__() is not implemented. If you want to collect your data as an NumPy array.

I came across a similar problem in this question: PandasNotImplementedError: The method `pd.Series.__iter__()` is not implemented. If you want to collect your data as an NumPy array. However, that person applied the function over the entire DataFrame, not to a pyspark.pandas.Series. As far as I know, pd.Series.__iter__() is not implemented because iterating over a Series requires collecting the data onto a single node. Therefore, I'm not sure whether it is possible to use scipy.stats.pearsonr on a pyspark.pandas.Series.
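For illustration, here is a minimal sketch of what explicitly collecting the data before calling scipy might look like. It assumes the columns are small enough to fit in driver memory, and it relies on `to_numpy()`, which collects a pyspark.pandas Series into a NumPy array. The variable names (`psdf`, `ages`, `days`, `corr_spark`) and the fallback to plain pandas when PySpark is not installed are my own additions, so treat this as one possible approach rather than a confirmed fix.

```python
import numpy as np
from scipy.stats import pearsonr

try:
    import pyspark.pandas as ps  # pandas API on Spark
except ImportError:
    # Assumption: fall back to plain pandas so the sketch still runs without Spark
    import pandas as ps

# Same example data as in the question
data = {
    'Age': [23, 45, 34, 65, 34, 29, 40],
    'Day_of_Week': ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
}
psdf = ps.DataFrame(data)

# Assigning numerical values for the days of the week
days_of_week = {'Monday': 1, 'Tuesday': 2, 'Wednesday': 3, 'Thursday': 4, 'Friday': 5, 'Saturday': 6, 'Sunday': 7}
psdf['Day_of_Week_num'] = psdf['Day_of_Week'].map(days_of_week)

# Collect each column to the driver as a NumPy array.
# scipy iterates over its inputs, which a distributed Series does not allow.
ages = psdf['Age'].to_numpy()
days = psdf['Day_of_Week_num'].to_numpy()

corr, _ = pearsonr(ages, days)
print('Pearson correlation coefficient (scipy on collected arrays):', corr)

# Alternative that avoids scipy: Series.corr defaults to Pearson
# and does not require iterating over the Series.
corr_spark = psdf['Age'].corr(psdf['Day_of_Week_num'])
print('Pearson correlation coefficient (Series.corr):', corr_spark)
```

The first variant pulls all values onto a single node, so it only makes sense when the data is small. For large columns, the `Series.corr` variant is probably the safer choice, although it returns only the coefficient and not the p-value that `pearsonr` provides.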

There are 0 answers