I am trying to use the pandas API on Spark over Spark Connect, but I am getting an assertion error:
assert isinstance(spark_frame, SparkDataFrame)
AssertionError
I don't get any error if I use the Spark DataFrame API. Is the pandas API on Spark supported on Spark Connect?
Below is the code I am running.
import pyspark.pandas as pd
from pyspark.sql import Row
# Stop the regular Spark session before trying the Spark Connect functionality
from pyspark.sql import SparkSession
SparkSession.builder.master("local[*]").getOrCreate().stop()
# Start the Spark Connect server with the command below
#./start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.4.0
# Start a Spark session by specifying the Spark Connect server address (localhost)
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(d)
print(df.head())
# Spark DataFrame API version (commented out)
'''
df = spark.createDataFrame([
    Row(a=1, b=2., c='string1'),
    Row(a=2, b=3., c='string2'),
    Row(a=4, b=5., c='string3')
])
df.show()
'''