Pandas API support on Spark Connect


I am trying to use the pandas API on Spark (pyspark.pandas) over Spark Connect, but I am getting an assertion error:

assert isinstance(spark_frame, SparkDataFrame)
AssertionError

I don't get any error if I use the Spark DataFrame API. Is the pandas API on Spark supported on Spark Connect?

Below is the code I am running.

import pyspark.pandas as pd
from pyspark.sql import Row, SparkSession

# Stop any regular Spark session before trying Spark Connect
SparkSession.builder.master("local[*]").getOrCreate().stop()

# The Spark Connect server was started beforehand with:
# ./start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.4.0

# Create a Spark Connect session against the local server
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(d)
print(df.head())

'''
df = spark.createDataFrame([
    Row(a=1, b=2., c='string1'),
    Row(a=2, b=3., c='string2'),
    Row(a=4, b=5., c='string3')
])

df.show()

'''
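
One thing worth checking here is the Spark version. As far as I can tell from the Spark release notes, the pandas API on Spark only gained Spark Connect support in 3.5.0, and the server above is started with the 3.4.0 package. The sketch below is a small guard built on that assumption. The names `MIN_CONNECT_PANDAS_VERSION`, `parse_major_minor` and `pandas_api_supported_on_connect` are made up for illustration and are not part of PySpark.

```python
import re
from typing import Tuple

# Assumption (not confirmed in the question): pandas API on Spark over
# Spark Connect first shipped in Spark 3.5.0.
MIN_CONNECT_PANDAS_VERSION = (3, 5)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '3.4.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def pandas_api_supported_on_connect(version: str) -> bool:
    """Return True if the given Spark version meets the assumed minimum."""
    return parse_major_minor(version) >= MIN_CONNECT_PANDAS_VERSION
```

In practice this could be called with `pyspark.__version__` before importing `pyspark.pandas`. If it returns False, the plain `spark.createDataFrame` path from the commented-out block above is the one that works over Spark Connect.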