I am using Spark 1.3.1. In PySpark, I have created a DataFrame from an RDD and registered it as a table, something like this:
dataLen=sqlCtx.createDataFrame(myrdd, ["id", "size"])
dataLen.registerTempTable("tbl")
At this point everything is fine: I can run a "select" query against "tbl", for example "select size from tbl where id='abc'".
Then in a Python function I define something like:
def getsize(id):
    total = sqlCtx.sql("select size from tbl where id='" + id + "'")
    return total.take(1)[0].size
Still no problem at this point: I can call getsize("ab") and it returns a value.
The problem occurs when I invoke getsize inside an RDD transformation. Say I have an RDD named data consisting of (key, value) pairs; when I do
data.map(lambda x: (x[0], getsize("ab")))
it raises this error:
py4j.protocol.Py4JError: Trying to call a package
Any idea?
Spark doesn't support nested actions or transformations, and SQLContext is not accessible outside the driver, so what you're doing here simply cannot work. It is not exactly clear what you want here, but most likely a simple join, either on RDDs or DataFrames, should do the trick.
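As a sketch of that idea: instead of issuing a SQL lookup per record, join data against the (id, size) pairs on the key. The PySpark calls are shown in comments (names like dataDF are assumptions, not from the original); the function below mimics in plain Python what an RDD (key, value) inner join computes, so the semantics can be checked without a SparkContext.

```python
# Driver-side Spark equivalent (roughly):
#
#     result = data.join(myrdd)                      # RDD join on the key
#     # or, with DataFrames (dataDF is a hypothetical DataFrame for `data`):
#     # result = dataDF.join(dataLen, dataDF.id == dataLen.id)
#
# Plain-Python illustration of what rdd.join does on (key, value) pairs:
from collections import defaultdict

def pairwise_join(left, right):
    """Inner join of two (key, value) pair lists, like rdd.join."""
    right_by_key = defaultdict(list)
    for k, v in right:
        right_by_key[k].append(v)
    # For each left pair, emit (key, (left_value, right_value))
    # once per matching right value; keys missing on either side drop out.
    return [(k, (lv, rv)) for k, lv in left for rv in right_by_key[k]]

data = [("abc", "payload1"), ("xyz", "payload2")]   # sample (key, value) RDD contents
sizes = [("abc", 10), ("xyz", 20)]                  # sample (id, size) pairs
print(pairwise_join(data, sizes))
# [('abc', ('payload1', 10)), ('xyz', ('payload2', 20))]
```

The join runs entirely as a Spark transformation, so nothing in the worker closure needs SQLContext, which is what triggered the Py4JError.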