I am using a Zeppelin notebook with Spark 1.3.1 and Hadoop 2.6. I am able to run the entire tutorial shipped with it without any issues.
I then created a new notebook to run simple code that fetches data from a Parquet file stored in HDFS on my local machine. Here is the code:
// sqlc is the SQLContext instance that Zeppelin exposes to the notebook
val param_alarms = sqlc.parquetFile("hdfs://localhost/ge_alarm/alarm_param")
// Register the DataFrame so it can be queried from %sql paragraphs
param_alarms.registerTempTable("Alarm")
// First action on the DataFrame; this is the call that fails
param_alarms.count()
This fails with the following error message:
param_alarms: org.apache.spark.sql.DataFrame = [PARAMETER_ID: int, PARAMETER_NAME: string, OSM_NAME: string, DESCRIPTION: string, TIMESTAMP: bigint, Value: int]
java.lang.NoSuchMethodError: org.json4s.JsonDSL$.string2jvalue(Ljava/lang/String;)Lorg/json4s/JsonAST$JValue;
at org.apache.spark.sql.types.StructType$$anonfun$jsonValue$5.apply(dataTypes.scala:1065)
at org.apache.spark.sql.types.StructType$$anonfun$jsonValue$5.apply(dataTypes.scala:1065)
at org.json4s.JsonDSL$JsonAssoc.$tilde(JsonDSL.scala:86)
at org.apache.spark.sql.types.StructType.jsonValue(dataTypes.scala:1065)
at org.apache.spark.sql.types.StructType.jsonValue(dataTypes.scala:1015)
at org.apache.spark.sql.types.DataType.json(dataTypes.scala:265)
at org.apache.spark.sql.parquet.ParquetTypesConverter$.convertToString(ParquetTypes.scala:404)
at org.apache.spark.sql.parquet.ParquetRelation2.buildScan(newParquet.scala:437)
at org.apache.spark.sql.sources.DataSourceStrategy$$anonfun$1.apply(DataSourceStrategy.scala:38)
at org.apache.spark.sql.sources.DataSourceStrategy$$anonfun$1.apply(DataSourceStrategy.scala:38)
at org.apache.spark.sql.sources.DataSourceStrategy$.pruneFilterProjectRaw(DataSourceStrategy.scala:107)
at org.apache.spark.sql.sources.DataSourceStrategy$.apply(DataSourceStrategy.scala:34)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
at org.apache.spark.sql.execution.SparkStrategies$HashAggregation$.apply(SparkStrategies.scala:152)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:1081)
at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:1079)
at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:1085)
at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:1085)
at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:815)
at org.apache.spark.sql.DataFrame.count(DataFrame.scala:827)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:39)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:44)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:46)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:48)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:50)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:52)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:54)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:56)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:58)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:60)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:62)
at $iwC$$iwC$$iwC.<init>(<console>:64)
at $iwC$$iwC.<init>(<console>:66)
at $iwC.<init>(<console>:68)
at <init>(<console>:70)
at .<init>(<console>:74)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.zeppelin.spark.SparkInterpreter.interpretInput(SparkInterpreter.java:582)
at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:558)
at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:551)
at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:277)
at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:118)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
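From what I understand, a NoSuchMethodError on org.json4s.JsonDSL$.string2jvalue usually means two incompatible json4s versions are colliding on the classpath rather than json4s being missing entirely. A minimal way to check which jar the interpreter actually loads JsonDSL from (a diagnostic sketch; the exact path will differ per setup):

// Prints the location of the jar that provides org.json4s.JsonDSL; if it is
// not the 3.2.10 build that Spark 1.3 expects, the versions are in conflict.
println(org.json4s.JsonDSL.getClass.getProtectionDomain.getCodeSource.getLocation)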
I also tried adding the following dependency:
%dep
z.reset()
z.load("org.json4s:json4s-native_2.11:3.2.10")
By the way, when I try running %sql select * from Alarm I get the following error:
java.lang.reflect.InvocationTargetException
No luck with this either. Can someone help?
One of my colleagues finally found the solution: the root cause was an incompatible json4s version. In case others are facing the same problem, here is what I had to change to fix it:
Modify zeppelin-server/pom.xml to use swagger-jersey-jaxrs_2.10 version 1.3.11
Modify zeppelin-engine/pom.xml to use org.reflections version 0.9.9-RC1
export MAVEN_OPTS='-Xmx2048m -XX:MaxPermSize=2048m'
Compile for Hadoop 2.6 and Spark 1.3: mvn clean package -Pspark-1.3 -Dhadoop.version=2.6.0 -Phadoop-2.6 -DskipTests
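After the rebuild, a quick way to confirm the conflict is gone is to exercise the exact code path that failed (StructType.jsonValue via DataType.json in the stack trace above). This is a hypothetical smoke test, not part of the fix itself:

// Serializes a schema through json4s, the same path that threw
// NoSuchMethodError above; after the fix it should print a JSON string.
import org.apache.spark.sql.types._
val schema = StructType(Seq(StructField("PARAMETER_ID", IntegerType, true)))
println(schema.json)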