How to resolve the 'Unsupported type NULL' error when running a Data Fusion pipeline


I am working on extracting the schema from .txt data and loading it into BigQuery using Cloud Data Fusion on Google Cloud Platform.

First, I created the Data Fusion instance in Developer mode.

Second, I pointed the source at Google Cloud Storage, where the data is stored, converted it to JSON format, and reached 100% completeness (meaning there are no empty values in the columns).

Then I connected BigQuery as the sink in the Data Fusion UI.

When I deploy and run the data pipeline, it runs for about five minutes and then the following error occurs:

2020-10-08 01:19:54,947 - ERROR [SparkRunnerphase-1:i.c.c.i.a.r.ProgramControllerServiceAdapter@93] - Spark program 'phase-1' failed with error: Unsupported type NULL. Please check the system logs for more details.
java.lang.IllegalStateException: Unsupported type NULL
at io.cdap.plugin.gcp.bigquery.sink.AbstractBigQuerySink.getTableDataType(AbstractBigQuerySink.java:488) ~[na:na]
at io.cdap.plugin.gcp.bigquery.sink.AbstractBigQuerySink.getTableDataType(AbstractBigQuerySink.java:484) ~[na:na]
at io.cdap.plugin.gcp.bigquery.sink.AbstractBigQuerySink.generateTableFieldSchema(AbstractBigQuerySink.java:379) ~[na:na]
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[na:1.8.0_265]
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384) ~[na:1.8.0_265]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) ~[na:1.8.0_265]
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[na:1.8.0_265]
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[na:1.8.0_265]
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[na:1.8.0_265]
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566) ~[na:1.8.0_265]
at io.cdap.plugin.gcp.bigquery.sink.AbstractBigQuerySink.getBigQueryTableFields(AbstractBigQuerySink.java:372) ~[na:na]
at io.cdap.plugin.gcp.bigquery.sink.AbstractBigQuerySink.initOutput(AbstractBigQuerySink.java:156) ~[na:na]
at io.cdap.plugin.gcp.bigquery.sink.BigQuerySink.prepareRunInternal(BigQuerySink.java:104) ~[na:na]
at io.cdap.plugin.gcp.bigquery.sink.AbstractBigQuerySink.prepareRun(AbstractBigQuerySink.java:110) ~[na:na]
at io.cdap.plugin.gcp.bigquery.sink.AbstractBigQuerySink.prepareRun(AbstractBigQuerySink.java:72) ~[na:na]
at io.cdap.cdap.etl.common.plugin.WrappedBatchSink.lambda$prepareRun$0(WrappedBatchSink.java:52) ~[na:na]
at io.cdap.cdap.etl.common.plugin.Caller$1.call(Caller.java:30) ~[na:na]
at io.cdap.cdap.etl.common.plugin.StageLoggingCaller.call(StageLoggingCaller.java:40) ~[na:na]
at io.cdap.cdap.etl.common.plugin.WrappedBatchSink.prepareRun(WrappedBatchSink.java:51) ~[na:na]
at io.cdap.cdap.etl.common.plugin.WrappedBatchSink.prepareRun(WrappedBatchSink.java:37) ~[na:na]
at io.cdap.cdap.etl.common.submit.SubmitterPlugin.lambda$prepareRun$2(SubmitterPlugin.java:71) ~[na:na]
at io.cdap.cdap.internal.app.runtime.AbstractContext$2.run(AbstractContext.java:555) ~[na:na]
at io.cdap.cdap.data2.transaction.Transactions$CacheBasedTransactional.finishExecute(Transactions.java:224) ~[na:na]
at io.cdap.cdap.data2.transaction.Transactions$CacheBasedTransactional.execute(Transactions.java:211) ~[na:na]
at io.cdap.cdap.internal.app.runtime.AbstractContext.execute(AbstractContext.java:550) ~[na:na]
at io.cdap.cdap.internal.app.runtime.AbstractContext.execute(AbstractContext.java:538) ~[na:na]
at io.cdap.cdap.app.runtime.spark.BasicSparkClientContext.execute(BasicSparkClientContext.java:333) ~[io.cdap.cdap.cdap-spark-core2_2.11-6.2.0.jar:na]
at io.cdap.cdap.etl.common.submit.SubmitterPlugin.prepareRun(SubmitterPlugin.java:69) ~[na:na]
at io.cdap.cdap.etl.common.submit.PipelinePhasePreparer.prepare(PipelinePhasePreparer.java:118) ~[na:na]
at io.cdap.cdap.etl.spark.AbstractSparkPreparer.prepare(AbstractSparkPreparer.java:85) ~[na:na]
at io.cdap.cdap.etl.spark.batch.SparkPreparer.prepare(SparkPreparer.java:89) ~[na:na]
at io.cdap.cdap.etl.spark.batch.ETLSpark.initialize(ETLSpark.java:112) ~[na:na]
at io.cdap.cdap.api.spark.AbstractSpark.initialize(AbstractSpark.java:131) ~[na:na]
at io.cdap.cdap.api.spark.AbstractSpark.initialize(AbstractSpark.java:33) ~[na:na]
at io.cdap.cdap.app.runtime.spark.SparkRuntimeService$2.initialize(SparkRuntimeService.java:167) ~[io.cdap.cdap.cdap-spark-core2_2.11-6.2.0.jar:na]
at io.cdap.cdap.app.runtime.spark.SparkRuntimeService$2.initialize(SparkRuntimeService.java:162) ~[io.cdap.cdap.cdap-spark-core2_2.11-6.2.0.jar:na]
at io.cdap.cdap.internal.app.runtime.AbstractContext.lambda$initializeProgram$1(AbstractContext.java:644) ~[na:na]
at io.cdap.cdap.internal.app.runtime.AbstractContext.execute(AbstractContext.java:604) ~[na:na]
at io.cdap.cdap.internal.app.runtime.AbstractContext.initializeProgram(AbstractContext.java:641) ~[na:na]
at io.cdap.cdap.app.runtime.spark.SparkRuntimeService.initialize(SparkRuntimeService.java:433) ~[io.cdap.cdap.cdap-spark-core2_2.11-6.2.0.jar:na]
at io.cdap.cdap.app.runtime.spark.SparkRuntimeService.startUp(SparkRuntimeService.java:208) ~[io.cdap.cdap.cdap-spark-core2_2.11-6.2.0.jar:na]
at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:47) ~

Thank you for reading. :)


There is 1 answer

Answered by oyugi.collins

I faced this issue and resolved it by populating the relevant schema fields in the Output Schema section, so that every field has an explicit, non-NULL type. This answer -> Google Cloud Data Fusion -- building pipeline from REST API endpoint source also assisted me in the process.
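For anyone hitting the same error: the stack trace shows the BigQuery sink failing in getTableDataType when a field's type is plain NULL, which is what it sees when a field in the Output Schema was left without a type. As a rough illustration, here is a minimal sketch of what an Output Schema can look like in the Avro-style JSON that Data Fusion stores in the exported pipeline config (the field names id, name, and created_at are hypothetical, and "etlSchemaBody" is the record name the Studio UI typically generates). The point is that every field carries an explicit type, and optional columns are declared as a union with "null" instead of a bare NULL type:

{
  "type": "record",
  "name": "etlSchemaBody",
  "fields": [
    { "name": "id", "type": "long" },
    { "name": "name", "type": "string" },
    { "name": "created_at", "type": [ "string", "null" ] }
  ]
}

In the Studio UI this corresponds to listing each field with its type in the Output Schema panel of the stage feeding the BigQuery sink, and ticking Nullable for optional columns rather than leaving the type blank.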