How to fix "Null Pointer Exception" in Spark ML Pipeline?

I am performing image classification with RasterFrames on Spark DataFrames. My sample image is a TIFF file, and the training dataset is a GeoJSON file with 4 classes created from the same image. I am using DecisionTreeClassifier for the classification, but when I try to build the model from this data it throws the error Caused by: java.lang.NullPointerException: Value at index 0 is null. The exception is triggered from this line. Here is the complete stack trace for the error.

Here is how I am loading the data:

object rasterframeclassification extends App {
  object Flattener extends TileReducer(
    (l: Int, r: Int) ⇒ if (isNoData(r)) l else r
  )(
    (l: Double, r: Double) ⇒ if (isNoData(r)) l else r
  )
  implicit val spark = SparkSession.builder().
    master("local").appName("rasterframeclassification").
    config("spark.ui.enabled", "false").
    getOrCreate().
    withRasterFrames
  import spark.implicits._
  implicit val bandCount = PairRDDConverter.forSpatialMultiband(2)
  val tiff = MultibandGeoTiff(getClass.getResource("/raster.tif").getPath)
  val filename = "../biggis-landuse/radar_data/raster.tif"
  val json = Filesystem.readText(getClass.getResource("/training_data.geojson").getPath)
  //println(json)
  val wgs84 = CRS.fromEpsgCode(4326)
  val features = json.extractFeatures[Feature[MultiPolygon, Map[String, String]]]()
  val featuresInt: Seq[Feature[MultiPolygon, Map[String, Int]]] =
    features.map(_.mapData(_.map { case (k, v) => k -> v.toInt }))
  val layers = for {
    f ← featuresInt
    pf = f.reproject(wgs84, tiff.crs)
    raster = pf.geom.rasterizeWithValue(tiff.rasterExtent, f.data("classes"), UByteUserDefinedNoDataCellType(255.toByte))
  } yield raster

  val result = Flattener(layers.map(_.tile))
  val targetCol = "finalraster"
  val training = SinglebandGeoTiff(result, tiff.extent, tiff.crs).
    mapTile(_.convert(DoubleConstantNoDataCellType)).
    projectedRaster.toRF(908, 597, targetCol)
  def rf = MultibandGeoTiff(filename).projectedRaster.toRF
  val bandColNames = Array("tile_1", "tile_2")

Here is how I am building the model:

  val abt = rf.spatialJoin(training)
  val exploder = new TileExploder()

  val noDataFilter = new NoDataFilter().
    setInputCols(bandColNames :+ targetCol)
  val assembler = new VectorAssembler().
    setInputCols(bandColNames).
    setOutputCol("features")
  val classifier = new DecisionTreeClassifier().
    setLabelCol(targetCol).
    setFeaturesCol(assembler.getOutputCol)
  val pipeline = new Pipeline().
    setStages(Array(exploder, noDataFilter, assembler, classifier))
  val evaluator = new MulticlassClassificationEvaluator().
    setLabelCol(targetCol).
    setPredictionCol("prediction").
    setMetricName("accuracy")
  val paramGrid = new ParamGridBuilder().
    addGrid(classifier.maxDepth, Array(1,2,3,4)).
    build()
  val trainer = new CrossValidator().
    setEstimator(pipeline).
    setEvaluator(evaluator).
    setEstimatorParamMaps(paramGrid).
    setNumFolds(4)
  val model = trainer.fit(abt)
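
The message "Value at index 0 is null" suggests that a null cell is being read as a primitive value somewhere in the pipeline. One way to narrow this down is to count null cells per column on a flat DataFrame before fitting. The following is a minimal sketch using only the plain Spark SQL API. The object name NullCheckSketch, the helper nullCounts, and the toy data are hypothetical and only stand in for the exploded frame; the column names are reused from the code above. Whether nulls are the actual cause here is an assumption, not something the stack trace confirms.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, count, when}

// Hypothetical helper object, not part of RasterFrames or Spark.
object NullCheckSketch {

  // Count null cells per column in a single aggregation.
  // Assumes the frame has at least one column and that column
  // names contain no dots or backticks.
  def nullCounts(df: DataFrame): Map[String, Long] = {
    // when(...) yields null for non-matching rows, and count skips nulls,
    // so each aggregate counts only the rows where the column is null.
    val aggs = df.columns.map(c => count(when(col(c).isNull, 1)).as(c))
    val row = df.agg(aggs.head, aggs.tail: _*).head()
    df.columns.zipWithIndex.map { case (c, i) => c -> row.getLong(i) }.toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().
      master("local[1]").appName("null-check-sketch").
      config("spark.ui.enabled", "false").
      getOrCreate()
    import spark.implicits._

    // Toy stand-in for the exploded frame: one cell value per row.
    // None becomes a null cell in the resulting DataFrame.
    val exploded = Seq[(Option[Double], Option[Double], Option[Double])](
      (Some(0.1), Some(0.2), Some(1.0)),
      (None, Some(0.4), Some(2.0)),
      (Some(0.5), Some(0.6), None)
    ).toDF("tile_1", "tile_2", "finalraster")

    nullCounts(exploded).foreach { case (c, n) => println(s"$c: $n null(s)") }

    // Drop every row that has a null in a feature or label column.
    val clean = exploded.na.drop(Seq("tile_1", "tile_2", "finalraster"))
    println(s"rows before: ${exploded.count()}, rows after: ${clean.count()}")

    spark.stop()
  }
}
```

If the same check were run on the output of the explode step in the pipeline above, a non-zero count in the label column or in a band column would point to cells that reach the classifier as nulls rather than being filtered out.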
