Requirement failed: Nothing has been added to this summarizer

Question

4.5k views · Asked by wookieluvr13 on 12 November 2019 at 22:43

I am trying to verify that PySpark runs properly on my system, but when I call fit on my data I get an error: "Requirement failed: Nothing has been added to this summarizer"

import findspark
import os
spark_location='/usr/local/spark/'
java8_location= '/usr/lib/jvm/java-8-openjdk-amd64'
os.environ['JAVA_HOME'] = java8_location
findspark.init(spark_home=spark_location)
import pyspark, itertools, string, datetime, math
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from pyspark.sql import SparkSession
from pyspark.mllib.evaluation import RegressionMetrics
from pyspark.sql.functions import isnan, isnull, when, count, col

def main():
    spark = pyspark.sql.SparkSession.builder.appName("test").getOrCreate()
    sc = spark.sparkContext
    #data = spark.read.option("inferSchema", True).option("header", True).csv("ml-20m/ratings.csv").drop("timestamp")
    data = spark.read.option("inferSchema", True).option("header", True).csv("ml-20m/ratings_test.csv").drop("timestamp")
    train,test= data.randomSplit([0.8, 0.2])
    print("before als")
    als = ALS(userCol="userId", itemCol="movieId", ratingCol="rating", coldStartStrategy="drop", nonnegative=True)
    print("before param_grid")
    param_grid = ParamGridBuilder().addGrid(als.rank, [12,13,14]).addGrid(als.maxIter, [18,19,20]).addGrid(als.regParam, [.17,.18,.19]).build()





    #################### RMSE ######################
    print("before evaluator")
    evaluator = RegressionEvaluator(metricName="rmse", labelCol="rating", predictionCol="prediction")
    print("before cv")
    cv = CrossValidator(estimator=als, estimatorParamMaps=param_grid, evaluator=evaluator, numFolds=3)
    print("before fit")
    model = cv.fit(train)

    model = model.bestModel
    print("before transform")
    predictions = model.transform(test)
    print("before rmse")   
    rmse = evaluator.evaluate(predictions)

    print("RMSE", rmse)
    print("rank", model.rank)
    print("MaxIter", model._java_obj.parent().getMaxIter())
    print("RegParam", model._java_obj.parent().getRegParam())

main()

I checked the DataFrame and confirmed that it contains no null or NaN values.

Original Q&A

There are 2 answers

**Joe Ganser** · Answer 1 · 2020-02-14T17:53:18+00:00

Ensure that your train and test sets share at least one user ID. ALS cannot make predictions for user IDs it did not see during training. If your dataset is very sparse, it's possible that the users in the train set have no overlap with the users in the test set. With coldStartStrategy="drop", as in the question, predictions for unseen users are dropped, so the evaluator can end up with no rows to evaluate.

I had the same error, and that was the cause. The solution was to make both data sets large enough to create an overlap of users.
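
A minimal sketch of how one might check for that overlap before fitting, assuming train and test are the PySpark DataFrames from the question with its userId and movieId columns. The helper name cold_start_report is hypothetical and not part of PySpark; it only uses the DataFrame methods select, distinct, join (with how="left_semi") and count.

```python
def cold_start_report(train, test, user_col="userId", item_col="movieId"):
    """Count test rows whose user or item never appears in train.

    Hypothetical helper: `train` and `test` are assumed to be PySpark
    DataFrames; the default column names are taken from the question.
    """
    # Distinct users and items that the model would see during training.
    train_users = train.select(user_col).distinct()
    train_items = train.select(item_col).distinct()

    # A left semi join keeps only the test rows that have a match in train,
    # without adding any columns from the right-hand side.
    scorable = (
        test.join(train_users, on=user_col, how="left_semi")
            .join(train_items, on=item_col, how="left_semi")
    )

    total = test.count()
    kept = scorable.count()
    return {"test_rows": total, "scorable_rows": kept, "dropped_rows": total - kept}
```

Calling cold_start_report(train, test) right after randomSplit gives a rough idea of how many test rows could still be scored. If scorable_rows is 0, an empty prediction set is a likely explanation for the error. The same concern probably applies to the folds that CrossValidator builds from train, since each fold is a smaller sample of the same sparse data.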

**archit jain** · Answer 2 · 2021-06-10T12:01:21+00:00

I had the same error, only to realize that my test set was empty (the split was not right).

Make sure both your train set and your test set actually contain rows.

After you run train, test = data.randomSplit([0.8, 0.2]), call train.show() and test.show() to confirm that neither is empty.
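
As a rough sketch of the same check done programmatically, the helper below counts the rows in each split and fails early if either is empty. The name assert_nonempty_split is hypothetical; it assumes train and test are PySpark DataFrames and relies only on DataFrame.count().

```python
def assert_nonempty_split(train, test):
    """Raise early if either side of a train/test split has no rows.

    Hypothetical helper: `train` and `test` are assumed to be PySpark
    DataFrames, e.g. the two results of data.randomSplit([0.8, 0.2]).
    """
    n_train = train.count()
    n_test = test.count()
    if n_train == 0 or n_test == 0:
        raise ValueError(
            f"Empty split: train has {n_train} rows, test has {n_test} rows"
        )
    return n_train, n_test
```

In the question's main(), this could be called right after the split, for example assert_nonempty_split(train, test) before cv.fit(train). Because randomSplit is random, a very small input file can yield an empty test set on some runs and not others; passing a fixed seed, as in data.randomSplit([0.8, 0.2], seed=42), makes the split repeatable while debugging.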
