When I run the code below, I get the following errors:

java.lang.OutOfMemoryError: Java heap space
org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 4

The code runs fine on small files (a few KB), but fails on "big" files (5 MB). I tried increasing the VM memory and spark.driver.memory, but I still get the same errors.

        SparkConf sparkConf = new SparkConf().setAppName("aName");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);
        JavaRDD<String> lines = sc.textFile(args[0]);
        // SPACE is a java.util.regex.Pattern defined elsewhere in the class (not shown).
        // Parse each "u v" line into one edge keyed by the smaller node id; skip lines containing '#'.
        JavaPairRDD<String, String> edges = lines.flatMapToPair(t -> {
            List<Tuple2<String,String>> result = new ArrayList<>();
            if(!t.contains("#")) {
                String [] nodes = SPACE.split(t);
                if(Long.parseLong(nodes[0])<Long.parseLong(nodes[1])) {
                    result.add(new Tuple2<>(nodes[0], nodes[1]));
                } else {
                    result.add(new Tuple2<>(nodes[1], nodes[0]));
                }
            }
            return result.iterator();
        });
        // Same edges, keyed by the larger node id.
        JavaPairRDD<String, String> edgesReverse = edges.mapToPair(t -> {
            return new Tuple2<>(t._2(), t._1());
        });
        // Pair up the neighbors of every node: each join result is (node, (neighbor1, neighbor2)).
        JavaPairRDD<String, Tuple2<String, String>> rdd1 = edges.join(edgesReverse);
        JavaPairRDD<String, Tuple2<String, String>> rdd2 = edges.join(edges);
        JavaPairRDD<String, Tuple2<String, String>> allRDD = rdd1.union(rdd2).distinct();
        JavaPairRDD<Tuple2<String, String>,Double> commonNeighbors = allRDD.mapToPair(t -> {
            // Drop the shared node and order the neighbor pair by node id.
            if(Long.parseLong(t._2()._1())<Long.parseLong(t._2()._2())) {
                return new Tuple2<>(t._2()._1(),t._2()._2());
            } else {
                return new Tuple2<>(t._2()._2(),t._2()._1());
            }
        }).mapToPair(t -> {
            // Count 1 for every shared neighbor of the pair.
            return new Tuple2<>(t,Double.parseDouble("1"));
        }).reduceByKey((a,b)->a+b).mapToPair(t -> {
            // Swap to (count, pair) so the result can be sorted by count.
            return new Tuple2<>(t._2(),t._1());
        }).sortByKey(false).mapToPair(t -> {
            // Swap back to (pair, count).
            return new Tuple2<>(t._2(),t._1());
        });

There is 1 answer

Amardeep Flora On

I'd suggest repartitioning the RDDs and increasing the number of partitions, so that each task has to hold less data in memory.
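
One way to see why a 5 MB edge list can exhaust the heap is to estimate how many records the two joins in the question emit. A key that occurs d times on one side of a join and e times on the other produces d*e joined records, so a few high-degree nodes can dominate the output. The following is a rough sketch in plain Java (no Spark required). The class name JoinSizeEstimate and the method joinedRecords are hypothetical names chosen for illustration, and the sketch assumes the same whitespace-separated "u v" input with '#' comment lines as the question.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JoinSizeEstimate {

    /**
     * Estimates how many records edges.join(edges) and edges.join(edgesReverse)
     * emit in total, before union() and distinct() are applied.
     */
    public static long joinedRecords(List<String> lines) {
        // How often each node is the smaller endpoint (the key of `edges`).
        Map<String, Long> asKey = new HashMap<>();
        // How often each node is the larger endpoint (the key of `edgesReverse`).
        Map<String, Long> asReverseKey = new HashMap<>();

        for (String line : lines) {
            if (line.contains("#")) {
                continue; // same comment filter as in the question
            }
            String[] nodes = line.trim().split("\\s+");
            if (nodes.length < 2) {
                continue; // ignore blank or malformed lines
            }
            boolean firstIsSmaller = Long.parseLong(nodes[0]) < Long.parseLong(nodes[1]);
            String lo = firstIsSmaller ? nodes[0] : nodes[1];
            String hi = firstIsSmaller ? nodes[1] : nodes[0];
            asKey.merge(lo, 1L, Long::sum);
            asReverseKey.merge(hi, 1L, Long::sum);
        }

        long total = 0;
        for (Map.Entry<String, Long> e : asKey.entrySet()) {
            long d = e.getValue();
            total += d * d;                                         // edges.join(edges)
            total += d * asReverseKey.getOrDefault(e.getKey(), 0L); // edges.join(edgesReverse)
        }
        return total;
    }
}
```

Calling JoinSizeEstimate.joinedRecords on the lines of the input file gives a feel for how much data the shuffle has to move. If the number is large, passing an explicit partition count to the shuffle operations, for example edges.join(edgesReverse, numPartitions) or edges.repartition(numPartitions), spreads those records over more, smaller tasks. The right value for numPartitions depends on the cluster and is not something this sketch can determine.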