I am learning to use Spark on a personal computer with hardware capable of running Hadoop. Here's the config:
Cloudera CDH 5.5.0 w/ Cloudera Quickstart, Spark 2.4.7, JDK1.8.0_181, Hadoop 2.6.0, Python 3.6.9.
When running a Python script (copied from a Udemy video on YouTube), I ran into and fixed several errors, but I could not find any solution for the following one:
java.io.IOException: Incomplete HDFS URI, no host: hdfs:/user/cloudera/Spark/ml-100k/u.data
Traceback (most recent call last):
File "/home/cloudera/Spark/LowestRatedMovieDataFrame.py", line 75, in <module>
movieDataset = spark.createDataFrame(movies)
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 746, in createDataFrame
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 390, in _createFromRDD
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 361, in _inferSchema
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1378, in first
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1327, in take
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2517, in getNumPartitions
File "/usr/local/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/usr/local/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o27.partitions.
: java.io.IOException: Incomplete HDFS URI, no host: hdfs:/user/cloudera/Spark/ml-100k/u.data
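
To illustrate what "no host" seems to refer to, here is a minimal sketch that splits the URI from the error message (written without the stray spaces) using Python's standard urllib.parse. This is not how Hadoop parses URIs internally (it uses Java's URI class), so treat it as an approximation. The host quickstart.cloudera and port 8020 are assumptions based on the usual Quickstart VM defaults; the real value is whatever fs.defaultFS is set to in core-site.xml. The describe helper is a hypothetical name used only for this example.

```python
from urllib.parse import urlparse


def describe(uri):
    """Return (scheme, host, path) for a URI; host is None when absent."""
    parts = urlparse(uri)
    return parts.scheme, parts.hostname, parts.path


# URI as it appears in the error message: scheme and path, but no authority part.
no_host = describe("hdfs:/user/cloudera/Spark/ml-100k/u.data")

# Same path with an explicit NameNode authority.
# ASSUMPTION: quickstart.cloudera:8020 is a guess at the VM default, not taken from my setup.
with_host = describe(
    "hdfs://quickstart.cloudera:8020/user/cloudera/Spark/ml-100k/u.data"
)

print(no_host)
print(with_host)
```

In the first case the host component comes back as None, which matches the wording of the exception; in the second it is filled in. Whether adding the authority to the path in my script is the right fix, or whether the script should rely on the default filesystem instead, is something I have not been able to confirm.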