I have a Spark SQL 2.1.1 job on a YARN cluster in cluster mode, where I want to create an empty external Hive table (partitions with locations will be added in a later step):
```sql
CREATE EXTERNAL TABLE IF NOT EXISTS new_table (id BIGINT, StartTime TIMESTAMP, EndTime TIMESTAMP)
PARTITIONED BY (year INT, month INT, day INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
```
When I run the job I get the error:
```
CREATE EXTERNAL TABLE must be accompanied by LOCATION
```
But when I run the same query in the Hive Editor in Hue, it runs just fine. I tried to find an answer in the Spark SQL 2.1.1 documentation but came up empty.
Does anyone know why Spark SQL is stricter about these queries?
TL;DR: `EXTERNAL` with no `LOCATION` is not allowed.

The definitive answer is in Spark SQL's grammar definition file, SqlBase.g4.
You can find the definition of `CREATE EXTERNAL TABLE` as `createTableHeader`, and that definition is then used in the supported SQL statements; both rules are quoted below.
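For reference, here are the two relevant rules as they appear in the SqlBase.g4 that ships with Spark 2.1 (quoted from the 2.1 sources, so double-check against the exact branch you are running). First, the header:

```antlr
createTableHeader
    : CREATE TEMPORARY? EXTERNAL? TABLE (IF NOT EXISTS)? tableIdentifier
    ;
```

And the alternative in `statement` that handles Hive-style tables, where the header is used:

```antlr
| createTableHeader ('(' columns=colTypeList ')')?
    (COMMENT comment=STRING)?
    (PARTITIONED BY '(' partitionColumns=colTypeList ')')?
    bucketSpec? skewSpec?
    rowFormat? createFileFormat? locationSpec?
    (TBLPROPERTIES tablePropertyList)?
    (AS? query)?                                                   #createHiveTable
```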
Unless I'm mistaken, `locationSpec` is optional. That's according to the ANTLR grammar. The code may decide otherwise, and it seems it does.
The default `SparkSqlParser` (with `astBuilder` as `SparkSqlAstBuilder`) has the following assertion that leads to the exception; see the sketch after this paragraph. I'd consider reporting an issue in Spark's JIRA if you think the case should be allowed. See SPARK-2825 for a strong argument in favor of supporting it.
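For completeness, the check lives in `SparkSqlAstBuilder` (in `SparkSqlParser.scala`). The snippet below is a sketch of the Spark 2.1 code path for `#createHiveTable` statements; the names match the 2.1 sources as far as I can tell, but verify against your exact version:

```scala
// Inside SparkSqlAstBuilder.visitCreateHiveTable (Spark 2.1 sources, approximately):
// `external` is true when the EXTERNAL keyword was matched by createTableHeader,
// and `location` holds the optional locationSpec from the grammar.
if (external && location.isEmpty) {
  operationNotAllowed("CREATE EXTERNAL TABLE must be accompanied by LOCATION", ctx)
}
```

So the grammar accepts the statement, and the AST builder then rejects it explicitly. Until that changes, the simplest workaround is to supply an explicit location up front; a minimal sketch, assuming any writable HDFS directory (the path here is made up):

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS new_table (id BIGINT, StartTime TIMESTAMP, EndTime TIMESTAMP)
PARTITIONED BY (year INT, month INT, day INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/warehouse/new_table' -- hypothetical path; use a directory you control
```

Partitions with their own locations can still be added afterwards with `ALTER TABLE ... ADD PARTITION ... LOCATION ...`.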