Task: Trying to load a bunch of JSON files from s3 buckets to Redshift using Redshift Spectrum. Problem: JSON objects in few files are having data wrapped with Square brackets but other JSON files are having the same objects without square brackets. Is there a way to consume data both with/ without square brackets "[ ]" while creating an external table using the Redshift spectrum table?
JSON FILES to be consumed:
File 1: "x":{"y":{ "z":["ABCD"]}}
File 2: "x":{"y":{ "z":"EFGH"}}
Case 1
When column z is defined as an array, I am missing out the data from JSON file which are "without square brackets"
CREATE EXTERNAL TABLE spectrum.table
(x struct<y:struct<z:array<varchar(256)>>>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( 'dots.in.keys'='true') STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
location 's3://****'
Query: Select c from spectrum.table t , t.x.y.z c;
Case 2
When column z is defined as varchar (without declaring as an array), below is the error:
Create Statement:
CREATE EXTERNAL TABLE spectrum.table
(x struct<y:struct<z:varchar(256)>>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( 'dots.in.keys'='true') STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
location 's3://****'
Query: Select regexp_replace( t.x.y.z ,'\\([\\"])', '' ) from spectrum.table t;
or Select t.x.y.z from spectrum.table t
;
[XX000][500310] [Amazon](500310) Invalid operation: Spectrum Scan Error
Details:
-----------------------------------------------
error: Spectrum Scan Error
code: 15001
context: Unsupported implicit cast: Column ('x' 'y' 'z'), From Type: LIST, To Type: VARCHAR,
----------------------------------------------