Task: Trying to load a bunch of JSON files from s3 buckets to Redshift using Redshift Spectrum. Problem: JSON objects in few files are having data wrapped with Square brackets but other JSON files are having the same objects without square brackets. Is there a way to consume data both with/ without square brackets "[ ]" while creating an external table using the Redshift spectrum table?

JSON FILES to be consumed:

File 1: "x":{"y":{ "z":["ABCD"]}}

File 2: "x":{"y":{ "z":"EFGH"}}

Case 1

When column z is defined as an array, I am missing out the data from JSON file which are "without square brackets"

CREATE EXTERNAL TABLE spectrum.table
(x struct<y:struct<z:array<varchar(256)>>>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( 'dots.in.keys'='true') STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
location 's3://****'

Query: Select c from spectrum.table t , t.x.y.z c;

Case 2

When column z is defined as varchar (without declaring as an array), below is the error:

Create Statement:

CREATE EXTERNAL TABLE spectrum.table
(x struct<y:struct<z:varchar(256)>>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( 'dots.in.keys'='true') STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
location 's3://****'

Query: Select regexp_replace( t.x.y.z ,'\\([\\"])', '' ) from spectrum.table t; or Select t.x.y.z from spectrum.table t;

 [XX000][500310] [Amazon](500310) Invalid operation: Spectrum Scan Error
 Details:
-----------------------------------------------
 error:  Spectrum Scan Error
 code:      15001
 context:   Unsupported implicit cast: Column ('x' 'y' 'z'), From Type: LIST, To Type: VARCHAR, 

 ----------------------------------------------
0

There are 0 answers