Redshift Spectrum query returns 0 row from S3 file

1k views Asked by At

I tried Redshift Spectrum. Both of query below ended success without any error message, but I can't get the right count of the uploaded file in S3, it's just returned 0 row count, even though that file has over 3 million records.

-- Create External Schema
CREATE EXTERNAL SCHEMA spectrum_schema FROM data catalog 
database 'spectrum_db' 
iam_role 'arn:aws:iam::XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
create external database if not exists;

-- Create External Table

create EXTERNAL TABLE spectrum_schema.principals(
tconst VARCHAR (20),
ordering BIGINT,
nconst VARCHAR (20),
category VARCHAR (500),
job VARCHAR (500),
characters VARCHAR(5000)
)
row format delimited
fields terminated by '\t'
stored as textfile
location 's3://xxxxx/xxxxx/'

I also tried the option, 'stored as parquet', the result was same. My iam role has "s3:","athena:", "glue:*" permissions, and Glue table created successfully.

And just in case, I confirmed the same S3 file could be copied into table in Redshift Cluster successfully. So, I concluded the file/data has no issue by itself.

If there is something wrong with my procedure or query. Any advice would be appreciated.

1

There are 1 answers

1
Prabhakar Reddy On BEST ANSWER

As your DDL is not scanning any data it looks like the issue seems to be with it not understanding actual data in s3. To figure this out you can simply generate table using AWS Glue crawler.

Once the table is created you can compare this table properties with another table created using DDL in Glue data catalog. That will give you the difference and what is missing in your table that is created using DDL manually.