AWS Redshift - Copy data from s3 with wildcard

17k views Asked by At

I am trying to copy data from S3 bucket into Redshift using copy command. When I pass the exact file name in copy command it works perfectly fine but when I use wildcard (*) in my from clause it throws an error - "The specified S3 prefix 'suiteX*' does not exist;"

All my files follow this naming convention - SuiteX_Date_Hour.csv for example - SuiteX-20150101-1.csv

Here is my copy command that works :

copy mytable from 's3://mybucket/suiteX_20150101_1.csv'
CREDENTIALS 'aws_access_key_id=XXXXX;aws_secret_access_key=XXXX'
delimiter ',' REGION AS 'us-east-1';

Here is the copy command that does not work:

copy mytable from 's3://mybucket/suiteX*'
CREDENTIALS 'aws_access_key_id=XXXXX;aws_secret_access_key=XXXX' 
delimiter ',' REGION AS 'us-east-1';

Any idea how can make copy command to work with wildcard when I copy data from S3 ?

2

There are 2 answers

1
Masashi M On

According to the Redshift docs, I don't think COPY command supports wildcard for s3 file source path. Instead, you can specify a bucket directory as follows.

copy data
from 's3://mybucket/data/'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'; 

In this case, the files under data/ directory will be copied to Redshift.

0
d5ve On

The redshift COPY command doesn't have an explicit wildcard syntax. The object path you provide is treated like a prefix, and any matching objects will be COPY-ed. If the object path matches multiple folders, all objects in all those folders will be COPY-ed.

You should be able to get it to work for your example with:

copy mytable
from s3://mybucket/suiteX' 
CREDENTIALS 'aws_access_key_id=XXXXX;aws_secret_access_key=XXXX' 
delimiter ',' 
REGION AS 'us-east-1';

The relevent section from the COPY from Amazon S3 docs says:

FROM 's3://copy_from_s3_objectpath'

Specifies the path to the Amazon S3 objects that contain the data—for example, 's3://mybucket/cust.txt'. The s3://copy_from_s3_objectpath parameter can reference a single file or a set of objects or folders that have the same key prefix. For example, the name custdata.txt is a key prefix that refers to a number of physical files: custdata.txt.1, custdata.txt.2, and so on. The key prefix can also reference a number of folders. For example, 's3://mybucket/custfolder' refers to the folders custfolder_1, custfolder_2, and so on. If a key prefix references multiple folders, all of the files in the folders will be loaded.