How to ignore some columns when copying Parquet files into AWS Redshift?

I want to copy some Parquet files into AWS Redshift, but the Redshift table has fewer columns than the Parquet files, because the extra columns contain sensitive information. I want to skip those columns during the copy. How should I proceed?

Best answer, by MP24:

The COPY command does not let you skip columns when loading Parquet, as described in the documentation:

COPY inserts values into the target table's columns in the same order as the columns occur in the columnar data files. The number of columns in the target table and the number of columns in the data file must match.
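To see which columns stand in the way of a direct COPY, here is a minimal Python sketch of the documented rule. Because values are matched to table columns by position, the load only lines up when the column counts are equal, and any file columns the table lacks (here, the sensitive ones) are what would have to be dropped first. The function name and column names are hypothetical; in practice the Parquet column names could be read with pyarrow (`pyarrow.parquet.read_schema(path).names`), assuming pyarrow is installed.

```python
def copy_column_check(parquet_columns, table_columns):
    """Check a Parquet file's columns against a Redshift table for COPY.

    Returns (counts_match, extra_columns). COPY maps Parquet columns to
    table columns by position, so a direct load needs equal counts.
    extra_columns lists file columns that the table does not have.
    """
    table_set = set(table_columns)
    counts_match = len(parquet_columns) == len(table_columns)
    extra_columns = [c for c in parquet_columns if c not in table_set]
    return counts_match, extra_columns


# Hypothetical schemas: the file carries two sensitive columns the table omits.
parquet_cols = ["id", "email", "ssn", "amount"]
table_cols = ["id", "amount"]
print(copy_column_check(parquet_cols, table_cols))  # (False, ['email', 'ssn'])
```

Note that equal counts are necessary but not sufficient: column order and types must also line up for COPY to succeed.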

If you can use the AWS Glue Data Catalog, you can create an external schema in Redshift (Redshift Spectrum), in which your Parquet files are exposed as an external table. You can then SELECT from this external table and INSERT only the columns you are interested in into your Redshift table.
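The approach above can be sketched as the following SQL, generated from Python so the list of kept columns is derived in one place. This is a minimal sketch under assumptions: the schema, Glue database, IAM role ARN, S3 path, table and column names are all hypothetical placeholders, and the CREATE EXTERNAL TABLE step is only needed if the table is not already in the Glue Data Catalog (for example, created by a Glue crawler). Run the printed statements with whatever SQL client you use for Redshift.

```python
"""Build Redshift Spectrum SQL that loads only the non-sensitive columns.

All schema, database, role, bucket, table and column names are hypothetical.
"""


def external_schema_sql(schema, glue_database, iam_role_arn):
    # Expose a Glue Data Catalog database as an external schema in Redshift.
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        f"FROM DATA CATALOG\n"
        f"DATABASE '{glue_database}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


def external_table_sql(schema, table, columns, s3_location):
    # columns: (name, type) pairs describing ALL columns in the Parquet files,
    # including the sensitive ones, in file order.
    cols = ",\n  ".join(f"{name} {typ}" for name, typ in columns)
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n  {cols}\n)\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{s3_location}';"
    )


def insert_select_sql(target_table, external_table, keep_columns):
    # Copy only the selected columns from the external table into Redshift.
    cols = ", ".join(keep_columns)
    return (
        f"INSERT INTO {target_table} ({cols})\n"
        f"SELECT {cols}\n"
        f"FROM {external_table};"
    )


if __name__ == "__main__":
    parquet_columns = [("id", "BIGINT"), ("email", "VARCHAR(256)"), ("amount", "DECIMAL(12,2)")]
    sensitive = {"email"}
    keep = [name for name, _ in parquet_columns if name not in sensitive]
    for stmt in (
        external_schema_sql("spectrum", "my_glue_db", "arn:aws:iam::123456789012:role/MySpectrumRole"),
        external_table_sql("spectrum", "events", parquet_columns, "s3://my-bucket/events/"),
        insert_select_sql("public.events", "spectrum.events", keep),
    ):
        print(stmt, end="\n\n")
```

With this pattern the sensitive columns are read only by the external table and never written into the Redshift table itself.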