Concatenate S3 files into a single file per partition using Python


I want to generate a single Parquet file from files that are in S3, unloaded from Redshift with partitioning, at the following paths:

s3://bucket_name/rs_tables/name='part1'/key='abc'/date=''/part1_0000.parquet
s3://bucket_name/rs_tables/name='part1'/key='abc'/date=''/part1_0001.parquet
s3://bucket_name/rs_tables/name='part2'/key='qwe'/date=''/part2_0000.parquet
s3://bucket_name/rs_tables/name='part2'/key='qwe'/date=''/part2_0001.parquet

I am using the code below to read the file data into a DataFrame:

import awswrangler as wr

df = wr.s3.read_parquet(path="s3://bucket_name/rs_tables/name='part1'/key='abc'/", dataset=True)

I want to combine the files with respect to each combination of the name and key partitions, producing output like:
s3://bucket_name/rs_tables/name='part1_abc'/part1_0000.parquet
s3://bucket_name/rs_tables/name='part2_qwe'/part1_0000.parquet

Is this possible with Python?
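
For reference, a rough sketch of how this might work with awswrangler alone, assuming the data for each partition fits in memory: reading the whole prefix with dataset=True turns the Hive-style folders (name=..., key=..., date=...) into DataFrame columns, so the files can be grouped by (name, key) and each group written back as exactly one file. The output prefix, the file name pattern, and the quote-stripping of partition values are placeholders/assumptions modeled on the paths above:

import awswrangler as wr

# Read every partition at once; dataset=True parses the Hive-style
# partition folders into name, key, and date columns.
df = wr.s3.read_parquet(path="s3://bucket_name/rs_tables/", dataset=True)

# Write one Parquet file per (name, key) combination.
for (name, key), group in df.groupby(["name", "key"]):
    # Assumption: partition values may carry the surrounding quotes
    # from the folder names, so strip them before building the path.
    name_val = str(name).strip("'")
    key_val = str(key).strip("'")
    # Placeholder output layout modeled on the desired paths above.
    out_path = (
        f"s3://bucket_name/rs_tables/name='{name_val}_{key_val}'/"
        f"{name_val}_0000.parquet"
    )
    wr.s3.to_parquet(
        df=group.drop(columns=["name", "key", "date"]),
        path=out_path,  # dataset=False (the default) writes a single file here
    )

If the data is too large for a single DataFrame, the same loop could instead call wr.s3.read_parquet once per partition prefix and write each result out individually.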
