Copy an ambiguous S3 folder structure into a simple S3 folder structure


I have an S3 folder structure like bucket/market/date/business/hr/*.parquet, where only the bucket name is fixed and all other levels are variable.

I want to merge this data and copy it into a single folder per market and date, on a daily basis. For example:

Bucket structure before processing:

-bucket
-----usa
----------2020-10-11
-----------------------07
--------------------------1.parquet
--------------------------2.parquet
-----------------------09
--------------------------1.parquet
----------2020-10-12
-----------------------12
--------------------------1.parquet
--------------------------2.parquet
-----------------------22
--------------------------1.parquet
--------------------------2.parquet
--------------------------3.parquet
-----mx
----------2020-10-11
-----------------------17
--------------------------1.parquet
--------------------------2.parquet
-----------------------19
--------------------------1.parquet

The bucket structure I am looking for after processing, with all data merged into bucket/market/date/*.parquet:

-bucket
-----usa
----------2020-10-11
------------------------1.parquet
----------2020-10-12
------------------------1.parquet
-----mx
----------2020-10-11
------------------------1.parquet
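The before/after layouts above boil down to a per-key mapping: keep the first two path components (market and date) and drop everything in between. A minimal sketch, assuming keys are handled relative to the bucket (no bucket name in front) and that every parquet file under a market/date prefix, however deeply nested, belongs in that date's single merged file. The function name target_key and the fixed output name 1.parquet are illustrative assumptions, not an existing API.

```python
def target_key(source_key):
    """Map a source key to the merged key it should end up in.

    "usa/2020-10-11/b1/07/1.parquet" -> "usa/2020-10-11/1.parquet"
    Returns None for non-parquet objects and for keys already at
    market/date/file depth (i.e. merged outputs), so reruns skip them.
    """
    parts = source_key.split("/")
    # Need market, date, at least one intermediate folder, and a file name.
    if len(parts) < 4 or not parts[-1].endswith(".parquet"):
        return None
    market, day = parts[0], parts[1]
    return f"{market}/{day}/1.parquet"


if __name__ == "__main__":
    for key in [
        "usa/2020-10-11/07/1.parquet",      # shallow layout, as in the tree above
        "usa/2020-10-11/b1/07/2.parquet",   # full market/date/business/hr layout
        "usa/2020-10-11/1.parquet",         # already merged -> None
    ]:
        print(key, "->", target_key(key))
```

Because only the first two components are used, the same function works whether or not the business level is present.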

What would be the best approach? Should I schedule a Glue job? How can I merge across these variable folders? To illustrate the variability: for date-x, business can be b1, b2, b3 and the hr folders holding the parquet data can be 1, 2, 9; for date-y, business can be b2, b3 and hr can be 6, 7, 10. Looking for suggestions.

I have tried reading this data outside AWS, merging it, and publishing it back to S3, but that is costly, so I am looking for an alternative.
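Since business and hr vary from day to day (b1, b2, b3 with hr 1, 2, 9 on date-x; b2, b3 with hr 6, 7, 10 on date-y), one option is to ignore those levels entirely and group a single day's keys by market. A minimal sketch, assuming the key list comes from boto3's list_objects_v2 paginator (s3.get_paginator("list_objects_v2").paginate(Bucket=...), reading "Key" from each entry in a page's "Contents") and that a daily run processes the previous day's date. The names daily_merge_plan and previous_day are hypothetical. The actual merge of each source list (for example with pyarrow, or Spark inside a scheduled Glue job, assuming the files share a schema) is not shown.

```python
from collections import defaultdict
from datetime import date, timedelta


def previous_day(today=None):
    """Return the day before `today` as YYYY-MM-DD (the folder a daily run would merge)."""
    if today is None:
        today = date.today()
    return (today - timedelta(days=1)).isoformat()


def daily_merge_plan(keys, day):
    """Group one day's parquet keys by market, ignoring the business/hr folders.

    Returns {target_key: sorted_source_keys}, where target_key is market/day/1.parquet.
    """
    plan = defaultdict(list)
    for key in keys:
        parts = key.split("/")
        # Require market/date/<at least one folder>/file.parquet; this also skips
        # merged outputs that sit directly at market/date/1.parquet.
        if len(parts) < 4 or not parts[-1].endswith(".parquet"):
            continue
        market, key_day = parts[0], parts[1]
        if key_day == day:
            plan[f"{market}/{day}/1.parquet"].append(key)
    return {target: sorted(sources) for target, sources in plan.items()}


if __name__ == "__main__":
    # Hypothetical keys with varying business/hr folders.
    keys = [
        "usa/2020-10-11/b1/1/1.parquet",
        "usa/2020-10-11/b2/2/1.parquet",
        "usa/2020-10-11/b3/9/1.parquet",
        "usa/2020-10-12/b2/6/1.parquet",
        "mx/2020-10-11/b3/7/1.parquet",
    ]
    for target, sources in daily_merge_plan(keys, previous_day(date(2020, 10, 12))).items():
        print(target, "<-", sources)
```

Keying only on market and date keeps the job independent of how many business or hr folders exist on a given day, and running it inside AWS (Glue or similar) avoids moving the data out of S3.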
