I'm using AWS Athena to query an S3 bucket whose data is partitioned by day only; the partitions look like day=yyyy/mm/dd. When I tried to use Glue to update the partitions every day, it created a new table for each day (since 2017, around 1500 tables).
I tried to use partition projection like this:
PARTITIONED BY (
  day string)
TBLPROPERTIES (
  'has_encrypted_data'='false',
  'projection.day.format'='yyyy/mm/dd',
  'projection.day.interval'='1',
  'projection.day.interval.unit'='DAYS',
  'projection.day.range'='2017/01/01,NOW',
  'projection.day.type'='date',
  'projection.enables'='true')
But the partitions are not updated unless I run MSCK REPAIR TABLE. Any ideas? Am I missing something with partition projection?
You don't need Glue or MSCK REPAIR TABLE if you are loading partitions with partition projection. Just run the CREATE TABLE script once from the query editor and that should be it. Double-check the property names, though: projection is switched on by 'projection.enabled'='true', and your DDL has 'projection.enables', so Athena never enables projection for the table. Also note that partitions loaded through partition projection do not show up in the Glue Data Catalog.
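Here is a sketch of what the full DDL might look like with projection turned on. The property is spelled `projection.enabled`, and the month in the date pattern is written as `MM`, because Athena parses `projection.day.format` with Java `DateTimeFormatter` patterns, where lowercase `mm` means minutes. The table name `my_table`, the columns, `STORED AS PARQUET`, and the `s3://my-bucket/my-prefix/` path are hypothetical placeholders; the `storage.location.template` line assumes objects live under prefixes like `.../day=2017/01/01/`.

```sql
-- Sketch only: my_table, its columns, the storage format and the S3 path are hypothetical.
-- storage.location.template assumes a layout like s3://my-bucket/my-prefix/day=2017/01/01/
CREATE EXTERNAL TABLE my_table (
  id string,
  payload string
)
PARTITIONED BY (
  `day` string
)
STORED AS PARQUET
LOCATION 's3://my-bucket/my-prefix/'
TBLPROPERTIES (
  'has_encrypted_data'='false',
  'projection.enabled'='true',
  'projection.day.type'='date',
  'projection.day.format'='yyyy/MM/dd',
  'projection.day.range'='2017/01/01,NOW',
  'projection.day.interval'='1',
  'projection.day.interval.unit'='DAYS',
  'storage.location.template'='s3://my-bucket/my-prefix/day=${day}/'
);
```

Queries should then filter on the partition column, for example `WHERE day = '2017/01/01'`, so Athena computes the matching prefix directly instead of looking up partitions in the catalog.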
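Alternatively, if you want the partitions registered in the Glue Data Catalog, you can run a Glue job every day that adds the new day's partition with an ALTER TABLE statement. Just adjust the ALTER TABLE part to your table and path and it should be good to go.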
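The daily script referred to above is not included here, so the following is only a hedged sketch of the kind of statement such a job could issue, for example by submitting it to Athena through boto3's `start_query_execution`. The table name, the date, and the S3 location are hypothetical placeholders that the job would fill in for the current day. This route only matters for a table without projection: when `projection.enabled` is true, Athena ignores partition metadata registered in the catalog for that table.

```sql
-- Sketch only: my_table, the date and the S3 path are hypothetical placeholders.
-- A daily job would substitute the current date into both the partition value and the location.
ALTER TABLE my_table ADD IF NOT EXISTS
  PARTITION (`day` = '2024/01/15')
  LOCATION 's3://my-bucket/my-prefix/day=2024/01/15/';
```

`IF NOT EXISTS` keeps the statement safe to re-run if the job is retried on the same day.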