I was trying to write partitioned Parquet data to S3, and it worked fine with AWS Wrangler (awswrangler):
import asyncio

import awswrangler as wr

basename_template = 'part.'
partitioning = ['cust_id', 'file_name', 'added_year', 'added_month', 'added_date']
s3_path = "s3://customer-data-lake/main/parquet_data"

# batch, MAX_ROWS_PER_FILE, and s3_session are defined elsewhere in my code.
# This runs inside a coroutine, so the blocking write goes to an executor thread.
loop = asyncio.get_running_loop()
await loop.run_in_executor(None, lambda: wr.s3.to_parquet(
    df=batch.to_pandas(),
    path=s3_path,
    dataset=True,
    max_rows_by_file=MAX_ROWS_PER_FILE,
    use_threads=True,
    partition_cols=partitioning,
    mode='append',
    boto3_session=s3_session,
    filename_prefix=basename_template,
))
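With those partition_cols, awswrangler writes a hive-style key layout, so on plain S3 the files landed under keys roughly like this (the values here are made up for illustration):

parquet_data/
    cust_id=123/
        file_name=csv_1/
            added_year=2024/
                added_month=7/
                    added_date=2024-07-14/
                        part.<generated-suffix>.parquet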
Then I tried to convert this to lakeFS, so I changed the awswrangler S3 endpoint to my lakeFS endpoint:
wr.config.s3_endpoint_url = lakefsEndPoint
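For context, the other lakeFS-specific pieces are the boto3 session and the path. This is a sketch with placeholder keys, not my exact code; the path itself stays the same because the lakeFS S3 gateway addresses objects as s3://<repository>/<branch>/<key>, and here customer-data-lake is the repository and main is the branch:

import boto3

# Placeholder lakeFS access keys; my real values differ.
s3_session = boto3.Session(
    aws_access_key_id='AKIAIOSFODNN7EXAMPLE',
    aws_secret_access_key='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
)

# Same path as the plain S3 run: repository 'customer-data-lake', branch 'main'.
s3_path = "s3://customer-data-lake/main/parquet_data"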
After that change, partitioning suddenly stopped working: every write just appends into the same partition.
[Screenshot: the partitioned layout from the original S3 run]
[Screenshot: the layout after switching to lakeFS]
Everything just gets appended under the same csv_1 partition. What am I doing wrong here?