I get an error and cannot understand why it happens. We are using Glue 4.0 with PySpark and Apache Iceberg 1.3.1. The error below occurs when incoming data is inner-joined with historical data on some IDs:
ERROR GlueExceptionAnalysisListener: [Glue Exception Analysis]
{
"Event": "GlueExceptionAnalysisTaskFailed",
"Timestamp": 1697631532873,
"Failure Reason": "Failed to open Parquet file: s3://year=2023/ts=20230101/somefile.parquet",
"Stack Trace": [
{
"Declaring Class": "org.apache.iceberg.parquet.ReadConf",
"Method Name": "newReader",
"File Name": "ReadConf.java",
"Line Number": 233
},
"Task Launch Time": 1697631526045,
"Stage ID": 24,
"Stage Attempt ID": 0,
"Task Type": "ShuffleMapTask",
"Executor ID": "1",
"Task ID": 1052
}
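To help correlate the failure with other logs (for example S3 or compaction activity), here is a minimal sketch that decodes the two time fields from the log above. It assumes "Timestamp" and "Task Launch Time" are epoch milliseconds in UTC, which is how they look but is not something I have confirmed; the helper name epoch_ms_to_utc is just an illustration.

```python
from datetime import datetime, timedelta, timezone


def epoch_ms_to_utc(ms: int) -> datetime:
    """Convert epoch milliseconds to a timezone-aware UTC datetime.

    Integer arithmetic is used so the millisecond part is not affected
    by floating-point rounding.
    """
    seconds, millis = divmod(ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)


# Values copied from the log above (assumed to be epoch milliseconds).
task_launch_ms = 1697631526045
failure_ms = 1697631532873

launched_at = epoch_ms_to_utc(task_launch_ms)
failed_at = epoch_ms_to_utc(failure_ms)
elapsed_ms = failure_ms - task_launch_ms

print("task launched:", launched_at.isoformat())
print("task failed:  ", failed_at.isoformat())
print("elapsed (ms): ", elapsed_ms)
```

If that assumption holds, the task failed a few seconds after it was launched, so this does not look like a long-running read that timed out.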
I cannot reproduce the issue locally; the same job works there.