When I use this function to write a Parquet file to the Data Lake, running my pipeline locally, it writes the file without any problems. However, once the pipeline is deployed on AKS, I get this error: ValueError: I/O operation on closed file.
    def write_parquet_to_blob(self, file_path: str, df: pd.DataFrame) -> None:
        """
        Converts the dataframe to a Parquet byte stream and writes it to the specified file path in Azure Data Lake.

        :param df: The dataframe to write.
        :param file_path: The Azure file path where the dataframe should be written as a Parquet file.
        """
        try:
            file_write_client = self._create_file_client(file_path)
            data = df.to_parquet()
            file_write_client.upload_data(data, len(data), overwrite=True)
        except Exception as exc:
            logger.error(f"Error writing Parquet data to {file_path}: {exc}")
            raise
I tried different versions of the function using a BytesIO buffer and calling buffer.seek(0), but there is still no way to make it work once deployed. The error does not appear when writing .txt or .csv files.
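One of the buffered variants I tried looked roughly like this (same _create_file_client helper and logger as above, with import io at module level; the exact version I ran may have differed slightly):

    def write_parquet_to_blob(self, file_path: str, df: pd.DataFrame) -> None:
        try:
            file_write_client = self._create_file_client(file_path)
            buffer = io.BytesIO()
            df.to_parquet(buffer)   # write the Parquet bytes into the in-memory buffer
            buffer.seek(0)          # rewind before handing the stream to the SDK
            file_write_client.upload_data(
                buffer,
                length=buffer.getbuffer().nbytes,
                overwrite=True,
            )
        except Exception as exc:
            logger.error(f"Error writing Parquet data to {file_path}: {exc}")
            raise

This variant behaves the same way: fine locally, ValueError: I/O operation on closed file on AKS.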