Azure error writing parquet to ADLS Gen 2


When I use this function to write a Parquet file to the Data Lake while running my pipeline locally, it writes the file without any problems. However, once the pipeline is deployed on AKS, I get this error: ValueError: I/O operation on closed file.

    def write_parquet_to_blob(self, file_path: str, df: pd.DataFrame) -> None:
        """
        Converts the dataframe to a Parquet byte stream and writes it to the specified file path in Azure Data Lake.
        :param file_path: The Azure file path where the dataframe should be written as a Parquet file.
        :param df: The dataframe to write.
        """
        try:
            file_write_client = self._create_file_client(file_path)
            data = df.to_parquet()
            file_write_client.upload_data(data, len(data), overwrite=True)
        except Exception as exc:
            logger.error(f"Error writing Parquet data to {file_path}: {exc}")
            raise
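
(_create_file_client is not shown above; a minimal sketch of such a helper using the azure-storage-filedatalake SDK, assuming a DefaultAzureCredential and hypothetical self._account_url / self._file_system attributes:)

    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeFileClient, DataLakeServiceClient

    def _create_file_client(self, file_path: str) -> DataLakeFileClient:
        """Hypothetical reconstruction; the real helper is not shown here."""
        service_client = DataLakeServiceClient(
            account_url=self._account_url,  # e.g. "https://<account>.dfs.core.windows.net"
            credential=DefaultAzureCredential(),
        )
        file_system_client = service_client.get_file_system_client(self._file_system)
        return file_system_client.get_file_client(file_path)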


I tried different versions of the function using BytesIO as a buffer and calling buffer.seek(0), but there is still no way to make it work once deployed. This error does not appear when writing TXT or CSV files.
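
For illustration, one such BytesIO variant looked roughly like this: pandas writes the Parquet data straight to a file-like object, and the buffer is rewound before the bytes are read back out for the upload (a sketch, not the exact code I ran):

    import io

    def write_parquet_to_blob(self, file_path: str, df: pd.DataFrame) -> None:
        # Serialize to an in-memory buffer instead of taking the bytes directly.
        buffer = io.BytesIO()
        df.to_parquet(buffer, engine="pyarrow")
        buffer.seek(0)  # rewind before reading the serialized bytes back out
        file_write_client = self._create_file_client(file_path)
        file_write_client.upload_data(buffer.getvalue(), overwrite=True)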
