What is causing a "malformed levels" error on a Parquet file generated via Stream Analytics?


I have a Service Bus topic subscription and a Logic App that sends the data to an Event Hub. This data is read by Stream Analytics, which generates a Parquet file and stores it in a Data Lake. I am then trying to read these Parquet files in Synapse with a SQL query, but I get this error:

Error handling external file: 'Malformed levels. min: 0 max: 2 out of range. Max Level: 1'.

I have narrowed it down to a problem with some array and record type columns. Does anyone have any idea what could be causing this?

The issue seems to go away if I select the items out of the array and record fields (see the sketch below); however, I'd rather keep them as they are and not flatten them.
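For context, this is roughly the kind of flattening that makes the error disappear, shown as a minimal Stream Analytics query sketch. The input name eventhubinput, the output parquetoutput, and the fields id, details, and items are hypothetical placeholders, not my actual payload.

    -- Hypothetical Stream Analytics query that flattens the nested fields
    -- before they reach the Parquet output.
    -- GetRecordPropertyValue and GetArrayElement are built-in
    -- Stream Analytics functions; all names here are placeholders.
    SELECT
        event.id,
        GetRecordPropertyValue(event.details, 'status') AS detailStatus,
        GetArrayElement(event.items, 0) AS firstItem
    INTO
        parquetoutput
    FROM
        eventhubinput AS event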

1 Answer

Answer by DileeprajnarayanThumula:

As you mentioned, the problem is with some array and record type columns.

It is possible that the Parquet file you are trying to read has a schema that is not compatible with Synapse. Another possible cause of the error is that the Parquet file is corrupted.

You can read the content of a Parquet file using the OPENROWSET function. I tried reading the file like this:

SELECT
    TOP 100 *
FROM
    OPENROWSET(
        BULK 'https://dileepstggen2.dfs.core.windows.net/folder02/sample_nested_pyspark.parquet/*.snappy.parquet',
        FORMAT = 'PARQUET'
    ) AS [result]


You can also specify the columns of interest when you query Parquet files. If some columns contain an array of scalar values, you can expand them and join them with the parent row using the OPENJSON function, as in the sketch below.
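Here is a minimal sketch of that pattern, assuming a hypothetical file whose rows have an id column and a nested items array. The storage path reuses the one above, and the column names id and items are placeholders, not the asker's actual schema.

    -- Read the nested column as JSON text via the WITH clause, then expand
    -- the array with OPENJSON so each element becomes its own row.
    SELECT
        r.id,
        arrayElement.[value] AS item
    FROM
        OPENROWSET(
            BULK 'https://dileepstggen2.dfs.core.windows.net/folder02/sample_nested_pyspark.parquet/*.snappy.parquet',
            FORMAT = 'PARQUET'
        )
        WITH (
            id INT,
            items VARCHAR(MAX)   -- nested array surfaced as JSON text
        ) AS r
    CROSS APPLY OPENJSON(r.items) AS arrayElement

Reading the complex column as VARCHAR(MAX) sidesteps the nested Parquet levels entirely, which is why this kind of query can succeed even when SELECT * on the raw file fails.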

To know more, see Query Parquet files using serverless SQL pool in Azure Synapse Analytics.