How to get the version of parquet from file header using parquet-tools?


I am using version 1.11.2 of the following library. However, it doesn't seem to have a method to retrieve the version from file metadata header. What is the alternative?

<dependency>
    <groupId>org.apache.parquet</groupId>
    <artifactId>parquet-tools</artifactId>
</dependency>

There is 1 answer

Answered by Koedlt

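You can do this with the parquet-tools command-line tool. Note that the inspect command used below appears to come from the Python parquet-tools package (installed with pip), not from the org.apache.parquet Java artifact in the question.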

Run the inspect command of parquet-tools on the Parquet file you want to check. The command and its output should look something like this:

$ parquet-tools.exe inspect part-00000-7323dfbe-aea5-4d96-a4cf-7a3397c1e888-c000.snappy.parquet

############ file meta data ############
created_by: parquet-mr version 1.10.1 (build a89df8f9932b6ef6633d06069e50c9b7970bebd1)
num_columns: 2
num_rows: 1
num_row_groups: 1
format_version: 1.0
serialized_size: 498


############ Columns ############
col1
col2

############ Column(col1) ############
name: col1
path: col1
max_definition_level: 0
max_repetition_level: 0
physical_type: INT32
logical_type: None
converted_type (legacy): NONE
compression: SNAPPY (space_saved: -4%)

############ Column(col2) ############
name: col2
path: col2
max_definition_level: 1
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8
compression: SNAPPY (space_saved: -5%)
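
If you want to read these fields from a script rather than by eye, here is a minimal sketch that parses the "file meta data" section of the output above into a dictionary. The function name parse_file_metadata and the SAMPLE constant are hypothetical, and the code assumes the output keeps the banner-plus-"key: value" layout shown above, which may differ between parquet-tools versions.

```python
def parse_file_metadata(inspect_output: str) -> dict:
    """Collect the 'key: value' lines of the 'file meta data' section.

    Assumes section banners look like '############ title ############',
    as in the sample output; this layout is not a documented contract.
    """
    fields = {}
    in_section = False
    for raw_line in inspect_output.splitlines():
        line = raw_line.strip()
        if line.startswith("#"):
            if in_section:
                break  # reached the banner of the next section
            title = line.strip("# ").lower()
            in_section = title == "file meta data"
            continue
        if in_section and ":" in line:
            # Split on the first colon only, so values may contain colons.
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields


# Sample text copied from the output above (truncated after the first section).
SAMPLE = """\
############ file meta data ############
created_by: parquet-mr version 1.10.1 (build a89df8f9932b6ef6633d06069e50c9b7970bebd1)
num_columns: 2
num_rows: 1
num_row_groups: 1
format_version: 1.0
serialized_size: 498


############ Columns ############
col1
col2
"""

meta = parse_file_metadata(SAMPLE)
print(meta["created_by"])
print(meta["format_version"])
```

In practice you would feed it the captured standard output of the inspect command instead of SAMPLE.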

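The created_by field tells you which parquet-mr version was used to write the file (in this example, parquet-mr version 1.10.1, build a89df8f9932b6ef6633d06069e50c9b7970bebd1). The format_version field, by contrast, reports the version of the Parquet format itself (1.0 here).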
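
To pull the version number out of a created_by string programmatically, something like the following sketch could work. It assumes the value follows the "<application> version <version> (build <hash>)" shape seen in the example; other writers may format the field differently, so the function returns None when the pattern does not match. The names CreatedBy and parse_created_by are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class CreatedBy(NamedTuple):
    application: str
    version: str
    build: Optional[str]  # None when no "(build ...)" suffix is present


# Pattern inferred from the example value, not taken from a specification.
_CREATED_BY = re.compile(
    r"^(?P<app>.+?)\s+version\s+(?P<version>\S+)"
    r"(?:\s+\(build\s+(?P<build>[^)]+)\))?"
)


def parse_created_by(created_by: str) -> Optional[CreatedBy]:
    """Split a created_by string into application, version and build hash."""
    match = _CREATED_BY.match(created_by.strip())
    if match is None:
        return None
    return CreatedBy(match["app"], match["version"], match["build"])


info = parse_created_by(
    "parquet-mr version 1.10.1 (build a89df8f9932b6ef6633d06069e50c9b7970bebd1)"
)
print(info.application, info.version)
```

Returning None instead of raising keeps the helper usable on files written by tools whose created_by value has a different layout.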