Write Parquet MAP datatype by PyArrow

Question

Asked by Yucan on 06 October 2020 at 01:01 · 1.4k views

I'm writing in Python and would like to use PyArrow to generate Parquet files.

As I understand it, and according to the Implementation Status page, the C++ (and therefore Python) library already implements the MAP type. The Data Types documentation also lists the type map_(key_type, item_type[, keys_sorted]).

I tried several different approaches in Python/PyArrow, but all of them failed.

For example:

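import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

FILE_NAME = 'rand_gen_test_map.parquet'

# Each row of col1 is a list of (key, value) tuples, i.e. one map per row.
df = pd.DataFrame({
    'col1': pd.Series([
        [('key', 'aaaa'), ('value', '1111')],
        [('key', 'bbbb'), ('value', '2222')],
    ]),
    'col2': pd.Series(['foo', 'bar']),
})

# Declare col1 as MAP<string, string> explicitly.
udt = pa.map_(pa.string(), pa.string())
schema = pa.schema([pa.field('col1', udt), pa.field('col2', pa.string())])

table = pa.Table.from_pandas(df, schema=schema)
pq.write_table(table, FILE_NAME)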

When I read the file with parquet-tools cat rand_gen_test_map.parquet, I got:

col1:
.key_value:
.key_value:
col2 = foo

col1:
.key_value:
.key_value:
col2 = bar

It seems to me that the map keys and values are not written correctly (or are missing), even though the schema is correct:

message schema {
  optional group col1 (MAP) {
    repeated group key_value {
      required binary key (UTF8);
      optional binary value (UTF8);
    }
  }
  optional binary col2 (UTF8);
}

All in all, I have two questions (all in Python):

1. What is the best way to generate Parquet files with the MAP datatype? (It would be great if an example could be attached.)
2. I understand that we can use a STRUCT to mimic a map structure. But since Parquet provides the MAP type, we would still like to use it. If the MAP data type can't be generated, what is the reason for providing a MAP type?

There is 1 answer

**Micah Kornfield** · Answer 1 · 2020-10-21T17:59:49+00:00

There was a bug in writing map types. This should be fixed in pyarrow 2.0 (reading map types is now supported natively as well).
