I followed "pyarrow data types for columns that have lists of dictionaries?" to create an Arrow table that includes a column of MapType.
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

print(f'PyArrow Version = {pa.__version__}')
print(f'Pandas Version = {pd.__version__}')

# col1 holds one list of (key, value) tuples per row; col2 is a plain string column.
df = pd.DataFrame({
    'col1': pd.Series([
        [('id', 'something'), ('value2', 'else')],
        [('id', 'something2'), ('value', 'else2')],
    ]),
    'col2': pd.Series(['foo', 'bar'])
})

# Declare col1 as map<string, string> so the tuples are converted into a map column.
udt = pa.map_(pa.string(), pa.string())
schema = pa.schema([pa.field('col1', udt), pa.field('col2', pa.string())])
table = pa.Table.from_pandas(df, schema=schema)
pq.write_table(table, './test_map.parquet')
The code above runs smoothly on my development computer:
PyArrow Version = 1.0.1
Pandas Version = 1.1.2
and generates the test_map.parquet file successfully.
Then I used parquet-tools (1.11.1) to read the file, but got the following output:
col1:
.key_value:
.key_value:
col2 = foo
col1:
.key_value:
.key_value:
col2 = bar
The keys and values are missing. Could you help me with this?
I've tried to replicate this, but I get this error:
As mentioned in the error, lists of structs, as well as maps, are not well supported when it comes to reading from Parquet.
I'd recommend using a simpler schema for your data, like this one:
which outputs: