How does one convert an HDF5 file into a Parquet file?

Question

Asked by ShanZhengYang on 06 January 2017 at 01:28

I have stored a huge dataframe, approximately 800 GB, in HDF5 via pandas using pandas.HDFStore().

import pandas as pd
store = pd.HDFStore('store.h5')
df = pd.DataFrame()  # imagine the data being munged into a dataframe
store['df'] = df

I would like to query this with Impala. Is there a straightforward way to convert this data to Parquet? Or does Impala allow you to work with HDF5 directly? Is there another option for querying data stored in HDF5?

There is 1 answer

**John Readey** · Answer 1 · 2017-01-06T02:06:27+00:00

I haven't tried this myself, but here's a gist showing how to convert an HDFStore to Parquet using Spark: https://gist.github.com/jiffyclub/905bf5e8bf17ec59ab8f
