I am new to Hadoop and big data technologies. I would like to convert a Parquet file to an Avro file and read that data. I searched a few forums, and they suggested using AvroParquetReader.
import org.apache.avro.generic.GenericRecord;
import org.apache.parquet.avro.AvroParquetReader;
AvroParquetReader<GenericRecord> reader = new AvroParquetReader<GenericRecord>(file);
GenericRecord nextRecord = reader.read();
But I am not sure how to include AvroParquetReader in my project; I am not able to import it at all.
I can read this file using spark-shell and maybe convert it to JSON, and then that JSON could be converted to Avro. But I am looking for a simpler solution.
If you are able to use Spark DataFrames, you can read the Parquet files natively in Apache Spark, e.g. (in Python pseudo-code):
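# read the Parquet file(s) into a DataFrame; "spark" is the session
# provided by spark-shell / pyspark, and the path is a placeholder
df = spark.read.parquet("...")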
To save the files, you can use the spark-avro Spark Package. To write the DataFrame out as Avro, it would be something like:
df.write.format("com.databricks.spark.avro").save("...")
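Since you also want to read the converted data back, the same package can load Avro files into a DataFrame. A minimal sketch, again with a placeholder path:
# read the Avro output back via the spark-avro data source
df = spark.read.format("com.databricks.spark.avro").load("...")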
Don't forget that you will need to include the right version of the spark-avro Spark Package for your version of your Spark cluster (e.g. 3.1.0-s2.11 corresponds to spark-avro package 3.1 using Scala 2.11, which matches the default Spark 2.0 cluster). For more information on how to use the package, please refer to https://spark-packages.org/package/databricks/spark-avro.
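For example (a sketch, assuming a Spark 2.0 / Scala 2.11 cluster, so that the 3.1.0 artifact is the right one for you), the package can be pulled in at launch time with the --packages flag:
# launch pyspark (or spark-shell / spark-submit) with spark-avro on the classpath
pyspark --packages com.databricks:spark-avro_2.11:3.1.0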