Hive standalone metastore: reading Avro data with a schema is not working


We have a use case where Presto accesses Avro files on S3 through Hive. When we use the standalone hive-metastore and read this Avro data through an external table, we get a "SerDeStorageSchemaReader class not found" error:

    MetaException(message:org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader class not found)
    at org.apache.hadoop.hive.metastore.utils.JavaUtils.getClass(JavaUtils.java:54)

We understand that this error occurs because the SerDeStorageSchemaReader class is not available in the standalone metastore.

I want to understand: can hive-metastore be run without Hive/Hadoop for this, or is there another option?


There are 2 answers

Vish On BEST ANSWER

The standalone Hive metastore doesn't support Avro. To fix this, we need to install the full Hadoop and Hive distributions and start only the Hive metastore service.

John Fitzgerald On

I managed to tweak Hive Standalone to work with Avro files and S3 by doing the following:

  1. In the metastore-site.xml file I added the following:

     <property>
       <name>metastore.storage.schema.reader.impl</name>
       <value>org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader</value>
     </property>
    
  2. I added the following jars to ${HIVE_HOME}/lib/

  • hive-metastore-${METASTORE_VERSION}.jar (full hive version)
  • hive-common-${METASTORE_VERSION}.jar
  • hive-serde-${METASTORE_VERSION}.jar
  3. I created the table like this:

    CREATE TABLE IF NOT EXISTS table_xyz (col1 INT, col2 INT)
    WITH (
      format = 'AVRO',
      partitioned_by = ARRAY['col1', 'col2'],
      external_location = 's3a://my_bucket/path/blah',
      avro_schema_url = 's3a://mybucket/avro_file_schema.avsc'
    );
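
If the "class not found" error persists after adding the jars, it may help to check whether any jar in the lib directory actually contains the class. The following is a minimal Python sketch of such a check, not part of Hive itself. The function name `find_class_in_jars` is made up for illustration, and the sketch assumes the jars are ordinary zip archives sitting directly in one directory.

```python
import zipfile
from pathlib import Path

# Class named in the MetaException from the question.
CLASS_NAME = "org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader"


def find_class_in_jars(lib_dir, class_name=CLASS_NAME):
    """Return the sorted names of the jars in lib_dir that contain class_name.

    Hypothetical helper for illustration; it only inspects *.jar files
    directly inside lib_dir and does not look into subdirectories.
    """
    # Inside a jar, a class is stored under its package path with a .class suffix.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that are not valid zip archives.
            continue
    return hits
```

Calling `find_class_in_jars("/path/to/hive/lib")` with the real lib directory (the path here is a placeholder) returns the matching jar names. An empty list suggests that the jar providing the class has not been copied to that directory.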