I'd like to use Presto to query Iceberg tables stored in S3 as Parquet files, so I need a Hive metastore. I'm running a standalone Hive metastore service backed by MySQL. I've configured Iceberg to use the Hive catalog:
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.catalog.Namespace;
import org.apache.iceberg.hive.HiveCatalog;

public class MetastoreTest {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("hive.metastore.uris", "thrift://x.x.x.x:9083");
        conf.set("hive.metastore.warehouse.dir", "s3://bucket/warehouse");
        HiveCatalog catalog = new HiveCatalog(conf);
        catalog.createNamespace(Namespace.of("my_metastore"));
    }
}
I'm getting the following error:
Caused by: MetaException(message:Got exception: org.apache.hadoop.fs.UnsupportedFileSystemException No FileSystem for scheme "s3")
I've included /hadoop-3.3.0/share/hadoop/tools/lib in HADOOP_CLASSPATH, and also copied the AWS-related jars to apache-hive-metastore-3.0.0-bin/lib. What else is missing?
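A quick way to confirm whether the jars are actually visible is to try loading the relevant classes. The following is only a minimal sketch: `ClasspathCheck` and `isOnClasspath` are hypothetical names, and the two class names assume the S3A connector from hadoop-aws and the AWS SDK v1 that it is built against. Since the MetaException appears to come from the metastore process, the check is only meaningful when run with the same classpath the metastore uses.

```java
final class ClasspathCheck {

    // Returns true if the named class can be loaded from the current classpath.
    // The class is loaded without being initialized, so no static blocks run.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed class names: the S3A filesystem and the AWS SDK v1 S3 client interface.
        String[] required = {
            "org.apache.hadoop.fs.s3a.S3AFileSystem",
            "com.amazonaws.services.s3.AmazonS3"
        };
        for (String name : required) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

If either class is reported as missing, the jars are probably not on the classpath of the JVM that matters, regardless of where they were copied.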
Finally figured this out. First, as I mentioned above, I had to include hadoop/share/hadoop/tools/lib in HADOOP_CLASSPATH. However, neither modifying HADOOP_CLASSPATH nor copying particular files from tools to common worked for me. Then I switched to hadoop-2.7.7 and it worked. I also had to copy the Jackson-related jars from tools to common. My hadoop/etc/hadoop/core-site.xml looks like this:

At this point, you should be able to ls your S3 bucket:
hadoop fs -ls s3a://{bucket}/
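Note that the listing command uses the s3a scheme, while the warehouse directory in the question uses s3. The S3A connector in hadoop-aws handles s3a:// paths, so if s3:// locations are still rejected, one option may be to pass an s3a:// warehouse path to the catalog instead. A minimal sketch of that rewrite, where `S3SchemeRewrite` and `toS3a` are hypothetical helper names and not part of Hadoop or Iceberg:

```java
final class S3SchemeRewrite {

    // Rewrites an s3:// or s3n:// location to s3a://.
    // Any other location (including one already using s3a://) is returned unchanged.
    static String toS3a(String location) {
        if (location.startsWith("s3://")) {
            return "s3a://" + location.substring("s3://".length());
        }
        if (location.startsWith("s3n://")) {
            return "s3a://" + location.substring("s3n://".length());
        }
        return location;
    }

    public static void main(String[] args) {
        // Uses the warehouse path from the question as sample input.
        System.out.println(toS3a("s3://bucket/warehouse"));
    }
}
```

The result could then be used as the value of hive.metastore.warehouse.dir in place of the s3:// path.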