Remote Flink job execution with query to Hive on Flink cluster

326 views Asked by Ruslan At 26 November 2020 at 08:39

I use Flink 1.11.2, Hive 2.1.1, and Java 8. I am trying to execute a Hive query remotely: I packaged the job in a jar and submit it with Flink's REST client (RestClusterClient):

private static String jar = "/path/Job.jar";

Configuration config = RemoteConfiguration.getConfiguration(host, port);
// Wrap the job jar and its arguments
PackagedProgram packagedProgram = PackagedProgram.newBuilder()
    .setJarFile(new File(jar))
    .setArguments(arguments)
    .build();
RestClusterClient<StandaloneClusterId> client =
    new RestClusterClient<StandaloneClusterId>(config, StandaloneClusterId.getInstance());
// Default parallelism 1, program output not suppressed
JobGraph jobGraph = PackagedProgramUtils.createJobGraph(packagedProgram, config, 1, false);
// Submit and block until the submission is acknowledged
client.submitJob(jobGraph).get();

where the job is:

StreamExecutionEnvironment streamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment();
DataStreamSource<String> source = streamExecutionEnvironment.fromElements(
    tableName
);
source
    .map(new MapFunction<String, String>() {
      String hiveConfDir = "hive-conf";
      String hiveCatalogName = "myhive";
      String databaseName = "default";
      String location = "'hdfs:///tmp/location'";

      @Override
      public String map(String tableName) {
        HiveCatalog hive = new HiveCatalog(hiveCatalogName, databaseName, hiveConfDir, "2.1.1");
        EnvironmentSettings batchSettings = EnvironmentSettings.newInstance().useBlinkPlanner().build();
        TableEnvironment tableEnv = TableEnvironment.create(batchSettings);
        tableEnv.registerCatalog(hiveCatalogName, hive);
        tableEnv.useCatalog(hiveCatalogName);
        tableEnv.getConfig().setSqlDialect(SqlDialect.HIVE);

        hive.getHiveConf().set("hive.vectorized.execution.enabled", "false");
        hive.getHiveConf().set("hive.vectorized.execution.reduce.enabled", "false");
        hive.getHiveConf().set("hive.vectorized.execution.reduce.groupby.enabled", "false");
        tableEnv.executeSql("CREATE TABLE " + tableName + "(\n"
            + "  test INT,\n"
            + "  age INT\n"
            + ") STORED AS ORC LOCATION " + location + " TBLPROPERTIES ('orc'\n"
            + "'.compress'='NONE')");

        return tableName;
      }
    })
    .print();
streamExecutionEnvironment.execute();

flink-conf.yaml contains only one additional parameter:

env.java.home: /path/to/JAVA_HOME

When I run it, one of the following errors occurs every other time:

java.lang.OutOfMemoryError: Java heap space

or:

MetaException(message:Got exception: java.lang.ClassCastException class [Ljava.lang.Object; cannot be cast to class [Ljava.net.URI; ([Ljava.lang.Object; and [Ljava.net.URI; are in module java.base of loader 'bootstrap'))

Can you explain what causes these errors?
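The second message mentions "module java.base", and modules do not exist in Java 8, so it may be worth confirming which JVM the failing code actually runs on and how much heap it has. The following is only a diagnostic sketch, not part of the original job; the class name JvmInfo and the method describe are hypothetical helpers introduced for illustration.

```java
public class JvmInfo {

    /** Returns a one-line description of the JVM this code is running on. */
    public static String describe() {
        // Standard system properties: runtime version and installation directory
        String version = System.getProperty("java.version");
        String home = System.getProperty("java.home");
        // Maximum heap the JVM will attempt to use, converted to MiB
        long maxHeapMiB = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        return "java.version=" + version
            + ", java.home=" + home
            + ", maxHeapMiB=" + maxHeapMiB;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Logging JvmInfo.describe() both in the submitting client and inside the map function would show whether the two sides run on the same Java version and whether env.java.home is being picked up by the cluster processes.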

There are 0 answers