Unable to perform select count(*) when using RegexSerDe on Hive


I am reading data from a flat file with fixed-length records, and I created the table with the following script:

CREATE EXTERNAL TABLE `test_table`.`test_data`
(test_column1 STRING,
test_column2 STRING
) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' 
WITH SERDEPROPERTIES ("input.regex" = "(.{10})(.{10})" )
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '<specified location>';

I ran the select * from `test_table`.`test_data` query, and it worked fine.

When I ran the select count(1) from `test_table`.`test_data` query, it failed with the following error:

SQL Error [2] [08S01]: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1620900284233_0004_195_00, diagnostics=[Task failed, taskId=task_1620900284233_0004_195_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1620900284233_0004_195_00_000000_0:java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: Error creating SerDe for LLAP IO

Can anyone explain what the error "Error creating SerDe for LLAP IO" means, and how I can fix it?

Answer by Kong Yong:

Issue resolved. The answer below is quoted from here:

'RegexSerDe' is part of the 'hive-contrib' library. From Hive 0.10 onwards, the SerDe is also shipped in 'hive-serde-<version>.jar'.

Create the table with the newer SerDe class 'org.apache.hadoop.hive.serde2.RegexSerDe' instead of 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'.
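
Applied to the table from the question, the fix might look like the sketch below. It is the question's own DDL with only the SerDe class name changed; the `<specified location>` placeholder is left as in the question. If the table already exists, it has to be dropped before re-creating it. Because it is EXTERNAL, dropping it removes only the metadata by default and leaves the files at LOCATION in place.

```sql
-- Same fixed-length layout as in the question: two 10-character fields per line.
-- Only the SerDe class differs: the built-in serde2 RegexSerDe replaces the hive-contrib one.
CREATE EXTERNAL TABLE `test_table`.`test_data`
(test_column1 STRING,
 test_column2 STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "(.{10})(.{10})")
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '<specified location>';
```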

Alternatively, you can run ALTER TABLE to change the SerDe of the existing table:

ALTER TABLE <TABLENAME> SET SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe';
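
One way to confirm that the change took effect, assuming the table name from the question, is to check the "SerDe Library" field reported by DESCRIBE FORMATTED and then re-run the query that previously failed:

```sql
-- The "SerDe Library:" row should now show org.apache.hadoop.hive.serde2.RegexSerDe
DESCRIBE FORMATTED `test_table`.`test_data`;

-- Re-run the count that previously failed in the Tez map vertex
SELECT COUNT(1) FROM `test_table`.`test_data`;
```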