How to handle comma separated decimal values in Hive?


I have a CSV file and metadata for it. Columns in this CSV file are delimited by the pipe (|) symbol. Sample data is as follows:

name|address|age|salary|doj
xyz | abcdef|29 |567,34|12/02/2001

Here the salary column is of type decimal, but a comma (,) is used as the decimal separator instead of a period (.).

I created a Hive external table as shown below, and for this data Hive shows NULL in the salary column.

create external table employee (
      name string,
      address string,
      age int,
      salary decimal(7,3),
      doj string
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
LOCATION 's3://bucket/folder_having_many_csv_files/';

If I change the data type of the salary column to string then, as expected, Hive reads the values fine.

I would like to know how to tell Hive that this particular column is of type DECIMAL and that its decimal separator is a comma (,) rather than a period (.).
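The NULL described above is what happens when the stored text is not a valid decimal literal: Hive does not fail the query, it simply yields NULL for that field, whereas a string column keeps the raw text. The Python sketch below models only that parsing step. `parse_decimal_field` is a hypothetical helper, and Python's `decimal` module stands in for Hive's own parser, so edge cases such as surrounding whitespace may be handled differently.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_decimal_field(raw: str) -> Optional[Decimal]:
    """Hypothetical model of reading a text field into a DECIMAL column.

    Returns None (standing in for Hive's NULL) when the text is not a
    period-separated decimal literal.
    """
    try:
        value = Decimal(raw)
    except InvalidOperation:
        return None
    # Python accepts 'NaN' and 'Infinity'; treat them as unparseable too.
    return value if value.is_finite() else None


print(parse_decimal_field("567,34"))  # None: the comma is not a decimal point
print(parse_decimal_field("567.34"))  # 567.34
```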

Best answer, by invoketheshell:

You could define the table with salary as a string and replace the comma in a view on top of it. This is probably the easiest approach, since the data is large and is likely owned by someone else.

create view employee_decimal as
 select name
  , address
  , age
  -- salary holds text such as '567,34': swap the comma for a period, then cast
  , cast(regexp_replace(salary, ',', '.') as decimal(7,3)) as salary
  , doj
 from employee;
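To sanity-check what the view returns for a given salary string, the following Python sketch mirrors its `cast(regexp_replace(salary, ',', '.') as decimal(7,3))` expression. It is a local model, not Hive itself. `to_hive_decimal` is a hypothetical helper, and it assumes that Hive rounds half-up to the declared scale and returns NULL (here `None`) when the value needs more than precision minus scale integer digits. Check those assumptions against your Hive version.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP
from typing import Optional


def to_hive_decimal(raw: str, precision: int = 7, scale: int = 3) -> Optional[Decimal]:
    """Hypothetical model of cast(regexp_replace(raw, ',', '.') as decimal(p,s))."""
    try:
        # Like regexp_replace(salary, ',', '.'): replace every comma with a period.
        value = Decimal(raw.replace(",", "."))
        if not value.is_finite():
            return None
        # Round to `scale` fractional digits (assumed HALF_UP, as noted above).
        value = value.quantize(Decimal(10) ** -scale, rounding=ROUND_HALF_UP)
    except InvalidOperation:
        return None  # unparseable text -> NULL
    # decimal(7,3) leaves room for 7 - 3 = 4 integer digits; anything larger -> NULL.
    if abs(value) >= Decimal(10) ** (precision - scale):
        return None
    return value


print(to_hive_decimal("567,34"))   # 567.340
print(to_hive_decimal("12345,6"))  # None: five integer digits do not fit decimal(7,3)
```

With the defaults, a value like `12345,6` comes back as `None` because decimal(7,3) only has room for four integer digits; if real salaries can be larger, widen the precision in the view's cast.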