I am trying to set the empty values in a csv file to zero in hive. But this code doesn't seem to work. What changes should I make?

2.1k views Asked by At

This is the input .csv file

"1","","Animation"

"2","Jumanji",""

"","Grumpier Old Men","Comedy"

Hive Code

CREATE TABLE IF NOT EXISTS movies(movie_id int, movie_name string,genre string)

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'

WITH SERDEPROPERTIES (

"separatorChar" = ",",

"quoteChar" = "\"",

"serialization.null.format" = '0'

);

Output

1 Animation

2 Jumanji

    Grumpier Old Men        Comedy
1

There are 1 answers

0
leftjoin On

Empty strings in csv are interpreted as empty strings, not NULLs. To represent NULL inside a delimited text file you should use "\N". Also Hive provides you a table property “serialization.null.format” which can be used to treat a character of your choice as null in Hive SQL. In your case it should be empty string "". To convert NULLs to zeroes use NVL(col, 0) or COALESCE(col, 0) function depending on your hive version (COALESCE should work for all).