Hive external table not reading entirety of string from CSV source

1.3k views Asked by At

Relatively new to the Hadoop world so apologies if this is a no-brainer but I haven't found anything on this on SO or elsewhere.

In short, I have an external table created in Hive that reads data from a folder of CSV files in HDFS. The issue is that while the first column has the string datatype it isn't reading the entire value for a values that contain a hyphen.

For example:

Hive table definition:

CREATE EXTERNAL TABLE BG (
  `Id` string,
  `Someint` int,
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/Path/To/Folder'
TBLPROPERTIES ("skip.header.line.count"="1");

Example Data:

506253-A,10
506254-C,19
506233,8

When I query the data in Hive or browse in HCat this is what I see:

Id Someint
506253 10
506254 19
506233 8

Any ideas? Thanks in advance!

2

There are 2 answers

1
maxymoo On BEST ANSWER

What version of Hive are you using? On Amazon EMR, Hive version 0.13.1 I run your code and get the following

hive> CREATE EXTERNAL TABLE BG (
    >   `Id` string,
    >   `Someint` int
    > )
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    > LOCATION '/tmp/example'
    > TBLPROPERTIES ("skip.header.line.count"="1");
OK
Time taken: 0.598 seconds
hive> select * from BG;
OK
506253-A        10
506254-C        19
506233  8
Time taken: 0.714 seconds, Fetched: 3 row(s)
0
Vijay Kumar On

I have discussed on Auto Generation of Hive Scripts from CSV Files, it generates "CREATE Statements with all String data types, but you may update Hive script and change to proper types as needed" with examples at following ST post,

You may refer to Stackoverflow thread at "Hadoop/Hive : Loading data from .csv on a local machine" for more details.