Schema on read in hive for tsv format file

Question


Asked by Priyanka Shekhawat on 02 August 2018 at 20:06 · 1.3k views

I am new to Hadoop. I have data in TSV format with 50 columns and I need to store it in Hive. How can I create the table and load the data on the fly, using schema on read, without manually writing a CREATE TABLE statement?


There are 2 answers

**phaneendra kumar** · 03 August 2018 at 08:04

You can use Hue:

http://gethue.com/hadoop-tutorial-create-hive-tables-with-headers-and/

Or, with Spark, you can infer the schema of a delimited (CSV/TSV) file and save the result as a Hive table:

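```scala
// Tab-separated input with a header row; Spark infers each column's type
val df = spark.read
  .option("delimiter", "\t")
  .option("header", "true")
  .option("inferSchema", "true") // <-- HERE
  .csv("/home/cloudera/Book1.csv")

// Save as a Hive table (hypothetical name); `spark` must be built with enableHiveSupport()
df.write.saveAsTable("book1")
```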

**OneCricketeer** · Accepted Answer · 2018-08-04T17:32:09+00:00

Hive requires you to run a CREATE TABLE statement because the Hive metastore must be updated with the description of what data location you're going to be querying later on.

Schema-on-read doesn't mean that you can query every possible file without knowing metadata beforehand such as storage location and storage format.
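Hive still needs a CREATE TABLE statement, but for a wide file it does not have to be typed by hand. Below is a minimal sketch (not part of either answer) that generates the DDL from the TSV header line: every column is declared as STRING, and an external table is pointed at the directory that holds the file, which supplies exactly the location and format metadata described above. The object name `TsvDdl`, the table name and the path are hypothetical, and the sketch assumes the header's column names are non-empty and unique after sanitizing.

```scala
// Hypothetical helper: builds a Hive DDL statement from a TSV header line,
// declaring every column as STRING so no types have to be guessed.
object TsvDdl {
  def createTableDdl(table: String, headerLine: String, location: String): String = {
    // Sanitize names: trim, lower-case, replace anything outside [a-z0-9_] with '_'
    val columns = headerLine
      .split("\t", -1)
      .map(_.trim.toLowerCase.replaceAll("[^a-z0-9_]", "_"))
      .map(name => s"  `$name` STRING")
      .mkString(",\n")

    // External table over the existing TSV directory; the header row is skipped at query time
    s"""CREATE EXTERNAL TABLE IF NOT EXISTS $table (
       |$columns
       |)
       |ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
       |STORED AS TEXTFILE
       |LOCATION '$location'
       |TBLPROPERTIES ('skip.header.line.count'='1')""".stripMargin
  }

  def main(args: Array[String]): Unit = {
    // Example header with three columns; a real file would have all 50 in its first line
    println(createTableDdl("tsv_data", "id\tName\tcreated at", "/user/hive/external/tsv_data"))
  }
}
```

The printed statement can be run in the Hive CLI or Beeline. Declaring everything as STRING keeps the generated DDL simple; values can be cast in queries, or the types edited in the statement before running it.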

Spark SQL or Apache Drill, on the other hand, will let you infer the schema from a file, but for a TSV you must still define the column types if you don't want every column to be a string (or coerced to unexpected types). Both of these tools can interact with a Hive metastore for "decoupled" storage of schema information.
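As a sketch of the point about column types, the Spark program below declares the types explicitly instead of relying on `inferSchema`, then writes the result as a table registered in the Hive metastore. It assumes Spark 2.x or later with the spark-hive module on the classpath; the three columns, the object name and the table name `default.tsv_typed` are hypothetical placeholders for the real 50-column layout.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{DoubleType, IntegerType, StringType, StructField, StructType}

object TsvWithSchema {
  // Hypothetical column list; a real 50-column file would list every column here
  val schema: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = true),
    StructField("name", StringType, nullable = true),
    StructField("score", DoubleType, nullable = true)
  ))

  // Read the TSV with the declared types instead of inferring them;
  // the header row is skipped and column names come from `schema`
  def readTsv(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("sep", "\t")
      .option("header", "true")
      .schema(schema)
      .csv(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("tsv-to-hive")
      .enableHiveSupport() // store the table definition in the Hive metastore
      .getOrCreate()

    // Input path from the command line; hypothetical target table name
    readTsv(spark, args(0)).write.mode("overwrite").saveAsTable("default.tsv_typed")
    spark.stop()
  }
}
```

Pass the input path as the first argument (for example via `spark-submit`). With the default PERMISSIVE parse mode, values that cannot be parsed as the declared type end up as nulls rather than failing the job, so checking null counts after a first load is worthwhile.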

TechQA.
