First of all, I load data from a local file to an unpartitioned table using hive
Creating raw data file:
stephen@stephen-VirtualBox:~/Workspace$ cat> static_source_demo.txt
11,test,2300,admin,c1
12,test2,2220,IT,c2
21,test3,2342,admin,c1
34,test5,2422,admin,c2
35,test6,2411,admin1,c1
Creating Unpartitioned Table
hive> CREATE TABLE employee_source_demo ( eid int, name string,
> salary string, destination string,city string)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ',';
OK
Time taken: 0.153 seconds
hive>
I then load the data from the file to the source table
Loading Data to Source Table:
hive> load data local inpath '/home/stephen/Workspace/static_source_demo.txt' into table employee_source_demo;
Loading data to table zipcodes.employee_source_demo
OK
Time taken: 0.773 seconds
I Confirmed if the data is in the table
hive> SELECT * FROM employee_source_demo;
OK
11 test 2300 admin c1
12 test2 2220 IT c2
21 test3 2342 admin c1
34 test5 2422 admin c2
35 test6 2411 admin1 c1
Time taken: 0.228 seconds, Fetched: 5 row(s)
hive>
Now, I create the partitioned table in the same database
hive> CREATE TABLE employee_part1 ( eid int, name String,
> salary String, destination String) PARTITIONED by (city string)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ',';
OK
Time taken: 0.151 seconds
hive>
After that, I tried to insert data into the new table, taking partition into account.
hive> INSERT INTO TABLE employee_part1 PARTITION (city='c1') SELECT eid, name, salary,
destination FROM employee_source_demo WHERE city='c1';
I think every thing went well. The following are the messages I got during the building/executing process
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = stephen_20220114230858_14789266-c13d-4e53-b411-474ac5bcbde7
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1642123763239_0009, Tracking URL = http://stephen-VirtualBox:8088/proxy/application_1642123763239_0009/
Kill Command = /home/stephen/opt/hadoop-2.7.3/bin/hadoop job -kill job_1642123763239_0009
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2022-01-14 23:09:11,139 Stage-1 map = 0%, reduce = 0%
2022-01-14 23:09:22,019 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.23 sec
MapReduce Total cumulative CPU time: 2 seconds 230 msec
Ended Job = job_1642123763239_0009
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://localhost:9000/user/hive/warehouse/zipcodes.db/employee_part1/city=c1/.hive-staging_hive_2022-01-14_23-08-58_343_2569185367107439586-1/-ext-10000
Loading data to table zipcodes.employee_part1 partition (city=c1)
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.23 sec HDFS Read: 5027 HDFS Write: 57 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 230 msec
OK
Time taken: 25.388 seconds
I don't think there was error. So, I checked the partitioned table as shown below. Nothing in there.
hive> SELECT * FROM employee_part1;
OK
Time taken: 0.34 seconds
hive>
I also checked the hive warehouse. It's like the file is there but no data
hive> !hadoop fs -ls /user/hive/warehouse/zipcodes.db/employee_part1/city=c1;
Found 1 items
-rwxrwxr-x 1 stephen supergroup 0 2022-01-14 23:09 /user/hive/warehouse/zipcodes.db/employee_part1/city=c1/000000_0
hive> !hadoop fs -cat /user/hive/warehouse/zipcodes.db/employee_part1/city=c1/000000_0;
hive>
I would appreciate any solution to this. I don't know what I have done wrong.
Try below statement
'MSCK repair table employee_part1 '