Some of the articles did help on the stack, however, could not find to delete rows by counts in Hive.
There are 2 row_counts for Apple. How do I select only 1 row count for Apple?
--What data looks like...Total 14 records
customerID date product_type
1234abc 20140105 Orange
1234abc 20140105 Apple
1234abc 20140205 Orange
1234abc 20140205 Apple
1234abc 20140205 Apple
1234abc 20140305 Orange
1234abc 20140305 Apple
1234abc 20140305 Apple
1234abc 20140405 Orange
1234abc 20140405 Apple
1234abc 20140405 Apple
1234abc 20140505 Orange
1234abc 20140505 Apple
1234abc 20140505 Apple
--Final Output. Total 10 records
customerID date product_type
1234abc 20140105 Orange
1234abc 20140105 Apple
1234abc 20140205 Orange
1234abc 20140205 Apple
1234abc 20140305 Orange
1234abc 20140305 Apple
1234abc 20140405 Orange
1234abc 20140405 Apple
1234abc 20140505 Orange
1234abc 20140505 Apple
I'd suggest a 2 steps approach. step 1: create a temp table with the duplicate record list inserted, using insert and select like so:
Then delete the duplicates with
step 2: Insert the #temp table contents, which has the unique row into the original table.