How to delete customer information from HDFS

Suppose I store information about my customers, such as customer_id, customer_name, and customer_emailid. When a customer leaves and asks for their personal information to be removed, I need to delete it from HDFS.

I see two approaches to achieve this.

Approach 1:

1. Create an internal (managed) table on top of the data in HDFS.

2. Create an external table from the first table, using filter logic to leave out the customer's rows.

3. While creating the second table, apply UDFs to specific columns for additional column-level filtering.
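A minimal HiveQL sketch of Approach 1, under several assumptions: the table names (customers_raw, customers_clean), the column types, the HDFS location, and the customer IDs are all hypothetical, and the built-in sha2 function stands in for whatever UDF would be applied to a column. As far as I know, Hive's CREATE TABLE AS SELECT does not accept an external target table, so the sketch creates the second table first and then fills it.

```sql
-- Assumes customers_raw already exists on top of the original HDFS data.
-- All names, types, paths, and IDs below are hypothetical.
CREATE EXTERNAL TABLE IF NOT EXISTS customers_clean (
  customer_id      BIGINT,
  customer_name    STRING,
  customer_emailid STRING
)
STORED AS ORC
LOCATION '/data/customers/clean';

-- Copy everything except the departing customers,
-- hashing the email column as an example of a column-level function.
INSERT OVERWRITE TABLE customers_clean
SELECT customer_id,
       customer_name,
       sha2(customer_emailid, 256) AS customer_emailid
FROM customers_raw
WHERE customer_id NOT IN (101, 102);
```

Note that the first table's files still contain the original rows after this runs, so they would have to be removed separately.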

Approach 2:

Spark: read, filter, write.
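In Spark SQL, the read, filter, write idea might look roughly like the sketch below. The table names (customers, customers_filtered), the Parquet format, and the customer IDs are assumptions for illustration only.

```sql
-- Spark SQL sketch; table names, format, and IDs are hypothetical.
-- Reads the existing table, filters out the departing customers,
-- and writes the remaining rows to a new table.
CREATE TABLE customers_filtered
USING parquet
AS
SELECT *
FROM customers
WHERE customer_id NOT IN (101, 102);
```

Writing to a separate table keeps the source readable until the result has been checked; the old data can then be dropped and the new table used in its place.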

Is there any other solution?

There is 1 answer

Answer by leftjoin

Approach 2 is also possible in Hive: select, filter, write.

Create a table on top of the directory in HDFS. Whether the table is external or managed does not matter in this context, although external is better if you are going to drop the table later and keep the data as is. Then run INSERT OVERWRITE on the table or partition from a SELECT with a filter.

insert overwrite table mytable
select *
  from mytable --the same table
 where customer_id not in (...) --filter out the rows to delete
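
If the table is partitioned, the same pattern can be limited to a single partition so that only the affected files are rewritten. This is only a sketch: the partition column load_date, its value, the column list, and the customer IDs are hypothetical.

```sql
-- Rewrites one partition of mytable without the departing customers.
-- load_date, its value, the columns, and the IDs are hypothetical.
INSERT OVERWRITE TABLE mytable PARTITION (load_date = '2020-01-01')
SELECT customer_id,
       customer_name,
       customer_emailid        -- partition column is not selected for a static partition
FROM mytable
WHERE load_date = '2020-01-01'
  AND customer_id NOT IN (101, 102);
```

With a static partition spec like this, the select list contains only the non-partition columns, which is why it names columns instead of using select *.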