I have 13 billion records stored as an MFS (multifile) in Ab Initio. I need to count distinct IMSIs grouped by date, city, and district. I tried the two approaches that came to mind, but both are very slow. How can I count distinct values faster?
1)
A rollup with keys {date; city; district} that computes length_of(vector_sort_dedup_first(accumulation(in.imsi_4g)))
2)
Partition by key (PBK) on {date; city; district; imsi_4g}, then Dedup Sorted with keys {date_id; city_name; district_name; imsi_max_4g}
Do the processing in parallel, so that each thread handles about five hundred million records (roughly 26 threads for 13 billion records).
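
The idea can be illustrated outside Ab Initio. The following is not Ab Initio code, only a minimal Python sketch of the same technique, assuming records are plain (date, city, district, imsi) tuples. All function names are hypothetical. Records are hash-partitioned on the group key (date, city, district), so every IMSI of a given group lands in the same partition and each partition can count its distinct values independently, with no merge step across partitions.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_by_group(records, n_parts):
    """Hash-partition records on (date, city, district)."""
    parts = [[] for _ in range(n_parts)]
    for date, city, district, imsi in records:
        key = f"{date}|{city}|{district}".encode()
        # crc32 gives a stable hash, so a group always maps to one partition.
        parts[zlib.crc32(key) % n_parts].append((date, city, district, imsi))
    return parts


def count_distinct_in_partition(part):
    """Count distinct IMSIs per group inside a single partition."""
    seen = defaultdict(set)
    for date, city, district, imsi in part:
        seen[(date, city, district)].add(imsi)
    return {group: len(imsis) for group, imsis in seen.items()}


def count_distinct_imsis(records, n_parts=4):
    """Partition, count each partition independently, then combine."""
    parts = partition_by_group(records, n_parts)
    result = {}
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        for partial in pool.map(count_distinct_in_partition, parts):
            # Groups never span partitions, so a plain update is enough.
            result.update(partial)
    return result
```

Partitioning on the three group fields alone, rather than on all four fields including the IMSI, is what makes the per-partition counts final. The thread pool here only shows that partitions are independent; in Python it would not by itself speed up CPU-bound work.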