How to get a distinct count on 13 billion records in Ab Initio

Question

Asked by charmander on 23 January 2024 at 13:00

I have 13 billion records in an MFS file in Ab Initio. I need to count distinct IMSIs grouped by date, city, and district. I tried the two approaches that came to mind, but both are very slow. How can I count distinct values faster?

1) length_of(vector_sort_dedup_first(accumulation( in.imsi_4g ))) in rollup having keys {date; city; district}

2) Partition by Key (PBK) on {date; city; district; imsi_4g}, then Dedup Sorted with keys {date_id; city_name; district_name; imsi_max_4g}

**X3R0** · Answer 1 · 2024-01-23T13:06:13+00:00

Do the processing in parallel. With 26 partitions, each one would process about five hundred million records (13 billion / 26):

let distinct_count = length_of(in.imsi_max_4g) in rollup keys {date, city, district} parallel 26;
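
To illustrate why partitioning helps, here is a minimal Python sketch (not Ab Initio DML) of the general idea: hash-partition on the full key (date, city, district, IMSI) so that duplicates always land in the same partition, dedup and count inside each partition, then add up the per-group counts. The function names, the CRC32 hash, and the tuple record layout are assumptions made for the example. The partitions are processed one after another here, whereas a real graph would run them in parallel.

```python
from collections import defaultdict
from zlib import crc32


def partition_by_key(records, n_partitions):
    """Route each record to a partition by hashing the full dedup key."""
    partitions = [[] for _ in range(n_partitions)]
    for rec in records:
        key = "|".join(rec)  # rec is (date, city, district, imsi), all strings
        partitions[crc32(key.encode()) % n_partitions].append(rec)
    return partitions


def local_distinct_counts(partition):
    """Dedup inside one partition, then count IMSIs per (date, city, district)."""
    counts = defaultdict(int)
    for date, city, district, _imsi in set(partition):
        counts[(date, city, district)] += 1
    return counts


def distinct_counts(records, n_partitions=26):
    """Sum the per-partition counts into one total per group."""
    totals = defaultdict(int)
    for part in partition_by_key(records, n_partitions):
        for group, n in local_distinct_counts(part).items():
            totals[group] += n
    return dict(totals)
```

Summing the partial counts is safe because identical (date, city, district, IMSI) keys always hash to the same partition, so no IMSI is counted twice across partitions. Partitioning only on {date; city; district} would also be correct, but it could leave large groups concentrated in a few partitions.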
