Algorithm to combine two arrays for the smallest CQL filter

52 views Asked by At

I have an array of tables containing an array of privilege_ids. I create CQL filters with the ids like (((table_id in (1,2,3)) and (privilege_id in ('abc','def','ghi'))) and ((table_id in (4,5)) and (privilege_id in ('def','ghi','jkl')))).

So I need an algorithm that finds the best grouping of table_id and privilege_id to get the shortest CQL out of it.

Input Tables

[
  { id: 1, privilege_ids: ['abc','def','ghi', 'mno', ..., 'yza'] },
  { id: 2, privilege_ids: ['abc','def','ghi', 'mno', ..., 'yza'] },
  { id: 3, privilege_ids: ['abc','def','ghi', 'mno', ..., 'yza'] },
  { id: 4, privilege_ids: ['def','ghi','jkl', 'mno', ..., 'yza'] },
  { id: 5, privilege_ids: ['def','ghi','jkl', 'mno', ..., 'yza'] }
]

Output

[
   { table_ids: [1, 2, 3], privilege_ids: ['abc'] },
   { table_ids: [4, 5], privilege_ids: ['jkl'] },
   { table_ids: [1, 2, 3, 4, 5], privilege_ids: ['def','ghi', 'mno', ..., 'yza'] }
]

The output is used to create the CQL. In this example you can also group 1, 2, 3 and 4, 5. However the overlapping privilege_ids would create more text than the additional grouping of table_ids 1,2,3,4,5. So this is the algorithm I need.

There are around 1 - 10 tables per query with each 1 - 500 privilege_ids. Most of the privilege_ids overlap. The goal is to find the biggest overlapping privilege_id combination with their table_id to create the smallest CQL filter in extreme cases (500 privilege_ids) without making it to big for smaller usecases (10 privilege_ids).

0

There are 0 answers