Efficient mapping of arrays


I am trying to find out the best / most efficient or most functional way to compare / merge / manipulate two arrays (lists) simultaneously in JS.

The example I give below is a simple example of the overall concept. In my current project, I deal with some very crazy list mapping, filtering, etc. with very large lists of objects.

As delineated below, my first idea (version 1) for comparing the lists was to run through the first list (i.e. map) and, in the callback function, filter the second list for the criteria needed for the comparison (matching ids, for example). This obviously works, as per version 1 below.

My concern is performance: with this method, on every call of the map callback the entire second list gets filtered just to find the one item that matches.

Also, on every pass the filter walks over items in list2 that will only be matched by a later item in list1. To illustrate (as that sentence probably did not make sense):

list1.map   list2.filter

id:1        [id:3,id:2,id:1]
                          ^-match
id:2        [id:3,id:2,id:1]
                     ^-match
id:3        [id:3,id:2,id:1]
                ^-match
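
To put numbers on the diagram above, here is a minimal sketch that counts how often the comparison callback runs. The counters `filterCalls` and `findCalls` are only illustrative names, and the lists are trimmed to the ids from the diagram. `filter` always visits every element of list2, while `find` stops at the first match, though both are still linear scans per item of list1.

```javascript
// Lists reduced to the ids shown in the diagram (list2 in reverse order).
const list1 = [{ id: '1' }, { id: '2' }, { id: '3' }];
const list2 = [{ id: '3' }, { id: '2' }, { id: '1' }];

// filter: the callback runs for every element of list2 on every pass.
let filterCalls = 0;
list1.forEach(n => {
  list2.filter(f => { filterCalls++; return f.id === n.id; })[0];
});

// find: the callback stops running once a match is found.
let findCalls = 0;
list1.forEach(n => {
  list2.find(f => { findCalls++; return f.id === n.id; });
});

console.log(filterCalls); // 9 (3 items x 3 items)
console.log(findCalls);   // 6 (3 + 2 + 1)
```

With lists of length n and m, the filter version always makes n*m comparisons, which is what motivates the dictionary lookup in version 2.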

Ideally, on the first iteration of map (list1 id:1), when the filter encounters list2 id:3 (the first item), it would just match it to list1 id:3 right away.

Thinking along those lines (matching an item to a later id when it is encountered earlier), I came up with version 2.

This turns list2 into a dictionary keyed by id, so each value can then be looked up by key in any order.

const list1 = [
  {id: '1',init:'init1'},
  {id: '2',init:'init2'},
  {id: '3',init:'init3'}
];
const list2 = [
  {id: '2',data:'data2'},
  {id: '3',data:'data3'},
  {id: '4',data:'data4'}
];

/* ---------
* version 1
*/

const mergedV1 = list1.map(n => (
  {...n,...list2.filter(f => f.id===n.id)[0]}
));
/* [ 
  {"id": "1", "init": "init1"}, 
  {"id": "2", "init": "init2", "data": "data2"}, 
  {"id": "3", "init": "init3", "data": "data3"} 
] */

/* ---------
* version 2
*/

const dictList2 = list2.reduce((dict,item) => (dict[item.id]=item,dict),{}); 
// does not handle duplicate ids but I think that's 
// outside the context of this question.

const mergedV2 = list1.map(n => ({...n,...dictList2[n.id]}));
/* [ 
  {"id": "1", "init": "init1"}, 
  {"id": "2", "init": "init2", "data": "data2"}, 
  {"id": "3", "init": "init3", "data": "data3"} 
] */

JSON.stringify(mergedV1) === JSON.stringify(mergedV2);
// true

// and just for fun
const sqlLeftOuterJoinInJS = list1 => list2 => on => {
  const dict = list2.reduce((dict, item) => (
    dict[item[on]] = item, dict
  ), {});
  return list1.map(n => ({...n, ...dict[n[on]]}));
};
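
The curried helper above is defined but never called, so here is a minimal usage sketch. It repeats the definition to stay self-contained; the variable name `joined` is just for illustration.

```javascript
// Same helper as above: build a dictionary from list2, then left-join list1 against it.
const sqlLeftOuterJoinInJS = list1 => list2 => on => {
  const dict = list2.reduce((dict, item) => (dict[item[on]] = item, dict), {});
  return list1.map(n => ({ ...n, ...dict[n[on]] }));
};

const list1 = [
  { id: '1', init: 'init1' },
  { id: '2', init: 'init2' },
  { id: '3', init: 'init3' }
];
const list2 = [
  { id: '2', data: 'data2' },
  { id: '3', data: 'data3' },
  { id: '4', data: 'data4' }
];

// Arguments are supplied one at a time: left list, right list, join key.
const joined = sqlLeftOuterJoinInJS(list1)(list2)('id');
console.log(joined);
// [ {id:'1',init:'init1'},
//   {id:'2',init:'init2',data:'data2'},
//   {id:'3',init:'init3',data:'data3'} ]
```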

Obviously the above examples are pretty simple (merging two lists, each list having a length of 3). There are more complex instances that I am working with.

I don't know if there are some smarter (and ideally functional) techniques out there that I should be using.

4 Answers

3
Nina Scholz On Best Solutions

You could take a closure over the wanted key for the group and a Map for collecting all objects.

function merge(key) {
    var map = new Map;
    return function (r, a) {
        a.forEach(o => {
            if (!map.has(o[key])) r.push(map.set(o[key], {}).get(o[key]));
            Object.assign(map.get(o[key]), o);
        });
        return r;
    };
}

const
    list1 = [{ id: '1', init: 'init1' }, { id: '2', init: 'init2' }, { id: '3', init: 'init3' }],
    list2 = [{ id: '2', data: 'data2' }, { id: '3', data: 'data3' }, { id: '4', data: 'data4' }],
    result = [list1, list2].reduce(merge('id'), []);

console.log(result);

1
user633183 On

Using filter for search is a misstep. Your instinct in version 2 is much better. Map and Set provide much faster lookup times.
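
As a rough illustration of that point (not the decomposed approach that follows), a Map can serve as the lookup table for a left join, and a Set is enough when you only need to know whether an id exists. The names `byId`, `idsInList2`, `leftJoined` and `onlyMatched` are made up for this sketch.

```javascript
const list1 = [
  { id: '1', init: 'init1' },
  { id: '2', init: 'init2' },
  { id: '3', init: 'init3' }
];
const list2 = [
  { id: '2', data: 'data2' },
  { id: '3', data: 'data3' },
  { id: '4', data: 'data4' }
];

// Map: built once from [key, value] pairs, then one lookup per item of list1.
// If list2 had duplicate ids, the last entry would win.
const byId = new Map(list2.map(o => [o.id, o]));
const leftJoined = list1.map(o => ({ ...o, ...byId.get(o.id) }));

// Set: membership test only, e.g. keep the items of list1 that also occur in list2.
const idsInList2 = new Set(list2.map(o => o.id));
const onlyMatched = list1.filter(o => idsInList2.has(o.id));

console.log(leftJoined);  // same result as version 2 in the question
console.log(onlyMatched); // the items with id '2' and '3'
```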

Here's a decomposed approach. It should be pretty fast, but maybe not as fast as Nina's. She is a speed demon >_<

const merge = (...lists) =>
  Array .from
    ( lists
        .reduce (merge1, new Map)
        .values ()
    )

const merge1 = (cache, list) =>
  list .reduce
    ( (cache, l) =>
        cache .has (l.id)
          ? update (cache, l.id, l)
          : insert (cache, l.id, l)
    , cache
    )

const insert = (cache, key, value) =>
  cache .set (key, value)

const update = (cache, key, value) =>
  cache .set
    ( key
    , { ...cache .get (key)
      , ...value
      }
    )

const list1 =
  [{ id: '1', init: 'init1' }, { id: '2', init: 'init2' }, { id: '3', init: 'init3' }]

const list2 =
  [{ id: '2', data: 'data2' }, { id: '3', data: 'data3' }, { id: '4', data: 'data4' }]

console .log (merge (list1, list2))

0
vol7ron On

I'm offering this for completeness, as I think Nina and @user633183 have most likely offered more efficient solutions.

If you wish to stick with your initial filter example, which takes up to N*M lookups, and your arrays are mutable, you could consider shrinking the lookup array as you traverse it. In the old days, shrinking the array had a huge impact on performance.

The general pattern today is to use a Map (or dict) as indicated in other answers, as it is both easy to understand and generally efficient.

Find and Resize

const list1 = [
  {id: '1',init:'init1'},
  {id: '2',init:'init2'},
  {id: '3',init:'init3'}
];
const list2 = [
  {id: '2',data:'data2'},
  {id: '3',data:'data3'},
  {id: '4',data:'data4'}
];

// combine by ID
let merged = list1.reduce((acc, obj)=>{
  acc.push(obj);

  // find index by ID
  let foundIdx = list2.findIndex( el => el.id === obj.id );
  // if found, store and remove from search
  if ( foundIdx >= 0 ){
    obj.data = list2[foundIdx].data;
    list2.splice( foundIdx, 1 );        // shrink lookup array
  }
  return acc;
},[]);

// store remaining (if you want); i.e. {id:4,data:'data4'}
merged = merged.concat(list2)

console.log(merged);

0
Aadit M Shah On

I'm not sure whether I should mark this question as a duplicate because you phrased it differently. Anyway, here's my answer to that question copied verbatim. What you want is an equijoin:

const equijoin = (xs, ys, primary, foreign, sel) => {
    const ix = xs.reduce((ix, row) => // loop through m items
        ix.set(row[primary], row),    // populate index for primary table
    new Map);                         // create an index for primary table

    return ys.map(row =>              // loop through n items
        sel(ix.get(row[foreign]),     // get corresponding row from primary
        row));                        // select only the columns you need
};

You can use it as follows:

const equijoin = (xs, ys, primary, foreign, sel) => {
    const ix = xs.reduce((ix, row) => ix.set(row[primary], row), new Map);
    return ys.map(row => sel(ix.get(row[foreign]), row));
};

const list1 = [
    { id: "1", init: "init1" },
    { id: "2", init: "init2" },
    { id: "3", init: "init3" }
];

const list2 = [
    { id: "2", data: "data2" },
    { id: "3", data: "data3" },
    { id: "4", data: "data4" }
];

const result = equijoin(list2, list1, "id", "id",
    (row2, row1) => ({ ...row1, ...row2 }));

console.log(result);

It takes O(m + n) time to compute the answer using equijoin. However, if you already have an index then it'll only take O(n) time. Hence, if you plan to do multiple equijoins using the same tables then it might be worthwhile to abstract out the index.
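
One way to abstract out the index could look like the sketch below. The helpers `indexBy` and `joinWithIndex`, as well as the extra table `list3`, are hypothetical names and data introduced only for this illustration; they simply split `equijoin` into its two halves.

```javascript
// Hypothetical helper: build the index once, O(m).
const indexBy = (rows, key) =>
  rows.reduce((ix, row) => ix.set(row[key], row), new Map());

// Hypothetical helper: join against an existing index, O(n) per call.
const joinWithIndex = (ix, ys, foreign, sel) =>
  ys.map(row => sel(ix.get(row[foreign]), row));

const list1 = [
  { id: "1", init: "init1" },
  { id: "2", init: "init2" },
  { id: "3", init: "init3" }
];
const list2 = [
  { id: "2", data: "data2" },
  { id: "3", data: "data3" },
  { id: "4", data: "data4" }
];
// Made-up third table, only here to show the index being reused.
const list3 = [
  { id: "1", extra: "extra1" },
  { id: "4", extra: "extra4" }
];

const ixList2 = indexBy(list2, "id"); // built once
const sel = (indexed, row) => ({ ...row, ...indexed });

const joinedA = joinWithIndex(ixList2, list1, "id", sel);
const joinedB = joinWithIndex(ixList2, list3, "id", sel);

console.log(joinedA); // list1 rows, with data from list2 where ids match
console.log(joinedB); // list3 rows, with data from list2 where ids match
```

Each additional join against `ixList2` then costs only a pass over the other table, since the index is not rebuilt.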