I am trying to reduce GC time in my program. The main suspect is the following piece of code:
Data.Vector.Unboxed.fromList . take n . List.sortBy (flip $ Ord.comparing id)
$ [ ( sum [ (c + a) * wsum z | (z, c) <- IntMap.toList zt_d ], d )
  | d <- IntMap.keys m
  , let zt_d = IntMap.findWithDefault IntMap.empty d m ]
The list being sorted would typically contain several thousand elements. I think the list sort is the culprit, because if I replace take n . List.sortBy (flip $ Ord.comparing id) with return . List.maximum, my productivity goes from 60% to 95%.
Is there anything I can do to reduce allocation here?
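One thing worth trying (a sketch of my own, not from the post above): since only the top n elements are needed, a partial selection avoids ordering the rest. vector-algorithms provides partialSortBy, which moves the k least elements under the given comparison, sorted, to the front of the vector; with a flipped comparison that yields the top n in descending order.

```haskell
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Algorithms.Intro as Intro

-- Select the n largest elements in descending order without a full sort.
-- partialSortBy cmp mv k sorts the k least elements (under cmp) into the
-- first k slots; flip compare makes "least" mean "largest".
topN :: (Ord a, U.Unbox a) => Int -> [a] -> U.Vector a
topN n xs =
  let v = U.fromList xs
      k = min n (U.length v)  -- partialSortBy requires k <= length
  in  U.take k $ U.modify (\mv -> Intro.partialSortBy (flip compare) mv k) v
```

For example, topN 2 [3,1,4,1,5] yields the vector [5,4]; whether this beats the list sort here would need measuring.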
Update
As recommended, I replaced the List.sortBy with an in-place sort from vector-algorithms. Perhaps I'm doing it wrong, but what I'm seeing is that there is no allocation (productivity 97%, as opposed to 63% with lists), yet the program is many times slower: it runs in 85 seconds with List.sortBy, while with the in-place sort I killed it after waiting 7 minutes. I tried both the Intro and Merge sorts. Here is my code:
import qualified Data.Vector.Generic.Mutable as GM
import qualified Data.Vector.Generic as G
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Algorithms.Merge as Sort
import qualified Data.Vector.Fusion.Stream as Stream
import Control.Monad.ST
sortBy :: (Ord a, U.Unbox a) => (a -> a -> Ordering) -> [a] -> U.Vector a
sortBy cmp xs = runST $ do
  mv <- GM.unstream . Stream.fromList $ xs
  Sort.sortBy cmp mv
  G.unsafeFreeze mv
The sorting does indeed look like it will cause a lot of allocation. While the sorting is performed on a list, that cannot be completely avoided, since sorting lists causes the construction of many intermediate lists. If necessary, you could try to do the sorting on an MVector, using for example the vector-algorithms package, which provides efficient sorting algorithms. However, there are further inefficiencies that cause more allocation than necessary.

When you write

d <- IntMap.keys m
, let zt_d = IntMap.findWithDefault IntMap.empty d m

you are 1) traversing the entire map to collect the list of keys, and then 2) looking up each key on its own. Since you only look up keys present in the map, you never use the default. It is much more efficient to create the list of key/value pairs in one traversal of the map:

(d, zt_d) <- IntMap.toList m
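To illustrate the single-traversal rewrite on a small self-contained example (the map and values here are made up, not the question's actual data):

```haskell
import qualified Data.IntMap as IntMap

m :: IntMap.IntMap Int
m = IntMap.fromList [(1, 10), (2, 20), (3, 30)]

-- What the question's code does: one traversal for the keys,
-- plus one O(log n) lookup per key (the default is never used).
viaKeys :: [(Int, Int)]
viaKeys = [ (d, zt_d)
          | d <- IntMap.keys m
          , let zt_d = IntMap.findWithDefault 0 d m ]

-- One traversal, no per-key lookups:
viaToList :: [(Int, Int)]
viaToList = IntMap.toList m

-- Both produce [(1,10),(2,20),(3,30)].
```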
Then, if the id in flip $ Ord.comparing id is indeed the identity function, that would be more readable (and possibly more efficient) as sortBy (flip compare).
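A quick check of that equivalence (the list here is an arbitrary example): comparing id x y is just compare x y, so flipping either gives the same descending comparator.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)

descA, descB :: [Int] -> [Int]
descA = sortBy (flip (comparing id))  -- what the question's code amounts to
descB = sortBy (flip compare)         -- equivalent, and simpler

-- descA [2,3,1] and descB [2,3,1] both give [3,2,1].
```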
Depending on the type of the summed elements (and possibly the optimisation level), it might be better to use Data.List.foldl' (+) 0 instead of sum.
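A minimal sketch of that replacement (the function name is mine): without optimisation, sum as a lazy left fold can build a long chain of thunks before anything is added; foldl' forces the accumulator at every step, keeping the fold in constant space.

```haskell
import Data.List (foldl')

-- Strict sum: the accumulator is forced at each step, so no
-- thunk chain builds up over a long list.
sumStrict :: Num a => [a] -> a
sumStrict = foldl' (+) 0

-- sumStrict [1..100 :: Int] evaluates to 5050.
```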