When profiling a very slow method, I discovered the lag is on searching and filtering of a collection.
The method does the following (in order). According to profiler, 80% of time are spend on step 1-3.
- Read sorted collection from a file and deserialize using Protobuf-net (v2)
- From a sorted collection, filter based on a start and end integer (name
.RangeFromTo()) - From the same sorted collection, get the next element of the collection (name .
Right()) - Perform some task...
.RangeFromTo() filters for a given range, for example:
[3,7,9,12].RangeFromTo(2,9) -> [3,7,9]
[3,7,9,12].RangeFromTo(2,8) -> [3,7]
[3,7,9,12].RangeFromTo(7,13) -> [7,9,12]
[3,7,9,12].RangeFromTo(13,14) -> [ ]
.Right() finds an element in the collection and gives you the next on in the list. If the element doesn't exist it gives you the closest one counting to the right. For example:
[3,7,9,12].Right(0) -> 3
[3,7,9,12].Right(3) -> 7
[3,7,9,12].Right(4) -> 7
[3,7,9,12].Right(12) -> null
Currently the collection is using SortedArray from C5 (https://github.com/sestoft/C5/). Is there a more suitable collection that I can use?
Note: Step 1. takes around 30% of the total time. If I use a List instead, protobuf actually takes 40% less time deserializing! I guess when inserting into an SortedArray the collection doesn't know the data is already sorted and is doing a whole bunch of work. The ideal collection (if exist) should also be able to bypass that.
Edit: To clarify, the list are around 1000-5000 and there are 90k different collections! The method in question needs to load all the collections in memory to perform some business task.
Edit 2: I have added some sample benchmark here:
https://github.com/cchanpromys/so_19188345
It compares SortedArray from C5 with SortedSet from .Net. So far the results are as follows:
C5 sorted array deserialize took 904
Sorted set deserialize took 1040
C5 sorted array .Right() took 5
Sorted set .Right() took 798 <--- Not sure what happened here...
C5 sorted array .RangeFromTo() took 217
Sorted set .RangeFromTo() took 140
Edit 3 This is out of my expectations but I ended up with a custom implementation of a list.
The problem that I had is that SortedArray's Find operation (in general) takes O(Log(N)), while I want it to be an O(1) operation.
Also, the list is sorted by nature, you will never add to the middle of the list.
So I ended up implementing a list that has an internal indexer array, for example:
For example:
indexer: [0,0,0,0,1,1,1,1,2,2]
list: [3,7,9]
So .Right(3) would be list[indexer[3]++].
The code can be found here.
It is hard to believe that this type of list is not already implemented somewhere on the internet. If possible, I would like to use a library so I won't have to manage my own list.
Do such implementation exist on the internet?
If your arrays are small enough (maybe under 10-20 elements) than there is a good chance that simple linear search is good enough (which is somewhat shown by
Listbeing faster in your measurements) and you can cut out ranges withWhere/TakeWhile: