I have a list with over a million objects in it, trying to find the fastest way to search through it

Question

I have a list with over a million objects in it, trying to find the fastest way to search through it

3.6k views Asked by Euthyphro At 09 December 2013 at 03:40

I have a list that stores well over a million objects within it. I need to look through the list and update the objects found through the below query as efficiently as possible.

I was thinking of using a Dictionary or HashSet, but I'm relatively new to C# and could not figure out how to implement these other two approaches. My current code is simply a LINQ statement searching through an IList.

public IList<LandObject> landObjects = new List<LandObject>();    

var lObjsToUpdate = from obj in landObjects 
                        where 
            obj.position.x >= land.x - size.x && 
            obj.position.x <= land.x + size.x && 
            obj.position.y >= land.y - size.y && 
            obj.position.y <= land.y + size.y
                    select obj;

 foreach(var item in lObjsToUpdate)
 {
     //do what I need to do with records I've found
 }

Could anyone be so kind as to suggest how I might approach this efficiently?

Original Q&A

There are 3 answers

**sinelaw** · Answer 1 · 2013-12-09T04:01:00+00:00

The real answer should involve performance tests and comparisons, and depends on your runtime environment (memory vs. cpu, etc).

The first thing I would try, since land and size are constant, you can maintain a simple HashSet<LandObject> of objects that fit the criteria (in addition to a list or set of all objects or just all other objects). Every time a new object is added, check if it fits the criteria and if so - add it to that set of objects. I don't know how good HashSet is at avoiding collisions when working with object references, but you can try to measure it.

BTW, this related question discussing memory limits of HashSet<int> might interest you. With .NET < 4.5 your HashSet should be ok up to several million items. From what I understand, with some configuration .NET 4.5 removes the limitation of 2gb max object size and you'll be able to go crazy, assuming you have a 64-bit machine.

**tinstaafl** · Answer 2 · 2013-12-09T04:56:55+00:00

One thing that will probably help with that many iterations is do the calculations once and use different variables inside your query. Also it should help some to increase the range by 1 on each end and eliminate checking for equals:

public IList<LandObject> landObjects = new List<LandObject>();    
float xmax = land.x + size.x + 1;
float xmin = land.x - size.x - 1;
float ymax = land.y + size.y + 1;
float ymin = land.y - size.y - 1;

var lObjsToUpdate = from obj in landObjects 
                        where 
            obj.position.x > xmin && 
            obj.position.x < xmax && 
            obj.position.y > ymin && 
            obj.position.y < ymax
                    select obj;

**Corey** · Answer 3 · 2013-12-09T04:58:17+00:00

Ideally you need a way to partition elements so that you don't have to test every single one to find the ones that fit and the ones that should be thrown out. How you partition will depend on how dense the items are - it might be as simple as partitioning on the integer portion of the X coordinate for instance, or by some suitable scaled value of that coordinate.

Given a method (let's call it Partition for now) that takes an X coordinate and returns a partition value for it, you can filter on the X coordinate fairly quickly as a first-pass to reduce the total number of nodes you need to check. You might need to play with the partition function a little to get the right distribution though.

For example, say that you have floating-point coordinates in the range -100 < X <= 100, with your 1,000,000+ objects distributed fairly uniformly across that range. That would divide the list into 200 partitions of (on average) 5000 entries if partitioned on integer values of X. That means that for every integer step in the X dimension of your search range you only have ~5,000 entries to test.

Here's some code:

public interface IPosition2F
{
    float X { get; }
    float Y { get; }
}

public class CoordMap<T> where T : IPosition2F
{
    SortedDictionary<int, List<T>> map = new SortedDictionary<int,List<T>>();
    readonly Func<float, int> xPartition = (x) => (int)Math.Floor(x);

    public void Add(T entry)
    {
        int xpart = xPartition(entry.X);
        List<T> col;
        if (!map.TryGetValue(xpart, out col))
        {
            col = new List<T>();
            map[xpart] = col;
        }
        col.Add(entry);
    }

    public T[] ExtractRange(float left, float top, float right, float bottom)
    {
        var rngLeft = xPartition(left) - 1;
        var rngRight = xPartition(right) + 1;

        var cols =
            from keyval in map
            where keyval.Key >= rngLeft && keyval.Key <= rngRight
            select keyval.Value;

        var cells =
            from cell in cols.SelectMany(c => c)
            where cell.X >= left && cell.X <= right &&
                cell.Y >= top && cell.Y <= bottom
            select cell;

        return cells.ToArray();
    }

    public CoordMap()
    { }

    // Create instance with custom partition function
    public CoordMap(Func<float, int> partfunc)
    {
        xPartition = partfunc;
    }
}

That will partition on the X coordinate, reducing your final search space. If you wanted to take it a step further you could also partition on the Y coordinate... I'll leave that as an exercise for the reader :)

If your parition function is very finely grained and could result in a large number of partitions, it might be useful to add a ColumnRange function similar to:

    public IEnumerable<List<T>> ColumnRange(int left, int right)
    {
        using (var mapenum = map.GetEnumerator())
        {
            bool finished = mapenum.MoveNext();
            while (!finished && mapenum.Current.Key < left)
                finished = mapenum.MoveNext();

            while (!finished && mapenum.Current.Key <= right)
            {
                yield return mapenum.Current.Value;
                finished = mapenum.MoveNext();
            }

        }
    }

The ExtractRange method can then use that like so:

    public T[] ExtractRange(float left, float top, float right, float bottom)
    {
        var rngLeft = xPartition(left) - 1;
        var rngRight = xPartition(right) + 1;

        var cells =
            from cell in ColumnRange(rngLeft, rngRight).SelectMany(c => c)
            where cell.X >= left && cell.X <= right &&
                cell.Y >= top && cell.Y <= bottom
            select cell;

        return cells.ToArray();
    }

I used SortedDictionary for convenience, and because it makes it possible to do an ExtractRange method that is reasonably quick. There are other container types that are possibly better suited to the task.

TechQA.

I have a list with over a million objects in it, trying to find the fastest way to search through it

There are 3 answers

Related Questions in C#

Related Questions in DICTIONARY

Related Questions in HASH

Related Questions in HASHSET

Related Questions in ILIST

Popular Questions

Popular Tags

Trending Questions