Does Dictionary<string, int> preserves the insertion order?

135 views Asked by At

I have been using C# for more than 10 years, but today I discovered that Dictionary<string, int> retains the insertion order? As far as I know, Dictionary is implemented internally using a hash table, so the internal arrangement should be in no particular order, but the test code below seems to show that the order during traversal is exactly the same as the order of insertion?

        const int N = 1000000;

        List<string> ss = new List<string>();
        Dictionary<string, int> dict = new Dictionary<string, int>();

        for (int i = 0; i < N; ++i)
        {
            var s = string.Empty;
            for (int j = 0; j < 15; ++j)
                s += (char)('a' + UnityEngine.Random.Range(0, 26));

            //ss.add(s);
            ss.Add(new string(s));

            //dict[s] = UnityEngine.Random.Range(0, 1024);
            dict.Add(s, UnityEngine.Random.Range(0, 1024));
        }

        int same = 0;

        var keys = dict.Keys.ToArray();

        for (int i = 0; i < N; ++i)
        {
            if (keys[i] == ss[i])
                ++same;
        }

        Debug.Log($"N={N}, same={same}");
        
        // the output is : N=1000000, same=1000000

Now I wonder if there is something wrong with my test code?

Can anyone confirm that the traversal order of Dictionary in C# is the insertion order?

Is it reliable?

3

There are 3 answers

2
Olivier Jacot-Descombes On BEST ANSWER

I looked at the implementation of Dictionary<TKey, TValue> (by clicking Alt-F12 with the text cursor on Dictionary<string, int> in Visual Studio). I do not understand it fully; however, I noticed that two arrays are used to store the entries:

private int[]? _buckets;
private Entry[]? _entries;

The _buckets are indeed accessed through the hash code. I.e., this is the hash table. By debugging I noticed that the entries are added in a random order into _buckets but in a sequential order into _entries. _buckets contains an index into _entries. This explains why the entries can be retrieved in the same order as they have been added to the dictionary.

This behavior is due to implementation detail. So, I wouldn't rely on it as the implementation could change in future releases. In fact, the implementation of Dictionary<TKey, TValue> has changed in the past. Also, I don't know if this order is always maintained, or if there are conditions where it might be broken.


Test setup: .NET 8.0 Console Application / C# 12

public static void Test()
{
    const int N = 1000000;

    List<string> list = [];
    Dictionary<string, int> dict = [];
    Type type = dict.GetType();

    for (int i = 0; i < N; ++i) {
        string s = String.Empty;
        for (int j = 0; j < 4; ++j) {
            s += (char)('a' + Random.Shared.Next(26));
        }
        list.Add(s);
        dict[s] = i; // Adding i simplifies analysis, as _entries[i].value now reflects the order.

        // Retrieve the dictionary's private fields at every iteration, since they are reallocated when they are resized.
        int[]? _buckets = (int[]?)type.GetField("_buckets", BindingFlags.Instance | BindingFlags.NonPublic)?.GetValue(dict);
        object? _entries = type.GetField("_entries", BindingFlags.Instance | BindingFlags.NonPublic)?.GetValue(dict);
    } // <<< place a breakpoint here and inspect _buckets and _entries.

    int same = 0;
    string[] keys = dict.Keys.ToArray();
    for (int i = 0; i < keys.Length; ++i) {
        if (keys[i] == list[i]) {
            ++same;
        }
    }

    Console.WriteLine($"N={keys.Length}, same={same}");
}
0
Peppermintology On

Your understanding that Dictionary<TKey, TValue> uses a hash table under the hood is correct. The observed behaviour that when enumerating your collection the returned order is the same as your insertion order often causes confusion, is purely coincidental and should not be relied upon.

1
21k On

Agreed:

It seems very unlikely that one million random strings will be inserted in the hash table in ascending order by accident. -Olivier Jacot-Descombes

Out of curiosity, I wanted to know what would happen if the strings were shuffled before inserting into the dictionary (meaning that the addresses of the strings are no longer in ascending order), however the test results were still the same.

            const int N = 1000000;
            //const int N = 100;

            List<string> ss = new List<string>();
            Dictionary<string, int> dict = new Dictionary<string, int>();

            for (int i = 0; i < N; ++i)
            {
                var s = string.Empty;
                for (int j = 0; j < 15; ++j)
                    s += (char)('a' + UnityEngine.Random.Range(0, 26));

                ss.Add(s);
            }

            ss.Shuffle();

            for (int i = 0; i < N; ++i)
            {
                dict[ss[i]] = UnityEngine.Random.Range(0, 1024);
            }

            int same = 0;

            var keys = dict.Keys.ToArray();

            for (int i = 0; i < N; ++i)
            {
                if (keys[i] == ss[i])
                    ++same;
            }

            Debug.Log($"N={N}, same={same}");
            // the output is : N=1000000, same=1000000


And what if the keys are ints?

            List<int> ints = new List<int>();
            Dictionary<int, int> dict = new Dictionary<int, int>();

            const int N = 1000000;
            //const int N = 100;

            for (int i = 0; i < N; ++i)
            {
                var num = 100 + i;
                ints.Add(num);
                dict.Add(num, UnityEngine.Random.Range(0, 1024));
            }

            int same = 0;

            var keys = dict.Keys.ToArray();

            for (int i = 0; i < N; ++i)
            {
                if (keys[i] == ints[i])
                    ++same;
            }

            Debug.Log($"N={N}, same={same}");
            // the output is : N=1000000, same=1000000

Without looking at the implementation, I can't explain why this happens, but according to the official documentation, it is unreliable.

For purposes of enumeration, each item in the dictionary is treated as a KeyValuePair<TKey,TValue> structure representing a value and its key. The order in which the items are returned is undefined.