Sorted TStringList, how does the sorting work?

I'm simply curious: lately I have been seeing HashMaps used in Java, and I wonder whether Delphi's sorted TStringList is similar at all.

Does the TStringList object generate a Hash to use as an index for each item? And how does the search string get checked against the list of strings via the Find function?

I make use of Sorted TStringLists very often and I would just like to understand what is going on a little bit more.

Please assume I don't know how a hash map works, because I don't :)

Thanks

There are 6 answers

David Heffernan On BEST ANSWER

I'm interpreting this question, quite generally, as a request for an overview of lists and dictionaries.

  • A list, as almost everyone knows, is a container that is indexed by contiguous integers.
  • A hash map, dictionary or associative array is a container whose index can be of any type. Very commonly, a dictionary is indexed with strings.

For the sake of argument, let us call our lists L and our dictionaries D.

Lists have true random access: an item can be looked up in constant time if you know its index. This is not the case for dictionaries, which usually resort to hash-based algorithms to achieve efficient lookup by key.

A sorted list can perform binary search when you attempt to find a value. Finding a value, V, is the act of obtaining the index, I, such that L[I]=V. Binary search is very efficient: it halves the remaining range at every step, so it needs only O(log n) comparisons. If the list is not sorted then it must perform linear search, which needs O(n) comparisons and is much less efficient. A sorted list can maintain its order in the manner of an insertion sort: when a new item is added, it is inserted at the correct location.
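As a rough illustration of the binary search described above, here is a minimal sketch over a sorted array of strings. The function name BinaryFind is hypothetical and this is not the RTL source; it only assumes the input is already sorted in the same ordinal order that CompareStr uses. Unit names are written with the System. prefix used by recent Delphi versions; older versions use SysUtils without the prefix.

```delphi
program BinarySearchSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper, not RTL code. Returns True and the position of Value
// if it is present; otherwise returns False and the position at which Value
// would have to be inserted to keep Items sorted.
function BinaryFind(const Items: array of string; const Value: string;
  out Index: Integer): Boolean;
var
  Lo, Hi, Mid, Cmp: Integer;
begin
  Result := False;
  Lo := 0;
  Hi := High(Items);
  while Lo <= Hi do
  begin
    Mid := Lo + (Hi - Lo) div 2;          // middle of the remaining range
    Cmp := CompareStr(Items[Mid], Value); // ordinal, case-sensitive
    if Cmp < 0 then
      Lo := Mid + 1                       // Value can only be in the upper half
    else if Cmp > 0 then
      Hi := Mid - 1                       // Value can only be in the lower half
    else
    begin
      Index := Mid;
      Result := True;
      Exit;
    end;
  end;
  Index := Lo;                            // not found: insertion position
end;

var
  I: Integer;
begin
  if BinaryFind(['apple', 'banana', 'cherry', 'pear'], 'cherry', I) then
    Writeln('found at index ', I);                          // index 2
  if not BinaryFind(['apple', 'banana', 'cherry', 'pear'], 'melon', I) then
    Writeln('not found; would be inserted at index ', I);   // index 3
end.
```

Each pass through the loop discards half of the remaining range, which is where the logarithmic cost comes from.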

You can think of a dictionary as a list of <Key,Value> pairs. You can iterate over all pairs, but more commonly you use index notation to look up the value for a given key: D[Key]. Note that this is not the same operation as finding a value in a list; it is the analogue of reading L[I] when you know the index I.

In older versions of Delphi it was common to coax dictionary behaviour out of string lists. The performance was terrible. There was little flexibility in the contents.

With modern Delphi, there is TDictionary, a generic class that can hold anything. The implementation uses a hash and although I have not personally tested its performance I understand it to be respectable.
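To make the D[Key] notation concrete, here is a small sketch using TDictionary from System.Generics.Collections. The keys and values are made up for illustration.

```delphi
program DictionarySketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Generics.Collections;

var
  Ages: TDictionary<string, Integer>;
  Age: Integer;
begin
  Ages := TDictionary<string, Integer>.Create;
  try
    Ages.Add('alice', 31);            // raises an exception if the key exists
    Ages.AddOrSetValue('bob', 27);    // adds the key or overwrites its value

    // D[Key]: read the value stored under a known key.
    Writeln(Ages['alice']);

    // Reading a missing key through [] raises an exception, so use
    // TryGetValue or ContainsKey when the key may be absent.
    if Ages.TryGetValue('carol', Age) then
      Writeln('carol is ', Age)
    else
      Writeln('carol is not in the dictionary');
  finally
    Ages.Free;
  end;
end.
```

Note that the dictionary gives no guarantee about the order in which its pairs are enumerated, which is the main thing you give up compared with a sorted list.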

There are commonly used algorithms that optimally use all of these containers: unsorted lists, sorted lists, dictionaries. You just need to use the right one for the problem at hand.

Misha On

BTW, the Unicode sort routines for TStringList are quite slow. If your strings contain only Ansi characters (which they will if you use English exclusively), you can override the TStringList.CompareStrings method and use customised Ansi string comparisons instead. I use my own customised TStringList class that does this, and for a sorted list it is 4 times faster than the standard TStringList class for both reading and writing strings from/to the list.
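A minimal sketch of that kind of override follows. The class name TOrdinalStringList is hypothetical and this is not the class described above; it assumes that a plain ordinal, case-sensitive comparison via CompareStr is acceptable for your data, and it makes no claim about the size of the speed-up.

```delphi
program OrdinalCompareSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

type
  // Hypothetical descendant for illustration only.
  TOrdinalStringList = class(TStringList)
  protected
    function CompareStrings(const S1, S2: string): Integer; override;
  end;

function TOrdinalStringList.CompareStrings(const S1, S2: string): Integer;
begin
  // Assumption: ordinal, case-sensitive ordering is good enough.
  // This skips the locale-aware comparison used by default.
  Result := CompareStr(S1, S2);
end;

var
  List: TOrdinalStringList;
  Index: Integer;
begin
  List := TOrdinalStringList.Create;
  try
    List.Sorted := True;   // Add and Find now use the override
    List.Add('pear');
    List.Add('Zebra');
    List.Add('apple');

    // Ordinal order places upper case before lower case.
    Writeln(List.CommaText);                       // Zebra,apple,pear
    if List.Find('apple', Index) then
      Writeln('apple found at index ', Index);     // index 1
  finally
    List.Free;
  end;
end.
```

The visible cost of this choice is the ordering: "Zebra" sorts before "apple", so it only suits lists where the order is used for lookup rather than for display.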

splash On

There is also a THashedStringList, which could be an option (especially in older Delphi versions).
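For reference, a short sketch of how THashedStringList is used. It lives in the IniFiles unit (System.IniFiles in recent versions) and is used like an ordinary TStringList. As far as I know its internal hash is rebuilt after the list changes, so treat the comment about that as an assumption to check against your RTL source.

```delphi
program HashedListSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.IniFiles;

var
  List: THashedStringList;
begin
  List := THashedStringList.Create;
  try
    // The list is not sorted, so insertion order is preserved.
    List.Add('pear');
    List.Add('apple');
    List.Add('banana');

    // IndexOf goes through a hash of the strings instead of a linear scan.
    // Assumption: the hash is rebuilt after the list changes, so this class
    // suits lists that are filled once and then searched many times.
    Writeln(List.IndexOf('apple'));    // 1
    Writeln(List.IndexOf('cherry'));   // -1
  finally
    List.Free;
  end;
end.
```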

AudioBubble On

You could look at the source code, since that comes with Delphi. Ctrl-click on the Sort call in your code.

It's a simple alphabetical sort in non-Unicode Delphi, and a slightly more complex Unicode one in later versions. You can supply your own comparison for custom sort orders. Unfortunately I don't have a recent version of Delphi so can't confirm, but I expect that under the hood there's a proper Unicode-aware and locale-aware string comparison routine. Unicode sorting/string comparison is not trivial and a little web searching will point out some of the pitfalls.

Supplying your own comparison routine is often done when you have delimited text in the strings or objects attached to them (the Objects property). In those cases you often want to sort by a property of the object or by something other than the first field in the string. Or it might be as simple as wanting a numerical sort on the strings (so "2" comes after "1" rather than after "19").
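The numerical case can be sketched with TStringList.CustomSort. The comparison function name CompareAsNumbers is made up, and the sketch assumes every string in the list is a valid integer.

```delphi
program NumericSortSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Has the signature CustomSort expects (TStringListSortCompare).
function CompareAsNumbers(List: TStringList; Index1, Index2: Integer): Integer;
var
  A, B: Integer;
begin
  // Assumption: every entry converts cleanly; StrToInt raises otherwise.
  A := StrToInt(List[Index1]);
  B := StrToInt(List[Index2]);
  if A < B then
    Result := -1
  else if A > B then
    Result := 1
  else
    Result := 0;
end;

var
  List: TStringList;
begin
  List := TStringList.Create;
  try
    List.Add('19');
    List.Add('2');
    List.Add('1');

    List.Sort;                           // default string comparison
    Writeln(List.CommaText);             // 1,19,2

    List.CustomSort(CompareAsNumbers);   // numeric comparison
    Writeln(List.CommaText);             // 1,2,19
  finally
    List.Free;
  end;
end.
```

The Sorted property is left at False here: a list with Sorted = True keeps its own order, so CustomSort is meant for lists you order yourself.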

Warren  P On

Delphi's dictionary type (in generics-enabled versions of Delphi) is the closest thing to a hash map that ships with Delphi. THashedStringList makes lookups faster than they would be in a sorted string list. You can do lookups using a binary search in a sorted string list, so it's faster than a brute-force search, but not as fast as a hash.

The general theory of a hash is that it is unordered, but very fast on lookup and insertion. A sorted list is reasonably fast on insertion until the size of the list gets large, although it's not as efficient as a dictionary for insertion.

The big benefit of a sorted list is that it is ordered, whereas a hash-lookup dictionary is not.

Thorsten Engler On

TStringList holds the strings in an array.

If you call Sort on an otherwise unsorted (Sorted property = false) string list then a QuickSort is performed to sort the items.

The same happens if you set Sorted to true.

If you call IndexOf on an unsorted string list (Sorted property = false; even if you explicitly called Sort, the list is considered unsorted as long as the Sorted property isn't true) then a linear search is performed, comparing all strings from the start until a match is found. Find itself assumes the list is sorted, so it should only be called when Sorted is true; IndexOf calls Find only in that case.

If you call Find on a sorted string list (Sorted property = true) then a binary search is performed (see http://en.wikipedia.org/wiki/Binary_search for details).
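A short sketch of Find and IndexOf on a sorted list, with made-up sample strings. When Find fails, the index it returns is the position at which the string would have to be inserted.

```delphi
program FindSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

var
  List: TStringList;
  Index: Integer;
begin
  List := TStringList.Create;
  try
    List.Sorted := True;
    List.Duplicates := dupIgnore;   // silently drop strings already present
    List.Add('pear');
    List.Add('apple');
    List.Add('cherry');
    List.Add('apple');              // ignored
    // The list now holds: apple, cherry, pear

    if List.Find('cherry', Index) then
      Writeln('cherry found at index ', Index);                // index 1

    if not List.Find('banana', Index) then
      Writeln('banana missing; insertion index is ', Index);   // index 1

    Writeln(List.IndexOf('pear'));                             // 2
  finally
    List.Free;
  end;
end.
```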

If you add a string to a sorted string list, a binary search is performed to determine the correct insertion position, all following elements in the array are shifted by one and the new string is inserted.

Because of this, insertion performance gets a lot worse the larger the string list is. If you want to insert a large number of entries into a sorted string list, it's usually better to turn sorting off, insert the strings, then set Sorted back to true, which performs a QuickSort.

The disadvantage of that approach is that you will not be able to prevent the insertion of duplicates.
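The bulk-loading pattern can be sketched as follows. The random test data is made up, and the duplicate-removal pass at the end is only one possible workaround that I am adding for illustration; it relies on equal strings being adjacent after the sort.

```delphi
program BulkLoadSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

var
  List: TStringList;
  I: Integer;
begin
  List := TStringList.Create;
  try
    List.Sorted := False;               // plain appends, no search, no shifting
    for I := 1 to 10000 do
      List.Add(IntToStr(Random(5000)));

    List.Sorted := True;                // a single sort of the whole list

    // Duplicates were not filtered while Sorted was False. After sorting,
    // equal strings sit next to each other, so one backward pass can drop
    // them. Assumption: exact string equality is the right duplicate test
    // for this data (digits only).
    for I := List.Count - 1 downto 1 do
      if List[I] = List[I - 1] then
        List.Delete(I);

    Writeln('unique entries: ', List.Count);
  finally
    List.Free;
  end;
end.
```

Each Delete still shifts the elements behind it, so for very large lists with many duplicates it may be cheaper to copy the unique entries into a second list instead.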

EDIT: If you want a hash map, use TDictionary from unit Generics.Collections