Implementing sample code for the Unicode Collation Algorithm


I have the following requirement in my project: I need to sort strings based on a character order provided by the client.

For example:

Order provided by the user: d, a, A, D, z, p, P, Z

Suppose we have the strings AaP, aAp, PpZ, pPz.

After sorting, the output should be aAp, AaP, pPz, PpZ, because a sorts before A, A before p, and p before P in the order given by the user.
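To make the expected result concrete, here is a minimal sketch of the comparison described above. It is deliberately naive (it looks up each character with `std::string::find`, which is linear in the length of the order string) and it compares on a single level only, so it is not the Unicode Collation Algorithm. The names `rankOf`, `customLess` and `sortByCustomOrder` are made up for illustration, and placing unlisted characters after all listed ones is an assumption, since the requirement does not say what to do with them.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Rank of a character in the user-supplied order. Characters that are not
// listed get the rank order.size(), so they sort after every listed one
// (an assumption; the requirement does not cover unlisted characters).
std::size_t rankOf(char c, const std::string& order) {
    const std::size_t pos = order.find(c);
    return pos == std::string::npos ? order.size() : pos;
}

// Returns true when `lhs` sorts before `rhs` under the custom order.
// A shorter string that is a prefix of a longer one sorts first.
bool customLess(const std::string& lhs, const std::string& rhs,
                const std::string& order) {
    return std::lexicographical_compare(
        lhs.begin(), lhs.end(), rhs.begin(), rhs.end(),
        [&order](char a, char b) {
            return rankOf(a, order) < rankOf(b, order);
        });
}

// Returns a sorted copy of `words`.
std::vector<std::string> sortByCustomOrder(std::vector<std::string> words,
                                           const std::string& order) {
    std::sort(words.begin(), words.end(),
              [&order](const std::string& a, const std::string& b) {
                  return customLess(a, b, order);
              });
    return words;
}
```

Calling `sortByCustomOrder({"AaP", "aAp", "PpZ", "pPz"}, "daADzpPZ")` gives aAp, AaP, pPz, PpZ, which matches the output asked for.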

I am now thinking of using the Unicode Collation Algorithm (http://unicode.org/reports/tr10/) to implement this requirement.

Can someone suggest data structures for the following, with performance in mind?

1.) Mapping the ASCII values of the characters to the order given by the user. I am thinking of using a map, but access would be O(log n). I could not use a hash map as I code in C++.
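For single-byte (ASCII) characters, one option that avoids both the O(log n) map lookup and hashing is a flat 256-entry array indexed by the byte value, which gives O(1) access. The sketch below assumes 8-bit characters and an order string much shorter than 65536 characters. `OrderTable` is a hypothetical helper, not part of any library, and ranking unlisted bytes after the listed ones is an assumption. (Since C++11 the standard library also provides `std::unordered_map`, but a plain array is simpler for such a small key space.)

```cpp
#include <array>
#include <cstdint>
#include <string>

// Hypothetical helper: maps each byte value to its rank in the custom order.
class OrderTable {
public:
    explicit OrderTable(const std::string& order) {
        // Default ranks: unlisted bytes come after all listed characters,
        // ordered among themselves by byte value (an assumption).
        for (std::size_t b = 0; b < ranks_.size(); ++b) {
            ranks_[b] = static_cast<std::uint16_t>(order.size() + b);
        }
        // Listed characters get their position in the order string.
        // If a character is repeated, its last position wins.
        for (std::size_t i = 0; i < order.size(); ++i) {
            ranks_[static_cast<unsigned char>(order[i])] =
                static_cast<std::uint16_t>(i);
        }
    }

    // O(1) lookup; the cast avoids negative indices when char is signed.
    std::uint16_t rank(char c) const {
        return ranks_[static_cast<unsigned char>(c)];
    }

private:
    std::array<std::uint16_t, 256> ranks_{};
};
```

With `OrderTable table("daADzpPZ");`, `table.rank('d')` is 0, `table.rank('a')` is 1 and `table.rank('Z')` is 7, while any unlisted character gets a rank of 8 or more.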

2.) Which sorting technique can be used to compare the sort keys once they have been generated? Could something like radix sort be used here?
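One way to picture the sort-key approach: compute a key for every string once, then sort on the keys with an ordinary comparison sort. The sketch below is an illustration under assumptions, not a tuned implementation. `SortKey`, `makeSortKey` and `sortWithKeys` are invented names, the order string is assumed to hold fewer than 256 characters, and unlisted characters are assumed to sort last.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// A sort key is the sequence of ranks of the characters in the string.
using SortKey = std::vector<std::uint8_t>;

// Builds the key for one word. Assumes order.size() < 256; unlisted
// characters all receive the rank order.size() (an assumption).
SortKey makeSortKey(const std::string& word, const std::string& order) {
    SortKey key;
    key.reserve(word.size());
    for (char c : word) {
        const std::size_t pos = order.find(c);
        const std::size_t rank =
            (pos == std::string::npos) ? order.size() : pos;
        key.push_back(static_cast<std::uint8_t>(rank));
    }
    return key;
}

// Computes each key once, sorts the (key, word) pairs, and returns the
// words in key order. Pairs compare by key first; words with equal keys
// fall back to plain byte-wise comparison of the words themselves.
std::vector<std::string> sortWithKeys(const std::vector<std::string>& words,
                                      const std::string& order) {
    std::vector<std::pair<SortKey, std::string>> keyed;
    keyed.reserve(words.size());
    for (const std::string& w : words) {
        keyed.emplace_back(makeSortKey(w, order), w);
    }
    std::sort(keyed.begin(), keyed.end());

    std::vector<std::string> sorted;
    sorted.reserve(keyed.size());
    for (auto& entry : keyed) {
        sorted.push_back(std::move(entry.second));
    }
    return sorted;
}
```

Because the keys are plain sequences of small integers, a radix sort over the key bytes should also be applicable. Whether it would beat `std::sort` probably depends on how many strings there are and how long they are, so it would need measuring.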

Please share your thoughts.

Although it is not needed for my project, I would also like to know the following:

How are collation elements actually created from the Unicode or ASCII values, as in this table from the Unicode Collation Algorithm link above?

Character    Collation Element       Name

0300 "`"    [.0000.0021.0002]   COMBINING GRAVE ACCENT
0061 "a"    [.06D9.0020.0002]   LATIN SMALL LETTER A
0062 "b"    [.06EE.0020.0002]   LATIN SMALL LETTER B
0063 "c"    [.0706.0020.0002]   LATIN SMALL LETTER C
0043 "C"    [.0706.0020.0008]   LATIN CAPITAL LETTER C
0064 "d"    [.0712.0020.0002]   LATIN SMALL LETTER D
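In UTS #10 the weights are not computed from the code point value. They are looked up in a collation element table (the default one is distributed as allkeys.txt), and code points missing from the table receive derived implicit weights. The sketch below hard-codes only the six rows shown above and forms a sort key the way the specification describes: all non-zero primary weights, a 0000 level separator, all non-zero secondary weights, another separator, then the tertiary weights. The names `CollationElement`, `sampleTable` and `ucaStyleSortKey` are made up for illustration. Skipping characters that are not in the table is a simplifying assumption, and normalization, contractions and expansions are left out entirely.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// One collation element: [.primary.secondary.tertiary]
struct CollationElement {
    std::uint16_t primary;
    std::uint16_t secondary;
    std::uint16_t tertiary;
};

// Weights copied from the table in the question; this is not a full
// collation element table, only the six rows shown there.
const std::map<char32_t, CollationElement>& sampleTable() {
    static const std::map<char32_t, CollationElement> table = {
        {0x0300, {0x0000, 0x0021, 0x0002}},  // COMBINING GRAVE ACCENT
        {0x0061, {0x06D9, 0x0020, 0x0002}},  // LATIN SMALL LETTER A
        {0x0062, {0x06EE, 0x0020, 0x0002}},  // LATIN SMALL LETTER B
        {0x0063, {0x0706, 0x0020, 0x0002}},  // LATIN SMALL LETTER C
        {0x0043, {0x0706, 0x0020, 0x0008}},  // LATIN CAPITAL LETTER C
        {0x0064, {0x0712, 0x0020, 0x0002}},  // LATIN SMALL LETTER D
    };
    return table;
}

// Forms a three-level sort key: primaries, 0000, secondaries, 0000,
// tertiaries. Zero weights are ignored at their level.
std::vector<std::uint16_t> ucaStyleSortKey(const std::u32string& text) {
    std::vector<std::uint16_t> primary, secondary, tertiary;
    for (char32_t cp : text) {
        const auto it = sampleTable().find(cp);
        if (it == sampleTable().end()) {
            continue;  // Assumption: unknown characters are skipped here.
        }
        const CollationElement& ce = it->second;
        if (ce.primary != 0) primary.push_back(ce.primary);
        if (ce.secondary != 0) secondary.push_back(ce.secondary);
        if (ce.tertiary != 0) tertiary.push_back(ce.tertiary);
    }
    std::vector<std::uint16_t> key(primary);
    key.push_back(0x0000);  // level separator
    key.insert(key.end(), secondary.begin(), secondary.end());
    key.push_back(0x0000);  // level separator
    key.insert(key.end(), tertiary.begin(), tertiary.end());
    return key;
}
```

Two keys built this way can be compared with the ordinary `<` on `std::vector`. With the weights above, "c" and "C" share the primary weight 0706 and the secondary weight 0020 and differ only at the tertiary level (0002 versus 0008), so "c" sorts before "C", and both sort before "d".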