Sorting and Grouping Korean Characters (Not Familiar with the Korean Language)


I'm trying to sort and group Korean characters in Java.

Currently I'm able to sort using

final Collator collator = Collator.getInstance(Locale.KOREA);
Collections.sort(words, collator);

However, I have difficulty grouping them by their initial consonant (ㄱㄴㄷㄹㅁㅂㅅㅇㅈㅊㅋㅌㅍㅎ).


There is 1 answer

Answer by defectus (best answer)

This shouldn't be too difficult, although I'm not sure whether it has been done before.

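What you have to do is take the first character of each word. In Unicode, every precomposed Hangul syllable is built with the same formula, which is nicely described on Wikipedia: http://en.wikipedia.org/wiki/Korean_language_and_computers#Example

In short, the code point is 44032 + (initial × 588) + (medial × 28) + final, where 44032 is the code point of 가, the first syllable.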

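So you can recover the index of the initial consonant by reversing the formula. For example, for 한 (code point 54620):

(int)((54620 - 44032) / 588)

This evaluates to 18, which is the index of ㅎ in the table below.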
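
As a quick check of that arithmetic, here is a minimal Java sketch that splits a precomposed syllable into its three indices. The class and method names are my own, made up for illustration; the constants 44032, 588 and 28 come from the formula in the Wikipedia article, and the range check for non-Hangul input is an added assumption.

```java
// Hypothetical helper; the names are illustrative, not from any library.
class HangulDecompose {
    static final int SYLLABLE_BASE = 0xAC00; // 44032, the first precomposed syllable
    static final int SYLLABLE_LAST = 0xD7A3; // the last precomposed syllable
    static final int PER_INITIAL = 588;      // 21 medials * 28 finals
    static final int PER_MEDIAL = 28;        // 27 finals plus "no final"

    /** Returns {initial, medial, final} indices, or null if c is not a Hangul syllable. */
    static int[] decompose(char c) {
        if (c < SYLLABLE_BASE || c > SYLLABLE_LAST) {
            return null;
        }
        int offset = c - SYLLABLE_BASE;
        return new int[] {
            offset / PER_INITIAL,                 // index of the initial consonant
            (offset % PER_INITIAL) / PER_MEDIAL,  // index of the vowel
            offset % PER_MEDIAL                   // index of the final consonant (0 = none)
        };
    }

    public static void main(String[] args) {
        int[] parts = decompose((char) 54620); // the syllable used in the formula above
        System.out.println(parts[0] + " " + parts[1] + " " + parts[2]); // prints "18 0 4"
    }
}
```

Only the first index matters for grouping; the other two are shown just to make the formula's structure visible.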

Using the key from the Wikipedia article, you can do something like this (I used Groovy because it's much simpler, but I'm sure you can adapt it to Java):

def words = ['곰', '세', '마리', '가', '한', '집에', '있어', '아빠', '곰', '엄마', '곰', '애기', '곰']

// The 19 initial consonants in Unicode order; the list index is the offset
def matrix = ['ㄱ', 'ㄲ', 'ㄴ', 'ㄷ', 'ㄸ', 'ㄹ', 'ㅁ', 'ㅂ', 'ㅃ', 'ㅅ',
              'ㅆ', 'ㅇ', 'ㅈ', 'ㅉ', 'ㅊ', 'ㅋ', 'ㅌ', 'ㅍ', 'ㅎ']

// One empty group per initial consonant, in the same order
def result = matrix.collectEntries { [(it): []] }

for (word in words) {
    // 44032 is the first Hangul syllable; each initial consonant spans 588 code points
    def offset = (int) ((word.charAt(0) - 44032) / 588)
    def firstJamo = matrix[offset]
    result[firstJamo] << word
}

result

To see this code in action, see https://groovyconsole.appspot.com/script/5767123439714304.
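
For the Java the question asks about, the Groovy above could be adapted roughly as follows. Treat this as a sketch under a few assumptions: the class and method names are invented for illustration, it sorts with the same Collator call from the question, and it adds one thing the Groovy version doesn't have, a check that skips words whose first character is not a precomposed Hangul syllable (the formula is meaningless for those).

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical class; the names are illustrative, not from any library.
class HangulGrouper {
    // The 19 initial consonants in Unicode order; string index = (codePoint - 44032) / 588.
    static final String INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ";

    static Map<Character, List<String>> groupByInitial(List<String> words) {
        // Sort a copy first, so each group ends up in Korean collation order.
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(Collator.getInstance(Locale.KOREA));

        // LinkedHashMap keeps the groups in consonant order.
        Map<Character, List<String>> groups = new LinkedHashMap<>();
        for (char initial : INITIALS.toCharArray()) {
            groups.put(initial, new ArrayList<>());
        }

        for (String word : sorted) {
            if (word.isEmpty()) {
                continue;
            }
            char first = word.charAt(0);
            // Assumption: anything outside U+AC00..U+D7A3 is skipped rather than grouped.
            if (first < 0xAC00 || first > 0xD7A3) {
                continue;
            }
            int offset = (first - 0xAC00) / 588;
            groups.get(INITIALS.charAt(offset)).add(word);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> words = List.of("곰", "세", "마리", "가", "한", "집에", "있어",
                                     "아빠", "곰", "엄마", "곰", "애기", "곰");
        groupByInitial(words).forEach((initial, group) ->
                System.out.println(initial + " " + group));
    }
}
```

As in the Groovy result, consonants with no matching words stay in the map with an empty list, so the caller can decide whether to show or hide them.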