I am attempting to iterate the following string:
mɔ̃tr
But no matter what I do, it always ends up getting processed as:
m ɔ ̃ t r
The tilde seems to detach from the reversed c.
One of my first attempts was to do the following:
"mɔ̃tr".map {
    print(it)
}
But the tilde would not stay with the reversed c.
I saw suggestions for the following iterator:
fun codePoints(string: String): Iterable<String> {
    return object : Iterable<String> {
        override fun iterator(): MutableIterator<String> {
            return object : MutableIterator<String> {
                var nextIndex = 0
                override fun hasNext(): Boolean {
                    return nextIndex < string.length
                }
                override fun next(): String {
                    val result = string.codePointAt(nextIndex)
                    nextIndex += Character.charCount(result)
                    return String(Character.toChars(result))
                }
                override fun remove() {
                    throw UnsupportedOperationException()
                }
            }
        }
    }
}
But this gave the same output as the previous example.
I have been stuck on this seemingly simple problem for a day now. I just want to process this string as though it had 4 characters, not 5.
Any tips?
"ɔ̃" consists of two Unicode code points: the base letter ɔ and a combining tilde. This is why the code point iterator you showed still splits ɔ̃ into two elements.
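To see the two code points directly, here is a small sketch that lists every code point in the string. It assumes the string is spelled with U+0254 (ɔ) followed by U+0303 (combining tilde), written as escapes so the composition is explicit; `codePointLabels` is a hypothetical helper name, not part of any library.

```kotlin
// Hypothetical helper: returns each code point of the text as a "U+XXXX" label.
fun codePointLabels(text: String): List<String> {
    val labels = mutableListOf<String>()
    var index = 0
    while (index < text.length) {
        val codePoint = text.codePointAt(index)
        labels += "U+%04X".format(codePoint)
        // Advance by 1 or 2 chars, depending on whether this is a surrogate pair.
        index += Character.charCount(codePoint)
    }
    return labels
}

fun main() {
    // Assumption: "mɔ̃tr" is m, U+0254, U+0303, t, r.
    println(codePointLabels("m\u0254\u0303tr"))
    // prints [U+006D, U+0254, U+0303, U+0074, U+0072]
}
```

Five code points come back, which matches the five pieces seen in the question.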
"ɔ̃" is a single grapheme cluster. To iterate over those, you need a java.text.BreakIterator. In the documentation, there is an example that shows you how.
In Kotlin, you can write an extension function on String that returns a Sequence of the grapheme clusters.
Now "mɔ̃tr".graphemeClusterSequence().forEach { println(it) } prints:
m
ɔ̃
t
r
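A minimal sketch of such an extension function, assuming the name graphemeClusterSequence used in the call above and a character-level BreakIterator for the default locale. This is one possible way to write it, not necessarily the exact original implementation.

```kotlin
import java.text.BreakIterator

// Sketch: lazily yields each grapheme cluster (user-perceived character) of the string.
fun String.graphemeClusterSequence(): Sequence<String> {
    val text = this
    return sequence {
        // Character instance = boundaries between user-perceived characters.
        val boundaries = BreakIterator.getCharacterInstance()
        boundaries.setText(text)
        var start = boundaries.first()
        var end = boundaries.next()
        while (end != BreakIterator.DONE) {
            yield(text.substring(start, end))
            start = end
            end = boundaries.next()
        }
    }
}

fun main() {
    "mɔ̃tr".graphemeClusterSequence().forEach { println(it) }
}
```

A combining mark such as the tilde here stays attached to its base letter. How more complex clusters (emoji sequences, for example) are split may depend on the JDK version, so it is worth checking on your target runtime.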