Can you have combining symbols with astral symbols in Unicode?


I'm working on a project, Pure Terminal, and I want to unit-test the handling of characters that consist of more than one code point. I want to have 100% coverage.

This is the code I want to test:

function make_next_char_fun(string: string) {
    const tests: Array<(arg: string) => string | void> = [];
    [
        entity_re,
        emoji_re,
        combine_chr_re
    ].forEach(function(re) {
        if (string.match(re)) {
            tests.push(make_re_fn(re));
        }
    });
    if (string.match(astral_symbols_re)) {
        tests.push(function(string: string) {
            const m1 = string.match(astral_symbols_re);
            if (starts_with(m1)) {
                const m2 = string.match(combine_chr_re);
                if (m2 && m2.index === 1) {
                    return string.slice(0, 3);
                }
                return m1[1];
            }
        });
    }
    return function(string: string) {
        for (let i = 0; i < tests.length; ++i) {
            const test = tests[i];
            const ret = test(string);
            if (ret) {
                return ret;
            }
        }
        return string[0];
    };
}

The uncovered line is inside this part of the code:

tests.push(function(string: string) {
    const m1 = string.match(astral_symbols_re);
    if (starts_with(m1)) {
        const m2 = string.match(combine_chr_re);
        if (m2 && m2.index === 1) {
            return string.slice(0, 3);
        }
        return m1[1];
    }
});

To be exact, I want the code inside the second if statement to execute.

But I have a question that I haven't been able to find the answer to: can astral symbols in Unicode have combining characters? If not, I could delete that part of the code to reach 100% coverage.

I tried searching for astral symbols and combining characters, but my first test didn't work:

I picked a random combining character from the Wikipedia article about combining characters and paired it with an astral symbol:

I know that emoji can contain combining characters, but emoji are handled by emoji_re, so for emoji that part of the code is never executed.

The code is in TypeScript, but the issue applies to JavaScript as well; the types don't matter.


There is 1 answer

Answered by CharlotteBuff (best answer)

There is no functional difference between astral and non-astral characters; they are all treated equally by the Unicode Standard and can be freely combined every which way. The astral planes contain writing systems of the same nature as those in the BMP, so extensive use of combining marks is normal there as well.
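As an illustration of the point above, here is a minimal sketch of a test input that pairs an astral base character with a combining mark. The names astralRe, combiningRe, and nextChar are hypothetical stand-ins, not the project's actual astral_symbols_re or combine_chr_re (their definitions are not shown in the question), and the combining range is deliberately limited to U+0300 through U+036F for brevity.

```typescript
// Hypothetical stand-ins for the question's regexes; the real definitions
// are not shown in the question, so these are assumptions.
// An astral symbol in a JavaScript string is a surrogate pair (2 code units).
const astralRe = /^[\uD800-\uDBFF][\uDC00-\uDFFF]/;
// Assumption: only the basic Combining Diacritical Marks block is checked.
const combiningRe = /^[\u0300-\u036F]/;

// Returns the first "character" of the input: an astral symbol together with
// one following combining mark if present, otherwise the astral symbol alone,
// otherwise a single code unit.
function nextChar(input: string): string {
    if (astralRe.test(input)) {
        if (combiningRe.test(input.slice(2))) {
            return input.slice(0, 3); // surrogate pair + one BMP combining mark
        }
        return input.slice(0, 2); // surrogate pair only
    }
    return input.charAt(0);
}

// U+1D400 MATHEMATICAL BOLD CAPITAL A, written as a surrogate pair,
// followed by U+0301 COMBINING ACUTE ACCENT and a plain "x".
const sample = "\uD835\uDC00\u0301x";
console.log(sample.length);           // 4 code units in total
console.log(nextChar(sample).length); // 3 code units: the combined character

// The combining mark can itself be astral:
// U+11013 BRAHMI LETTER KA + U+11038 BRAHMI VOWEL SIGN AA.
const brahmi = "\uD804\uDC13\uD804\uDC38";
console.log(brahmi.length);           // 4 code units, 2 code points
```

An input like sample should reach the inner branch of a check shaped like the one in the question, provided the project's combining regex covers U+0301. The Brahmi sample shows a case that a fixed three-code-unit slice would cut in half, because the combining mark is itself a surrogate pair.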