Text replacements with splice do not work with smileys (or multibyte chars)


I have a problem with a complex replacement algorithm. In the end I was able to reduce the problem to this minimal code:

const input="test  hello test world"
let start = 0
let output = [...input]
const replacements = []
for (let end = 0; end <= input.length; end++) {
    const c = input[end]
    if (c == ' ') {
        if (start !== end) {
            const word = input.substring(start, end).toLowerCase()
            if (word == 'test') {
                replacements.push({start, length:(end - start), text:'REPLACEMENT'})
            }
        }
        start = end + 1
    }
}
for(let i=replacements.length-1;i>=0;i--) {
    output.splice(replacements[i].start, replacements[i].length, replacements[i].text)
}
console.log(output.join(''))

My input is "test hello test world" and the expected output would be "REPLACEMENT hello REPLACEMENT world", but it is actually "REPLACEMENT hello tREPLACEMENTworld". I remember from the Twitter API that JavaScript has a strange way of handling byte positions and character indices, so the issue is obviously caused by the smiley.

How can I fix my code so that the replacement works as expected? Bonus question: why is that happening?


There is 1 answer

rekire (best answer)

Well that was quick:

const input="test  hello test world"
let start = 0
let output = [...input]
const replacements = []
for (let end = 0; end <= output.length; end++) {
    const c = output[end]
    if (c == ' ') {
        if (start !== end) {
            const word = output.slice(start, end).join('').toLowerCase()
            if (word == 'test') {
                replacements.push({start, length:(end - start), text:'REPLACEMENT'})
            }
        }
        start = end + 1
    }
}
for(let i=replacements.length-1;i>=0;i--) {
    output.splice(replacements[i].start, replacements[i].length, replacements[i].text)
}
console.log(output.join(''))

When I scan the output array instead of the input string, the indices line up with the ones splice expects and my replacement works again. However, I will accept the answer of anyone who can explain why that change is required.
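On the bonus question, the likely cause is that JavaScript strings are sequences of UTF-16 code units, so `length`, `input[end]` and `substring` all count code units, while spreading a string into an array goes through the string iterator and yields whole code points. A smiley outside the Basic Multilingual Plane takes two code units but only one array element, so every index after it is off by one. The sketch below illustrates this under some assumptions: the smiley is taken to be U+1F600 (the exact character is not visible in the snippets above), and `replaceAt` and `toCodePointIndex` are hypothetical helper names, not part of the original code.

```javascript
// Assumption: one emoji sits between the first two words.
// U+1F600 is written as an escape so it survives copy and paste.
const sample = 'test \u{1F600} hello test world'

// String length counts UTF-16 code units; the emoji contributes two.
const codeUnits = sample.length
// Spreading iterates by code point; the emoji contributes one element.
const codePoints = [...sample].length

// Hypothetical helper: splice into the code point array and join again.
function replaceAt(str, index, length, text) {
    const chars = [...str]
    chars.splice(index, length, text)
    return chars.join('')
}

// Hypothetical helper: convert a code unit index into a code point index
// by counting the code points that precede it.
function toCodePointIndex(str, unitIndex) {
    return [...str.slice(0, unitIndex)].length
}

// indexOf works on code units, like the loop over input[end] in the question.
const unitIndex = sample.indexOf('test', 1)
const pointIndex = toCodePointIndex(sample, unitIndex)

// Using the code unit index on the array lands one element too far right.
const wrong = replaceAt(sample, unitIndex, 4, 'X')
// Using the code point index hits the word exactly.
const right = replaceAt(sample, pointIndex, 4, 'X')

console.log(codeUnits, codePoints)
console.log(unitIndex, pointIndex)
console.log(wrong)
console.log(right)
```

Scanning the spread array, as the answer's code does, avoids the conversion altogether because the positions are computed and applied in the same index space. Note that this only handles code points: emoji built from several code points (for example with skin tone modifiers or joiners) would still span multiple array elements.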