How can I get the portion of HTML that produces a certain string after the tags are stripped?


Basically, I want to be able to call a function on an HTML string and get back an array of the start and end indices of each occurrence of a search term in the stripped text. It would look like this in the console:

> var html = "<b>Hello</b>&nbsp;<mark>World</mark>";
> getIndices(html, "Hello\u00A0World");
< [[3, 29]]

The end goal is to be able to wrap the matching HTML in some tags, given a string to search for in the document, much like the Ctrl+F functionality of most browsers.

I wrote the code snippet below, but its performance is horrible, especially on long webpages when I call it on the entire body's inner HTML. It re-strips the remainder of the HTML on every iteration, so it can definitely be optimized, perhaps with a binary search instead of brute force and by doing a few other things differently, but I'm having trouble implementing that. Thoughts on this?

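function getIndices(html, searchTerm){
    var i = 0,
        indices = [];
    // Keep going while the tag-stripped remainder still contains the term.
    while(html.slice(i).replace(/<[^>]*>/g, '').indexOf(searchTerm) !== -1){
        // Jump to the next occurrence of the term's first character.
        i = html.indexOf(searchTerm[0], i);
        // Record i if the stripped remainder begins with the term.
        if(html.slice(i).replace(/<[^>]*>/g, '').indexOf(searchTerm) === 0){
            indices.push(i);
        }
        i++;
    }
    return indices;
}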

Thanks!


There is 1 answer

Brian HK (best answer)

Create a TreeWalker (document.createTreeWalker) and check which text nodes contain the search string.
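A TreeWalker needs a live DOM. If only the HTML string is available, a similar idea can be sketched as a single pass over the string: collect the visible text once and remember which source range produced each character. This is not the TreeWalker approach itself, just a string-only sketch under some assumptions: the names buildTextMap and ENTITIES are made up for illustration, the entity table is deliberately tiny, and tag skipping is naive (it assumes no ">" inside attribute values, comments, or scripts).

```javascript
// Hypothetical, minimal entity table (an assumption, not a full HTML entity list).
const ENTITIES = new Map([
  ["&nbsp;", "\u00A0"],
  ["&amp;", "&"],
  ["&lt;", "<"],
  ["&gt;", ">"],
  ["&quot;", '"'],
]);

// Scan the HTML once. Returns the visible text plus, for each text
// character, the [start, end) range in the HTML source that produced it.
function buildTextMap(html) {
  let text = "";
  const starts = [];
  const ends = [];
  let i = 0;
  while (i < html.length) {
    if (html[i] === "<") {
      // Naive tag skip: jump past the next ">".
      const close = html.indexOf(">", i);
      if (close === -1) break; // unterminated tag: stop scanning
      i = close + 1;
      continue;
    }
    let ch = html[i];
    let len = 1;
    if (ch === "&") {
      // Decode the entity only if it is in the small table above.
      const semi = html.indexOf(";", i);
      const name = semi === -1 ? "" : html.slice(i, semi + 1);
      if (ENTITIES.has(name)) {
        ch = ENTITIES.get(name);
        len = name.length;
      }
    }
    text += ch;
    starts.push(i);
    ends.push(i + len);
    i += len;
  }
  return { text, starts, ends };
}

// Search the stripped text, then map each match back to HTML offsets.
function getIndices(html, searchTerm) {
  const result = [];
  if (searchTerm.length === 0) return result;
  const { text, starts, ends } = buildTextMap(html);
  let from = text.indexOf(searchTerm);
  while (from !== -1) {
    const last = from + searchTerm.length - 1;
    result.push([starts[from], ends[last]]);
    from = text.indexOf(searchTerm, from + 1);
  }
  return result;
}

const html = "<b>Hello</b>&nbsp;<mark>World</mark>";
console.log(getIndices(html, "Hello\u00A0World")); // [ [ 3, 29 ] ]
```

The HTML is stripped only once, instead of once per candidate position as in the question's loop. Note that a returned range can cut across tags: html.slice(3, 29) here is "Hello</b>&nbsp;<mark>World", so wrapping that slice directly in a new element would produce unbalanced markup. For highlighting in a real page, wrapping the matching text nodes found by a TreeWalker is likely the safer route.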