I apologize up front for the title, I'm not sure how to word the question.
I am trying to find the index for a similar character or set of characters in two different, but similar strings.
- String A:
I <color=red><b>really</b></color> don't like spiders!
- String B:
I really don't like spiders!
The relevant text is the same; however, A has some formatting while B does not. I got B by taking A and running a regex to find and replace all <contents> with an empty string.
Now let's say I have selected the character at index 9 in B, which is the letter d in the word don't. How can I then determine that, in string A, the same letter d in don't, which is at index 35 (if I counted correctly), also needs to be selected?
Edit: Possibly important information: these tags are for rich text within Unity, which is very similar to HTML in almost all regards.
As I already suggested in the comments, you should write your own parser for this format that keeps the formatting as metadata next to the text. For example, you could keep a simple list of string parts where each part represents consecutive text with the same formatting.
You could start with something as simple as this:
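A minimal sketch of such a parser, assuming tags only take the forms <name>, <name=value>, and </name>, and that they are reasonably well nested. The names TAG and parse_rich_text are made up for illustration; this is not Unity's own parsing logic.

```python
import re

# Assumed tag syntax: <name>, <name=value>, or </name>.
TAG = re.compile(r"<(/?)(\w+)(?:=([^>]*))?>")


def parse_rich_text(source):
    """Split tagged text into (text, formats) pairs."""
    parts = []
    active = []  # currently open formats as (name, value) tuples
    pos = 0
    for match in TAG.finditer(source):
        # Text between the previous tag and this one gets the active formats.
        if match.start() > pos:
            parts.append((source[pos:match.start()], list(active)))
        closing, name, value = match.groups()
        if closing:
            # Close the innermost open tag with this name, if there is one.
            for i in range(len(active) - 1, -1, -1):
                if active[i][0] == name:
                    del active[i]
                    break
        else:
            active.append((name, value))
        pos = match.end()
    # Whatever follows the last tag.
    if pos < len(source):
        parts.append((source[pos:], list(active)))
    return parts
```

Each part carries a copy of the formats that were open when its text was read, so later parts are unaffected when a tag is closed.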
For your text, this will give you the following result:
As you can see, you get pairs of text and a list of formatting information for that particular part of the text. If you then want to get the underlying text, you can just iterate over the first element of each pair:
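As a small illustration, assuming the parts are stored as (text, formats) tuples. The list below is written out by hand for the example string, and its exact shape is an assumption:

```python
# Hand-written parts for the example string, as (text, formats) pairs.
parts = [
    ("I ", []),
    ("really", [("color", "red"), ("b", None)]),
    (" don't like spiders!", []),
]

# The underlying text is the concatenation of the first element of each pair.
plain = "".join(text for text, _formats in parts)
print(plain)  # I really don't like spiders!
```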
And on top of that, you can also create your own indexing method (note that this one is really crude):
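One crude way to do this, again assuming (text, formats) pairs, is to walk the parts and keep a running offset into the plain text. The helper name char_at is hypothetical, and negative indices are deliberately left out of this sketch:

```python
def char_at(parts, index):
    """Return (character, formats) for a position in the plain text."""
    if index < 0:
        raise IndexError("negative indices are not supported in this sketch")
    offset = 0  # plain-text position where the current part starts
    for text, formats in parts:
        if index < offset + len(text):
            return text[index - offset], formats
        offset += len(text)
    raise IndexError("index out of range")


# Hand-written parts for the example string.
example = [
    ("I ", []),
    ("really", [("color", "red"), ("b", None)]),
    (" don't like spiders!", []),
]
print(char_at(example, 9))  # ('d', [])
```

This scans from the start on every lookup, which is fine for short strings; for long texts you would probably want to precompute the part offsets.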
Finally, you can put all of this inside a custom type that implements the iterator protocol and the sequence protocol, so you can use it like a normal string.
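A sketch of what such a type could look like, assuming the same simple tag syntax as above. The class name RichText and the source_index method are made up for illustration. Instead of formatting lists, this version records where each untagged run starts in both the plain text and the tagged source, which is what you need to map an index in B back to A:

```python
import re
from bisect import bisect_right
from collections.abc import Sequence

# Assumed tag syntax: <name>, <name=value>, or </name>.
TAG = re.compile(r"</?\w+(?:=[^>]*)?>")


class RichText(Sequence):
    """Plain-text view of a tagged string that remembers source offsets."""

    def __init__(self, source):
        self.source = source
        self._runs = []  # (plain_start, source_start, text) per untagged run
        plain_pos = src_pos = 0
        for match in TAG.finditer(source):
            self._add_run(plain_pos, src_pos, source[src_pos:match.start()])
            plain_pos += match.start() - src_pos
            src_pos = match.end()
        self._add_run(plain_pos, src_pos, source[src_pos:])
        self.plain = "".join(text for _, _, text in self._runs)

    def _add_run(self, plain_start, source_start, text):
        if text:  # skip empty runs between adjacent tags
            self._runs.append((plain_start, source_start, text))

    # __len__ and __getitem__ are all Sequence needs; iteration,
    # containment checks, and so on come from the abstract base class.
    def __len__(self):
        return len(self.plain)

    def __getitem__(self, index):
        return self.plain[index]

    def source_index(self, index):
        """Map an index in the plain text to the index in the tagged source."""
        if not 0 <= index < len(self.plain):
            raise IndexError("index out of range")
        starts = [run[0] for run in self._runs]
        plain_start, source_start, _ = self._runs[bisect_right(starts, index) - 1]
        return source_start + (index - plain_start)


text = RichText("I <color=red><b>really</b></color> don't like spiders!")
print(text[9], text.source_index(9))  # d 35
```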