Undo the str_wrap()

I would like to unwrap text in R that was wrapped with str_wrap(). However, calling as.character() on the result does not undo the wrapping.

Here is a toy example:

t<-c("The adds fundamental principle”, The discriminatory ")
str_wrap(t,3)

I did look at the question here, but I am not sure what .{0,2}$ and .{3} mean, or whether they would help solve my problem.

Some more extreme cases might contain line breaks ("\n") or multiple spaces between the words or at the end of the strings. For these cases, I acknowledge that str_wrap() might not be the best option for wrapping the string. Any suggestion on how to wrap and unwrap such strings is greatly appreciated. Below is another toy example:

example<-c("The extreme long long\nlonglong long.  long long strings”, The theory                is about.   ")
str_wrap(example,3)

There are 2 answers

Andy Baxter On

This might not get you back to exactly the same string, but in this example you can simply paste it back together:

library(stringr)
#> Warning: package 'stringr' was built under R version 4.3.2

t<-c("The adds fundamental principle”, The discriminatory ")
out_string <- str_wrap(t,3)

out_string
#> [1] "The\nadds\nfundamental\nprinciple”,\nThe\ndiscriminatory"

str_replace_all(out_string, "\n", " ")
#> [1] "The adds fundamental principle”, The discriminatory"

This leaves off the space at the end of the string, because the wrapped string carries no information about trailing whitespace in the original. Likewise, if the original string contained line breaks, the reverse step could not distinguish them from the line breaks inserted by wrapping.
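A minimal sketch of that limitation, using a shortened version of the second example string from the question. The helper name unwrap() is made up for illustration and is not part of stringr. Under the assumption that str_wrap() collapses all runs of whitespace before wrapping, the best the reverse step can recover is the whitespace-normalised string, which is what str_squish() returns:

```r
library(stringr)

# Hypothetical helper (not part of stringr): undo str_wrap() by turning
# every line break back into a single space.
unwrap <- function(x) str_replace_all(x, "\n", " ")

original <- "The extreme long long\nlonglong long.  long long strings"
restored <- unwrap(str_wrap(original, 3))

# The original line break and the double space are gone for good ...
same_as_original <- identical(restored, original)
same_as_original
#> [1] FALSE

# ... but the result matches the whitespace-normalised original.
same_as_squished <- identical(restored, str_squish(original))
same_as_squished
#> [1] TRUE

# Two different inputs can wrap to the same output, so the wrapped
# string alone cannot tell you which one you started from.
ambiguous <- identical(str_wrap("long\nlonglong", 3),
                       str_wrap("long longlong", 3))
ambiguous
#> [1] TRUE
```

If the exact original matters, the safer design is to keep it in its own column or variable and treat the wrapped version as display-only.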

Mark On

To answer your questions:

I am not sure what are .{0,2}$ and .{3}

These are regular expressions. A regular expression is "a sequence of characters that specifies a match pattern in text" [from Wikipedia].

. matches any character except a newline, and the numbers in curly braces say how many times it should be matched. In the first pattern, it's 0 to 2 times, and in the second, it's exactly 3.

| is or: basically, match the thing on the left, or the one on the right.

$ looks for the end of the string.

So putting it all together, ".{3}|.{0,2}$" matches three characters, or zero to two characters then the end of the string.
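To see each piece in action, here is a small base R sketch. The test strings "abcdefgh" and "ab" are made up for illustration, and regmatches() combined with regexpr() returns only the first match in each string:

```r
x <- "abcdefgh"  # made-up test string

# .{3} matches exactly three characters
first3 <- regmatches(x, regexpr(".{3}", x))
first3
#> [1] "abc"

# .{0,2}$ matches up to two characters sitting right before the end
last2 <- regmatches(x, regexpr(".{0,2}$", x))
last2
#> [1] "gh"

# With a string shorter than three characters, .{3} cannot match,
# so the alternation falls through to .{0,2}$
short <- regmatches("ab", regexpr(".{3}|.{0,2}$", "ab"))
short
#> [1] "ab"
```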

Unwrapping strings

It really depends on whether you want to unwrap strings made by str_wrap(), or those made by the function given in the linked answer.

str_wrap() doesn't add an extra line break where one already exists, so there is no way to tell the two apart unless you still have the original string. If you are using the function from the linked answer, it should be straightforward: for every five characters, extract the first three, then concatenate the results.
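The "first three of every five" idea could look roughly like this in base R. This is only a sketch: the two-character separator "\r\n", the sample chunks, and the helper name unwrap_fixed() are all assumptions for illustration, since the linked answer's exact output is not shown here.

```r
# Assumption: text was cut into 3-character chunks joined by a
# two-character separator, so each full block is 5 characters long.
sep <- "\r\n"                                         # assumed separator
wrapped <- paste(c("abc", "def", "gh"), collapse = sep)

# Hypothetical helper: keep the first `chunk` characters of every
# `block` characters, then glue the pieces back together.
unwrap_fixed <- function(x, chunk = 3, block = 5) {
  starts <- seq(1, nchar(x), by = block)
  paste(substring(x, starts, starts + chunk - 1), collapse = "")
}

unwrapped <- unwrap_fixed(wrapped)
unwrapped
#> [1] "abcdefgh"
```

Working by position rather than by searching for the separator means the reverse step is not confused by separator characters that were already in the original text.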