Place an escape sign before every non-alphanumeric character


I am trying to place an escape sign before every non-alphanumeric character:

> my $b = "!@#%^||" ~ "/welcome xyz:!@#\$%^&*()|:;.,?/-."
!@#%^||/welcome xyz:!@#$%^&*()|:;.,?/-.

> my $c = $b.subst(/<:!L + :!N - [./-]>/, "\\" ~ $/, :g)
\ \ \ \ \ \ \ /welcome\ xyz\ \ \ \ \ \ \ \ \ \ \ \ \ \ .\ \ /-.

This is the result after running the code the first time. When I run the code a second time, the result is a long string of repeated matches. I get a similar result if I use the "?" quantifier.

> my $c = $b.subst(/<:!L + :!N - [./-]>/, "\\" ~ $/, :g)
\! @ # % ^ | |   : ! @ # $ % ^ & * ( ) | : ; , ?\! @ # % ^ | |   : ! @ # $ % ^ & * ( ) | : ; , ?\! @ # % ^ | |      # This is truncated long string of undesired result.

Then I tried comb to substitute a single char, but I get multiple errors

> $b.comb.map( {.subst(/<:!L + :!N - [./-]>/, "\\" ~ $/)} )
Use of uninitialized value element of type Any in string context.
Methods .^name, .raku, .gist, or .say can be used to stringify it to something meaningful.
  in block  at <unknown file> line 1
(\! \! \@ \# \% \^ \| / w e l c o m e \ x y z \ \: \! \@ \# \$ \% \^ \& \* \( \) \| \: . \ \, / - .)

And the result is slightly different if I run the code a second time:

(\ \! \@ \# \% \^ \| / w e l c o m e \ x y z \ \: \! \@ \# \$ \% \^ \& \* \( \) \| \: . \ \, / - .)

Also I cannot join this list:

> $b.comb.map( { if $_.so { .subst(/<:!L + :!N - [./-]>/, "\\" ~ $/)} } ).join
Use of uninitialized value element of type Any in string context.
Methods .^name, .raku, .gist, or .say can be used to stringify it to something meaningful.
  in block  at <unknown file> line 1
  in block <unit> at <unknown file> line 1

The routine tr/// does not do what I am trying to accomplish.

What is a quick way to place "\" before every non-alnum char in a string? Seems so simple yet so hard. Thanks.

There are 3 answers

raiph On BEST ANSWER

TL;DR Wrap the substitution in a code block ({ ... }).

A problem when using $/ with subst

Quoting $/ doc:

set to the result of the last Regex match

This isn't always true.

Quoting, with suitable added emphasis, from the "Publication" of match variables by Rakudo section of an SO answer I wrote (in a reply to another SO Q you wrote):

the regex/grammar engine makes a conservative call about [when it's worth "publishing" (updating) $/]. By "conservative" I mean that the engine often avoids doing publication, because it slows things down and is usually unnecessary. Unfortunately it's sometimes too optimistic about when publication is actually necessary. Hence the need for programmers to sometimes intervene by explicitly inserting a code block to force publication of match variables...

The context of the above verbiage was your earlier Q, which wasn't about the substitution argument of a subst call. And I haven't read the compiler code to check what is going on in this new Q's scenario.

However, as I read your new SO Q, I immediately felt fairly confident that, despite the doc's phrasing about $/ being "set to the result of the last Regex match", $/ would not be updated in time within a subst call if the substitution passed to subst is just a string.


To see in more detail what's going on when you don't wrap the substitution in a code block, you can leave $/ unwrapped but pass it through a small block that says its value "en passant":

my $c = $b.subst(/<:!L + :!N - [./-]>/, "\\" ~ $/.&{ say $_; $_ }, :g);

As you'll see if you run that code, $/ is initially set to Nil.

Then, after that statement has been executed, but before the next statement is executed, the $/ value gets updated. That's why you get a different (but still unhelpful!) result in each subsequent statement. That is to say, $/ is getting updated, but the update is too late for it to work if you just use $/ in a string expression for the substitution rather than putting it inside a code block.
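
One way to read the behaviour described above is that a string substitution is an ordinary argument expression: it is evaluated once, before subst starts matching, using whatever $/ holds at that moment. The following is a minimal sketch under that reading, not a description of the compiler internals. The names $s and $replacement and the short sample string are illustrative only, and the regex is borrowed from the "Another solution" section.

```raku
my $s = 'a!b';

# No match has run yet, so $/ is still undefined here.
# The replacement expression is computed once, up front; `// ''` only
# avoids the "uninitialized value" warning for this demonstration.
my $replacement = "\\" ~ ($/ // '');   # a lone backslash

# Every match is replaced by that same precomputed string,
# so the matched character itself is lost.
say $s.subst(/ <-:L -:N> /, $replacement, :g);   # a\b
```

The "!" disappears because the replacement never had access to the current match.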

A solution when using subst in a recent Rakudo

Quoting again from the same doc:

A fresh [$/] is created in every routine.

Again, I haven't checked the compiler source code, but felt somewhat confident that a fresh $/ is created not just in every Routine but in every Block.¹

So I tested wrapping the substitution in a code block, and sure enough that meant the match variable "publication" (updating of $/ et al) did occur.

So I think that's one solution.
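
As a minimal sketch of that solution (the sample string and the variable names are illustrative, and the regex is the one from the "Another solution" section rather than the one in the question), wrapping the substitution in a block looks like this:

```raku
my $s = 'a b!c';

# The block runs once per match, and $/ inside it refers to that match,
# so each matched character is kept and prefixed with a backslash.
my $escaped = $s.subst(/ <-:L -:N> /, { '\\' ~ $/ }, :g);

say $escaped;   # a\ b\!c
```

The only change from a plain string substitution is the pair of braces around the expression.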

Another solution

What is a quick way to place "\" before every non-alnum char in a string?

$_ = '42!@#%^||' ~ '/welcome xyz:!@#$%^&*()|:;.,?/-.'; # (No need for any `\`)

.=subst: / <-:L -:N> /, { q:!b:s '\$/' }, :g;

.say; # 42\!\@\#\%\^\|\|\/welcome\ xyz\:\!\@\#\$\%\^\&\*\(\)\|\:\;\.\,\?\/\-\.

This is just the same solution in different clothes.

I've used the Q Lang's q. By default it interprets its argument like a '...' string, but you can extensively control its behavior with options. I've used the :!b option to turn interpreting backslashes off, and :s to turn interpreting of scalar variables (with $ sigil) on.

Footnotes

¹ Modulo optimization of course. That is to say, I'm ignoring optimization that is semantically invisible to user code (I'm ignoring it precisely because it's semantically invisible). This is in stark contrast to the "conservative call" I discussed in the quote from my prior SO answer, by which I meant something akin to a WONTFIX.

wamba On

The following code literally places an escape sign before every non-alphanumeric character.

my $b = '!@#%^||' ~ '/welcome xyz:!@#\\$%^&*()|:;.,?/-.';
say $b.subst: / <?before <-alnum>> /, '\\', :g
\!\@\#\%\^\|\|\/welcome\ xyz\:\!\@\#\\\$\%\^\&\*\(\)\|\:\;\.\,\?\/\-\.

jubilatious1 On

Apologies if what follows has been considered by the OP already:

If the only reason you're trying to backslash-escape non-alnum characters in Raku is to place the result into a variable and then test it against a Raku regex, then calling .raku on your string variable can often be helpful (i.e. it will get you part-of-the-way there):

~$ echo '!@#%^||/welcome xyz:!@#\$%^&*()|:;.,?/-.' | raku -ne '.raku.put'

Returns:

"!\@#\%^||/welcome xyz:!\@#\\\$\%^\&*()|:;.,?/-."

Looking at the result above, you'll see an intermediate level of backslash-escaping, with backslashes protecting the $, @, %, & and \ characters.

You can remove the surrounding double-quotes from the resultant string as follows:

~$ echo '!@#%^||/welcome xyz:!@#\$%^&*()|:;.,?/-.' | raku -ne '.raku.comb(/\" <(.+)> \"/).put'
!\@#\%^||/welcome xyz:!\@#\\\$\%^\&*()|:;.,?/-.

...or the slightly more whimsical:

~$ echo '!@#%^||/welcome xyz:!@#\$%^&*()|:;.,?/-.' | raku -ne '.raku.chop.flip.chop.flip.put'
!\@#\%^||/welcome xyz:!\@#\\\$\%^\&*()|:;.,?/-.
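
The same idea can be sketched inside a script instead of a shell one-liner. The short sample string and the variable names are illustrative, and substr is used here only as one more way to drop the surrounding quotes:

```raku
my $s = 'a$b';

# .raku returns Raku source text for the string, so the $ sigil
# comes back protected by a backslash, inside double quotes.
my $quoted = $s.raku;
say $quoted;   # "a\$b"

# Drop the first and last characters (the surrounding double quotes).
my $inner = $quoted.substr(1, $quoted.chars - 2);
say $inner;    # a\$b
```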