Cumulative application of a gsub sequence in R

I'm working on a project dealing with chess games. After some processing of the data I need to get the FEN (https://en.wikipedia.org/wiki/Forsyth%E2%80%93Edwards_Notation) notation of a particular position. I've already written the code that encodes each piece in FEN, but I'm having a hard time encoding the character that represents the number of consecutive unoccupied squares.

As an example, take the following FEN code:

"rnbq1rk1/pppp1ppp/1b11pn11/11111111/11PP1111/11111NP1/PP11PPBP/RNBQ1RK1 w KQkq c6 0 2"

Each 1 represents an unoccupied square inside the chess board. So, for example: 11111111 is telling us that this row inside the board is not occupied by pieces.

The problem is that R packages that plot chess boards using FEN as input don't like this notation. They want the more succinct, original notation, where each run of consecutive 1s is replaced by a single character: the number of 1s in that run. For the previous example, that would be:

"rnbq1rk1/pppp1ppp/1b2pn2/8/2PP4/5NP1/PP2PPBP/RNBQ1RK1 w KQkq c6 0 2"

Note that, for example, the 11111111 sequence was replaced by 8, the sum of all consecutive 1s.

I've tried using mapply with gsub to get the replacements done, but it applies each pattern-replacement pair separately to the original string. The result is the following:

Code:

pattern <- c("11111111","1111111","111111","111111","1111","111","11")
replacement <- c("8","7","6","5","4","3","2")
FENCodeToBeChanged  <-  "rnbq1rk1/pppp1ppp/1b11pn11/11111111/11PP1111/11111NP1/PP11PPBP/RNBQ1RK1 w KQkq c6 0 2"
mapply(gsub,pattern,replacement,FENCodeToBeChanged)

Result:

                                                                              11111111 
  "rnbq1rk1/pppp1ppp/1b11pn11/8/11PP1111/11111NP1/PP11PPBP/RNBQ1RK1 w KQkq c6 0 2" 
                                                                           1111111 
 "rnbq1rk1/pppp1ppp/1b11pn11/71/11PP1111/11111NP1/PP11PPBP/RNBQ1RK1 w KQkq c6 0 2" 
                                                                            111111 
"rnbq1rk1/pppp1ppp/1b11pn11/611/11PP1111/11111NP1/PP11PPBP/RNBQ1RK1 w KQkq c6 0 2" 
                                                                            111111 
"rnbq1rk1/pppp1ppp/1b11pn11/511/11PP1111/11111NP1/PP11PPBP/RNBQ1RK1 w KQkq c6 0 2" 
                                                                              1111 
       "rnbq1rk1/pppp1ppp/1b11pn11/44/11PP4/41NP1/PP11PPBP/RNBQ1RK1 w KQkq c6 0 2" 
                                                                               111 
   "rnbq1rk1/pppp1ppp/1b11pn11/3311/11PP31/311NP1/PP11PPBP/RNBQ1RK1 w KQkq c6 0 2" 
                                                                                11 
       "rnbq1rk1/pppp1ppp/1b2pn2/2222/2PP22/221NP1/PP2PPBP/RNBQ1RK1 w KQkq c6 0 2"

As you can see, it does the replacements, but one at a time: for each pattern-replacement pair it starts again from the original string. It does not accumulate the replacements in the order I've specified in the pattern and replacement vectors.

I've tried the strategies described here and here, but they also didn't work. As mentioned in the last link, I'm trying to avoid looping over gsub calls to get the job done, as it seems quite inefficient.

Any thoughts on how to proceed?

Thanks!

There is 1 answer

r2evans (best answer)

The issue with mapply is that it is looking at a fresh copy of the FEN string for each replacement, which is not what you need. I think you can use a Reduce mindset:

(BTW, your pattern for "5" has 6 ones; the version below fixes that.)

pattern <- c("11111111","1111111","111111","11111","1111","111","11")
Reduce(function(txt, ptn) gsub(ptn, as.character(nchar(ptn)), txt), pattern, init=FENCodeToBeChanged)
# [1] "rnbq1rk1/pppp1ppp/1b2pn2/8/2PP4/5NP1/PP2PPBP/RNBQ1RK1 w KQkq c6 0 2"
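
If it helps to watch the replacements accumulate, here is a minimal sketch using Reduce's accumulate argument, which keeps the result after each step instead of only the last one. The names fen and steps are just for illustration, and the inputs are repeated so the snippet runs on its own:

fen <- "rnbq1rk1/pppp1ppp/1b11pn11/11111111/11PP1111/11111NP1/PP11PPBP/RNBQ1RK1 w KQkq c6 0 2"
pattern <- c("11111111", "1111111", "111111", "11111", "1111", "111", "11")

# accumulate = TRUE returns the starting string plus the result after each pattern,
# so each element shows the string the next gsub call will receive.
steps <- Reduce(function(txt, ptn) gsub(ptn, as.character(nchar(ptn)), txt),
                pattern, fen, accumulate = TRUE)

length(steps)         # one entry for the initial string plus one per pattern
steps[length(steps)]  # the last entry is the fully compressed FEN

Patterns that no longer match anything (here the 7-, 6- and 3-long ones) simply leave the string unchanged at that step.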

Reducing over multiple arguments takes a little bit of work, usually iterating along a list of pairs or such. With this problem, it's easy enough to replace a pattern with its length instead of including another vector of strings, ergo nchar(ptn). (Technically as.character(.) is not required, since gsub will implicitly convert it, but I wanted to be a bit "declarative" that this is what I want. There are many tools in R that are less deterministic in this way (e.g., ifelse). Style.)
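
For completeness, a rough sketch of the "list of pairs" variant, in case the replacements cannot be derived from the patterns. The names fen, pairs and out are hypothetical; Map bundles each pattern with its replacement so that Reduce has a single list to walk:

fen <- "rnbq1rk1/pppp1ppp/1b11pn11/11111111/11PP1111/11111NP1/PP11PPBP/RNBQ1RK1 w KQkq c6 0 2"
pattern     <- c("11111111", "1111111", "111111", "11111", "1111", "111", "11")
replacement <- c("8", "7", "6", "5", "4", "3", "2")

# One list element per pattern/replacement pair, in the order they should be applied.
pairs <- Map(function(p, r) list(pattern = p, replacement = r), pattern, replacement)

# Each step receives the string produced by the previous step.
# fixed = TRUE treats the patterns as literal text rather than regular expressions.
out <- Reduce(function(txt, pr) gsub(pr$pattern, pr$replacement, txt, fixed = TRUE),
              pairs, fen)
out

Order matters: the longest pattern has to come first, otherwise a shorter pattern would break up a longer run before it could be matched. Note also that this, like the version above, runs over the whole string, so it assumes the trailing FEN fields (move counters and so on) contain no runs of 1s; if they might, apply it to the board field only.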