How to pull specific characters out of a string in R?

I am trying to extract clock times from PGN chess notation. For example if I have the string:

"1. e4 {[%clk 0:00:59.5]} 1... b6 {[%clk 0:00:57.4]} 2. Nc3 {[%clk 0:00:59.4]} 2... Bb7 {[%clk 0:00:57.2]}"

How do I get just the seconds, i.e. 59.5, 57.4, etc.?

I am a beginner in R. I tried strsplit() with no luck.

There are 4 answers

7
Dan On

Here's a way using the stringr package:

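library(stringr)

pgn <- "1. e4 {[%clk 0:00:59.5]} 1... b6 {[%clk 0:00:57.4]} 2. Nc3 {[%clk 0:00:59.4]} 2... Bb7 {[%clk 0:00:57.2]}"

# Extract each h:mm:ss.s clock value (the "." is escaped so it matches a literal dot),
# then strip the leading "h:mm:" so only the seconds remain.
result <- str_extract_all(pgn, "\\d:\\d\\d:\\d\\d\\.\\d", simplify = TRUE) %>%
  gsub("\\d:\\d\\d:", "", .)

result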
     [,1]   [,2]   [,3]   [,4]  
[1,] "59.5" "57.4" "59.4" "57.2"
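
If a numeric vector is more convenient than a character matrix, the same extract-then-strip idea can be sketched in base R. This is only an illustration under stated assumptions: the names `pgn`, `clock`, and `secs` are mine rather than the answer's, and the pattern assumes a single-digit hour field and one decimal place, as in the example string.

```r
pgn <- "1. e4 {[%clk 0:00:59.5]} 1... b6 {[%clk 0:00:57.4]} 2. Nc3 {[%clk 0:00:59.4]} 2... Bb7 {[%clk 0:00:57.2]}"

# Step 1: pull out every h:mm:ss.s value (the dot is escaped to match a literal ".")
clock <- regmatches(pgn, gregexpr("[0-9]:[0-9]{2}:[0-9]{2}\\.[0-9]", pgn))[[1]]

# Step 2: drop the leading "h:mm:" and convert what is left to numbers
secs <- as.numeric(sub("^[0-9]:[0-9]{2}:", "", clock))
secs
# [1] 59.5 57.4 59.4 57.2
```

Note that this keeps only the seconds field, so it silently discards any non-zero hours or minutes.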
0
r2evans On

(I'll modify the data so that I can demonstrate extracting "seconds" with hours and minutes.)

We need to first extract the time-strings with something like

st <- "1. e4 {[%clk 0:00:59.5]} 1... b6 {[%clk 0:00:57.4]} 2. Nc3 {[%clk 0:00:59.4]} 2... Bb7 {[%clk 1:02:57.2]}"
st2 <- regmatches(st, gregexpr("(?<=clk )[0-9:.]+", st, perl = TRUE))
st2
# [[1]]
# [1] "0:00:59.5" "0:00:57.4" "0:00:59.4" "1:02:57.2"

and then we can use a helper function to convert that to "seconds":

time2num <- function(x) {
  vapply(strsplit(x, ':'), function(y) sum(as.numeric(y) * 60^((length(y)-1):0)),
         numeric(1), USE.NAMES=FALSE)
}
time2num(unlist(st2))
# [1]   59.5   57.4   59.4 3777.2

1*3600 + 2*60 + 57.2
# [1] 3777.2
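
To make the weighting inside `time2num()` easier to follow, here is a small step-by-step sketch for a single value. The intermediate names `parts` and `weights` are illustrative only and do not appear in the helper itself.

```r
# Split one clock string on ":" and convert the fields to numbers
parts <- as.numeric(strsplit("1:02:57.2", ":")[[1]])   # hours, minutes, seconds

# Powers of 60, highest first: 60^2, 60^1, 60^0 for three fields
weights <- 60^((length(parts) - 1):0)

total <- sum(parts * weights)
total
# [1] 3777.2

# Because the exponents depend on the number of fields, "mm:ss" input works too
short <- as.numeric(strsplit("02:57.2", ":")[[1]])
total_short <- sum(short * 60^((length(short) - 1):0))
total_short
# [1] 177.2
```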

As an alternative, I suspect you're also planning to extract the move itself. Here's code that keeps each move and its time together in a data frame:

unlist(regmatches(st, gregexpr(r"{\S+\s*\{\[%clk.*?]}", st))) |>
  strcapture(r"{(\S+)\s*\{\[.* ([0-9.:]+)}", x = _, list(move="", time="")) |>
  transform(time_num = time2num(time))
#   move      time time_num
# 1   e4 0:00:59.5     59.5
# 2   b6 0:00:57.4     57.4
# 3  Nc3 0:00:59.4     59.4
# 4  Bb7 1:02:57.2   3777.2
0
ThomasIsCoding On

You can try regmatches() with a lookbehind and a lookahead, like below:

> s <- "1. e4 {[%clk 0:00:59.5]} 1... b6 {[%clk 0:00:57.4]} 2. Nc3 {[%clk 0:00:59.4]} 2... Bb7 {[%clk 0:00:57.2]}"

> as.numeric(regmatches(s, gregexpr("(?<=\\d{2}:).*?(?=\\])", s, perl = TRUE))[[1]])
[1] 59.5 57.4 59.4 57.2
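
As a rough illustration of what the lookbehind `(?<=...)` and lookahead `(?=...)` contribute, the sketch below runs the same pattern with and without them. The names `plain` and `lookaround` are mine, and the comparison is only meant to show that lookarounds assert context without including it in the match.

```r
s <- "1. e4 {[%clk 0:00:59.5]} 1... b6 {[%clk 0:00:57.4]} 2. Nc3 {[%clk 0:00:59.4]} 2... Bb7 {[%clk 0:00:57.2]}"

# Without lookarounds, the "mm:" prefix and the closing "]" are part of each match
plain <- regmatches(s, gregexpr("\\d{2}:.*?\\]", s, perl = TRUE))[[1]]
plain
# [1] "00:59.5]" "00:57.4]" "00:59.4]" "00:57.2]"

# With lookarounds, the same context is required but not returned
lookaround <- regmatches(s, gregexpr("(?<=\\d{2}:).*?(?=\\])", s, perl = TRUE))[[1]]
as.numeric(lookaround)
# [1] 59.5 57.4 59.4 57.2
```

One caveat: with a two-digit hour field such as 10:02:57.2, the lookbehind would already be satisfied after "10:", so the match would be "02:57.2" and `as.numeric()` would return NA for it.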
0
Chris Ruehlemann On

Here's a tidyverse solution in one pipe:

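library(tidyverse)  # attaches tibble, dplyr, tidyr and stringr

# `st` is the string defined under "Data" below
tibble(st) %>%
  mutate(st = str_extract_all(st, "[\\d:.]{9}")) %>%
  unnest(st) %>%
  mutate(sec = sapply(strsplit(st, ":"), function(x) sum(c(3600,60,1) * as.numeric(x))))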
# A tibble: 4 × 2
  st           sec
  <chr>      <dbl>
1 0:00:59.5   59.5
2 0:00:57.4   57.4
3 0:00:59.4   59.4
4 1:02:57.2 3777. 

Data (thanks to r2evans):

st <- "1. e4 {[%clk 0:00:59.5]} 1... b6 {[%clk 0:00:57.4]} 2. Nc3 {[%clk 0:00:59.4]} 2... Bb7 {[%clk 1:02:57.2]}"
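
The `{9}` quantifier above assumes every clock value is exactly nine characters wide. If that does not hold for your games, a variable-width pattern may be safer. The sketch below uses base R and a made-up string `st_long` (hypothetical data, not from the question) with a two-digit hour and a value that has no decimal part.

```r
# Hypothetical input: a two-digit hour and a clock value without a fractional part
st_long <- "1. e4 {[%clk 10:02:57.2]} 1... b6 {[%clk 0:00:57]}"

# One or more hour digits, two-digit minutes and seconds, optional fraction
pattern <- "[0-9]+:[0-9]{2}:[0-9]{2}(\\.[0-9]+)?"
times <- regmatches(st_long, gregexpr(pattern, st_long))[[1]]
times
# [1] "10:02:57.2" "0:00:57"

# Same hours/minutes/seconds weighting as in the pipe above
secs <- sapply(strsplit(times, ":"), function(x) sum(c(3600, 60, 1) * as.numeric(x)))
secs
# [1] 36177.2    57.0
```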