Parsing with Periods in R

I have a bit of code that I'm using to parse a character vector in R. It appears to treat some observations differently from others, and I can't figure out how to correct it. Here is the code as it currently stands:

superbowl$Receiver <- as.factor(ifelse(
  superbowl$Is.Pass == TRUE,
  ifelse(
    superbowl$Is.Complete == TRUE,
    gsub("(\\w+\\s)*pass\\s(\\w+)(\\s\\w+)(\\s\\w+)(\\s\\w+)(\\s\\w+)(\\s\\w+).*", "\\6\\7", superbowl$Detail),
    gsub("(\\w+\\s)*pass\\s(\\w+)(\\s\\w+)(\\s\\w+)(\\s\\w+)(\\s\\w+)(\\s\\w+)(\\s\\w+).*", "\\7\\8", superbowl$Detail)
  ),
  NA
))

And here is an excerpt from the three vectors it references:

> dput(superbowl$Is.Pass[1:25])
c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, 
TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, 
FALSE, TRUE, FALSE, TRUE, FALSE, FALSE)
> dput(superbowl$Is.Complete[1:25])
c(NA, TRUE, NA, NA, FALSE, NA, FALSE, NA, NA, TRUE, FALSE, NA, 
TRUE, NA, TRUE, NA, NA, TRUE, NA, NA, NA, NA, FALSE, NA, NA)
> dput(superbowl$Detail[1:25])
c("Brandon McManus kicks off 57 yards returned by Tim Hightower for 31 yards (tackle by Brandon McManus)", 
"Drew Brees pass complete short left to Coby Fleener for 8 yards (tackle by Sylvester Williams)", 
"Mark Ingram right guard for 5 yards (tackle by Jared Crick)", 
"Mark Ingram right guard for 4 yards (tackle by Todd Davis)", 
"Drew Brees pass incomplete short right intended for Willie Snead (defended by Bradley Roby)", 
"Penalty on Andrus Peat: False Start 5 yards (no play)", "Drew Brees pass incomplete short left intended for Willie Snead (defended by Chris Harris)", 
"Thomas Morstead punts 34 yards fair catch by Jordan Norwood", 
"Devontae Booker left tackle for 6 yards (tackle by Craig Robertson and Dannell Ellerbe)", 
"Trevor Siemian pass complete short left to Demaryius Thomas for 14 yards (tackle by Vonn Bell)", 
"Trevor Siemian pass incomplete short right intended for A.J. Derby (defended by Kenny Vaccaro)", 
"Devontae Booker left guard for 3 yards (tackle by Tyeler Davison)", 
"Trevor Siemian pass complete short right to A.J. Derby for 10 yards (tackle by Vonn Bell)", 
"Kapri Bibbs right end for 2 yards (tackle by Cameron Jordan and Craig Robertson)", 
"Trevor Siemian pass complete deep right to Jordan Taylor for 18 yards (tackle by Vonn Bell)", 
"Devontae Booker right tackle for 2 yards (tackle by Paul Kruger)", 
"Timeout #1 by Denver Broncos", "Trevor Siemian pass complete short right to Demaryius Thomas for 8 yards (tackle by Delvin Breaux)", 
"NOR challenged the first down ruling and the play was overturned. Trevor Siemian pass complete short right to Demaryius Thomas for 7 yards (tackle by Delvin Breaux)", 
"Trevor Siemian left guard for 3 yards (tackle by Tyeler Davison)", 
"Trevor Siemian sacked by Nick Fairley for -5 yards", "Devontae Booker right end for 11 yards (tackle by Sterling Moore)", 
"Trevor Siemian pass incomplete short right intended for Jordan Taylor (defended by Jairus Byrd)", 
"DEN challenged the incomplete pass ruling and the play was overturned. Trevor Siemian pass complete short right to Jordan Taylor for 14 yards touchdown", 
"Brandon McManus kicks extra point good")

My results are:

> superbowl$Receiver <- as.factor(ifelse(superbowl$Is.Pass == TRUE, ifelse(superbowl$Is.Complete == TRUE, gsub("(\\w+\\s)*pass\\s(\\w+)(\\s\\w+)(\\s\\w+)(\\s\\w+)(\\s\\w+)(\\s\\w+).*", "\\6\\7", superbowl$Detail), gsub("(\\w+\\s)*pass\\s(\\w+)(\\s\\w+)(\\s\\w+)(\\s\\w+)(\\s\\w+)(\\s\\w+)(\\s\\w+).*", "\\7\\8", superbowl$Detail)), NA))
> superbowl$Receiver[1:25]
 [1] <NA>                                                                                          
 [2]  Coby Fleener                                                                                 
 [3] <NA>                                                                                          
 [4] <NA>                                                                                          
 [5]  Willie Snead                                                                                 
 [6] <NA>                                                                                          
 [7]  Willie Snead                                                                                 
 [8] <NA>                                                                                          
 [9] <NA>                                                                                          
[10]  Demaryius Thomas                                                                             
[11] Trevor Siemian pass incomplete short right intended for A.J. Derby (defended by Kenny Vaccaro)
[12] <NA>                                                                                          
[13] Trevor Siemian pass complete short right to A.J. Derby for 10 yards (tackle by Vonn Bell)     
[14] <NA>                                                                                          
[15]  Jordan Taylor                                                                                
[16] <NA>                                                                                          
[17] <NA>                                                                                          
[18]  Demaryius Thomas                                                                             
[19] <NA>                                                                                          
[20] <NA>                                                                                          
[21] <NA>                                                                                          
[22] <NA>                                                                                          
[23]  Jordan Taylor                                                                                
[24] <NA>                                                                                          
[25] <NA>                                                                                          
21 Levels:  Andy Janovich ... Trevor Siemian pass incomplete short right intended for A.J. Derby (defended by Kenny Vaccaro)

Looking at the whole data set, it appears that every time A.J. Derby is the intended target, R returns the entire superbowl$Detail string rather than parsing it. Is this because his first name is written as initials with periods? How can I get R to ignore periods and identify words by spaces alone? Thanks for the help!
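
For context on the behavior described above: `\\w` matches only letters, digits, and underscores, so `\\s\\w+` can consume " A" in " A.J." but the next group then finds a period where it expects whitespace. The whole pattern fails to match, and `gsub` returns its input unchanged when there is no match. A minimal sketch of one possible workaround follows. It assumes receiver names are always exactly two space-separated tokens, and that they follow "to" (complete) or "intended for" (incomplete). The function name `extract_receiver` and the small `detail` vector are illustrative only, not part of the original code.

```r
# A few rows copied from the dput() output above, for illustration
detail <- c(
  "Drew Brees pass complete short left to Coby Fleener for 8 yards (tackle by Sylvester Williams)",
  "Trevor Siemian pass incomplete short right intended for A.J. Derby (defended by Kenny Vaccaro)",
  "Trevor Siemian pass complete short right to A.J. Derby for 10 yards (tackle by Vonn Bell)",
  "Mark Ingram right guard for 5 yards (tackle by Jared Crick)"
)

# Hypothetical helper: pull the two tokens after "to " / "intended for ".
# \\S+ matches any run of non-whitespace, so periods in "A.J." are kept.
extract_receiver <- function(x) {
  pattern <- "pass (?:complete|incomplete) .*?\\b(?:to|intended for) (\\S+ \\S+)"
  matches <- regmatches(x, regexec(pattern, x, perl = TRUE))
  # Each element is c(full match, capture group) or character(0) if no match
  vapply(
    matches,
    function(m) if (length(m) >= 2) m[2] else NA_character_,
    character(1)
  )
}

receivers <- extract_receiver(detail)
print(receivers)
# [1] "Coby Fleener" "A.J. Derby"   "A.J. Derby"   NA
```

Because non-matching rows come back as NA rather than as the untouched input string, this sketch would not need the nested `ifelse` on Is.Pass and Is.Complete. Names with more than two tokens (for example a "Jr." suffix) would be truncated under the two-token assumption. A smaller change to the original patterns would be to swap each `\\w+` for `\\S+`.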
