How can I create a loop where I only take words that start with capital letters?


I have an Excel sheet with many rows. One specific column describes ancestry and contains numbers and commas. I want to split each row of that column at the commas, then write a function that keeps only the words starting with a capital letter. Then I want to extract these words in a loop, so that for each row I get a list of the capitalized words it contains. Finally, I want a table showing the frequency of each of these words.

I tried str_extract_all(data$`INITIAL SAMPLE DESCRIPTION`, "\\b[A-Z]\\w*") |> unique(), where INITIAL SAMPLE DESCRIPTION is the name of the column I'm interested in.

Answer by Rui Barradas

Something like this? Extract words that start with a capital letter followed by zero or more alphabetic characters.
To tabulate the extracted words, unlist them and pass them to table().

x <- 'I have an excel sheet with a lot of rows and i want: to split the rows in a specific column by commas (this column describes ancestry and it has numbers and commas), then create a function where i only take words that start with capital letters. Then abstract these words and put them in a loop, so I can create a list of words that go together in a row that start with capital letters. After that i want to create a list where i can see the frequencies of each of these words.
I used the function str_extract_all(data$INITIAL SAMPLE DESCRIPTION, "\\b[A-Z]\\w*") |> unique() Where INITIAL SAMPLE DESCRIPTION is the name of the column of my interest.
'
cap <- stringr::str_extract_all(x, "[A-Z][[:alpha:]]*")
cap
#> [[1]]
#>  [1] "I"           "Then"        "I"           "After"       "I"          
#>  [6] "INITIAL"     "SAMPLE"      "DESCRIPTION" "A"           "Z"          
#> [11] "Where"       "INITIAL"     "SAMPLE"      "DESCRIPTION"

cap |> unlist() |> table()
#> 
#>           A       After DESCRIPTION           I     INITIAL      SAMPLE 
#>           1           1           2           3           2           2 
#>        Then       Where           Z 
#>           1           1           1

Created on 2023-12-22 with reprex v2.0.2
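One caveat, assuming capitals inside a word should not count: "[A-Z][[:alpha:]]*" also matches a capital letter in the middle of a word, so "iPhone" yields "Phone". Prefixing the pattern with \\b, as the question's own attempt did, requires the capital to start a word. The sketch below uses base R's regmatches() and gregexpr() with perl = TRUE so it runs without extra packages; the sample string s is made up for illustration, and stringr::str_extract_all() with the same patterns should give the same result here.

```r
# Made-up sample text for illustration only
s <- "European ancestry, iPhone users, 12 Han Chinese"

# Pattern from the answer: a capital letter plus following letters, even mid-word
loose <- regmatches(s, gregexpr("[A-Z][[:alpha:]]*", s, perl = TRUE))[[1]]

# Same pattern anchored at a word boundary: the capital must begin the word
anchored <- regmatches(s, gregexpr("\\b[A-Z][[:alpha:]]*", s, perl = TRUE))[[1]]

loose     # "European" "Phone" "Han" "Chinese"
anchored  # "European" "Han" "Chinese"
```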


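To apply this to the column in your data, wrap it in a function. Because str_extract_all() is vectorized, the function can take the whole column at once and returns a list with one character vector of capitalized words per row; unlist() and table() then give the frequencies:

extract_cap_words <- function(x) {
  # Returns a list: one character vector of capitalized words per element of x
  stringr::str_extract_all(x, "[A-Z][[:alpha:]]*")
}

cap_words <- extract_cap_words(data$`INITIAL SAMPLE DESCRIPTION`)
cap_words |> unlist() |> table()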
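The question also asks to split the column at commas first. A minimal end-to-end sketch, assuming the sheet has already been read into a data frame called data (for example with readxl::read_excel(), not shown): the two-row data frame below is a hypothetical stand-in, lapply() plays the role of the loop over rows, and the pattern is anchored with \\b as in the question. Base R is used here instead of stringr.

```r
# Hypothetical two-row stand-in for the Excel sheet
data <- data.frame(
  `INITIAL SAMPLE DESCRIPTION` = c(
    "1200 European ancestry individuals, 300 African American individuals",
    "850 Han Chinese ancestry cases, 1020 European ancestry controls"
  ),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

desc <- data$`INITIAL SAMPLE DESCRIPTION`

# 1. Split each row at the commas: a list with one character vector per row
pieces <- strsplit(desc, ",", fixed = TRUE)

# 2. Loop over rows and keep only words that start with a capital letter
cap_words <- lapply(pieces, function(p) {
  hits <- regmatches(p, gregexpr("\\b[A-Z][[:alpha:]]*", p, perl = TRUE))
  unlist(hits)
})

# 3. Count how often each capitalized word occurs across all rows
freq <- sort(table(unlist(cap_words)), decreasing = TRUE)
freq
```

Splitting at commas is not strictly needed for the extraction itself, since the regular expression finds capitalized words anywhere in the text, but it keeps the comma-separated pieces available if you want to work with them later. A comma inside a number such as 1,200 would also be split, which does not affect the capitalized words.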