Given the following sample data frame:
Question <- c("Q1", "Q1", "Q1","Q1","Q2", "Q2", "Q2","Q2")
Answer <- c("I like to be creative when I cook with crock pots.","I like to be creative when I cook with crock pots.",
            "I like to be creative when I cook with crock pots.","I like to be unique when I cook with a skillet.",
            "I like to be creative when I cook with crock pots.","I like to be unique when I cook with a skillet.",
            "I like to be unique when I cook with a skillet.","I like to be unique when I cook with a skillet.")
QAID <- c("Q11", "Q12", "Q13","Q14","Q21", "Q22", "Q23","Q24")
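# Keep the text columns as character (data.frame() only defaults to this from R 4.0.0)
v <- data.frame(Question, Answer, QAID, stringsAsFactors = FALSE)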
Given the following code:
library(dplyr)
library(udpipe)
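# Download your own copy of the English model first, e.g. with
# udpipe_download_model(language = "english-ewt"), and point udpipe_load_model() at that file
udmodel_english <- udpipe_load_model(file = "english-ewt-ud-2.4-190531.udpipe")
# Annotate each answer; doc_id encodes "<QAID>~<Question>" so both can be recovered later
anno <- udpipe_annotate(udmodel_english, v$Answer, doc_id = paste0(v$QAID, '~', v$Question))
x <- data.frame(anno)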
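# Split doc_id back into the question and the answer ID
x <- x %>%
  mutate(Question = sub(".*~", "", doc_id),
         ID = sub("~.*", "", doc_id))
# RAKE keywords built from nouns and adjectives, with Question as the group
stats <- keywords_rake(x = x, term = "lemma", group = "Question",
                       relevant = x$upos %in% c("NOUN", "ADJ"))
# Recode multi-word keywords (e.g. "crock pot") into single terms
x$term <- txt_recode_ngram(x$lemma, compound = stats$keyword, ngram = stats$ngram)
# Keep only the terms that RAKE returned as keywords
x$term <- ifelse(!x$term %in% stats$keyword, NA, x$term)
x <- x %>%
  left_join(stats, by = c("term" = "keyword")) %>%
  filter(!is.na(term))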
I would expect the RAKE statistics to be computed per question, not across both questions, because I am grouping the RAKE output by question in this call:
keywords_rake(x = x, term = "lemma", group = "Question",
              relevant = x$upos %in% c("NOUN", "ADJ"))
However, that is not what I get. Even though the keyword "crock pot" is used three times within group Q1 and only once within group Q2, I get the same rake score for both, and a freq of 4 (the total across both questions).
Checking the documentation for the group argument of the keywords_rake function turns up the following:
a character vector with 1 or several columns from x which indicates for example a document id or a sentence id. Keywords will be computed within this group in order not to find keywords across sentences or documents for example.
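A minimal, model-free sketch of what that documentation describes: the token table below is hand-built and hypothetical (the doc_id, lemma and upos values and the extra all column are made up for illustration), and only nouns are marked relevant. With group = "doc_id" the two nouns sit in different groups, so keywords_rake should keep them as separate one-word candidates; with a single group covering both documents they are adjacent and can merge into "crock pot". This is consistent with the freq of 4 reported above: group appears to decide where candidate phrases may start and end, while the counts are still tallied over every row passed in.

```r
library(udpipe)

# Hypothetical two-document token table: "crock" ends doc1 and "pot" starts doc2.
tokens <- data.frame(
  doc_id = c("doc1", "doc1", "doc2", "doc2"),
  lemma  = c("cook", "crock", "pot", "cook"),
  upos   = c("VERB", "NOUN", "NOUN", "VERB"),
  stringsAsFactors = FALSE
)
tokens$all <- "all"               # hypothetical column: one group spanning both documents
relevant <- tokens$upos == "NOUN" # only nouns may be part of a keyword

# Grouped by document: the nouns fall in different groups, so no phrase spans them.
by_doc <- keywords_rake(tokens, term = "lemma", group = "doc_id",
                        relevant = relevant, n_min = 1)
# A single group: the adjacent nouns form one two-word candidate.
by_all <- keywords_rake(tokens, term = "lemma", group = "all",
                        relevant = relevant, n_min = 1)

print(by_doc$keyword)
print(by_all$keyword)
```

n_min = 1 is set only because each candidate occurs once in this tiny table; the default of 2 would filter everything out.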
My question:
Am I using the group argument incorrectly? How should I use the RAKE algorithm to get a rake score for a keyword within a single question, not across all questions? I know I could loop through the questions, but before adding that overhead I want to check whether there is a built-in way to handle this. Am I thinking about this function incorrectly?
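The loop-based workaround mentioned above could look roughly like the following sketch. It assumes a hand-built, hypothetical token table standing in for the udpipe annotation of v$Answer (so no model file is needed; each answer is reduced to a few lemmas with hand-assigned POS tags) and a hypothetical helper rake_by_question(). The helper splits the tokens by Question, runs keywords_rake separately on each piece with group = "doc_id", and stacks the results. Note that relevant has to be computed from each piece, not from the full x, so its length matches.

```r
library(udpipe)

# Hypothetical stand-in for the udpipe annotation: a few lemmas per answer.
crock   <- data.frame(lemma = c("creative", "cook", "crock", "pot", "."),
                      upos  = c("ADJ", "VERB", "NOUN", "NOUN", "PUNCT"),
                      stringsAsFactors = FALSE)
skillet <- data.frame(lemma = c("unique", "cook", "skillet", "."),
                      upos  = c("ADJ", "VERB", "NOUN", "PUNCT"),
                      stringsAsFactors = FALSE)
make_doc <- function(tokens, doc_id, question) {
  data.frame(doc_id = doc_id, Question = question, tokens,
             stringsAsFactors = FALSE)
}
# Same answer pattern as v: Q1 has three crock-pot answers, Q2 has one.
x <- rbind(
  make_doc(crock,   "Q11", "Q1"), make_doc(crock,   "Q12", "Q1"),
  make_doc(crock,   "Q13", "Q1"), make_doc(skillet, "Q14", "Q1"),
  make_doc(crock,   "Q21", "Q2"), make_doc(skillet, "Q22", "Q2"),
  make_doc(skillet, "Q23", "Q2"), make_doc(skillet, "Q24", "Q2")
)

# Hypothetical helper: one keywords_rake() call per question.
rake_by_question <- function(tokens, n_min = 1) {
  per_q <- lapply(split(tokens, tokens$Question), function(d) {
    kw <- keywords_rake(x = d, term = "lemma", group = "doc_id",
                        relevant = d$upos %in% c("NOUN", "ADJ"),
                        n_min = n_min)
    if (nrow(kw) == 0) return(NULL)  # nothing to label for this question
    kw$Question <- d$Question[1]
    kw
  })
  do.call(rbind, per_q)
}

# Pooled over all answers, as in the original call (but grouped by answer).
pooled <- keywords_rake(x = x, term = "lemma", group = "doc_id",
                        relevant = x$upos %in% c("NOUN", "ADJ"))
per_question <- rake_by_question(x)

print(subset(pooled, keyword == "crock pot"))
print(subset(per_question, keyword == "crock pot"))
```

On this toy data the pooled call should report a freq of 4 for "crock pot", while the per-question table should report 3 for Q1 and 1 for Q2; n_min = 1 keeps the single Q2 occurrence, which the default n_min of 2 would drop. The rake column is not compared here: RAKE's word score is a degree-to-frequency ratio, so a phrase like "crock pot" whose words always appear together can keep the same rake score in each question even after splitting.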