How to run the R RAKE function in udpipe across individual groups


Given the following sample data frame:

Question <- c("Q1", "Q1", "Q1","Q1","Q2", "Q2", "Q2","Q2")
Answer <- c("I like to be creative when I cook with crock pots.","I like to be creative when I cook with crock pots.",
            "I like to be creative when I cook with crock pots.","I like to be unique when I cook with a skillet.",
            "I like to be creative when I cook with crock pots.","I like to be unique when I cook with a skillet.",
            "I like to be unique when I cook with a skillet.","I like to be unique when I cook with a skillet.")
QAID <- c("Q11", "Q12", "Q13","Q14","Q21", "Q22", "Q23","Q24")

v <- data.frame(Question, Answer, QAID)

Given the following code:

library(dplyr)
library(udpipe)

# Download the English model first (e.g. with udpipe_download_model()) and point `file` at your local copy
udmodel_english <- udpipe_load_model(file = "english-ewt-ud-2.4-190531.udpipe")

t <- udpipe_annotate(udmodel_english, v$Answer, doc_id = paste0(v$QAID,'~',v$Question))
x <- data.frame(t)

x <- x %>%
  mutate(Question = sub(".*~", "", doc_id),
         ID = sub("~.*", "", doc_id))

stats <- keywords_rake(x = x, term = "lemma", group = "Question", 
                       relevant = x$upos %in% c("NOUN", "ADJ"))

x$term <- txt_recode_ngram(x$lemma, compound = stats$keyword, ngram = stats$ngram)
x$term <- ifelse(!x$term %in% stats$keyword, NA, x$term)

x <- x %>%
  left_join(stats, by = c("term" = "keyword")) %>%
  filter(!is.na(term))

I would expect the following output:

[Image: expected output]

I expect this because I am trying to group the RAKE output by question, not pool it across both questions:

keywords_rake(x = x, term = "lemma", group = "Question",
              relevant = x$upos %in% c("NOUN", "ADJ"))

However, my output looks like this:

[Image: actual output]

Even though the keyword "crock pot" occurs only once in group Q2 and three times in group Q1, I get the same RAKE score and a freq of 4 for both groups.

The documentation for the group argument of keywords_rake says the following:

a character vector with 1 or several columns from x which indicates for example a document id or a sentence id. Keywords will be computed within this group in order not to find keywords across sentences or documents for example.

My Question:

Am I using the group argument incorrectly? How should I use the RAKE algorithm to get a RAKE score for a keyword within a single question rather than across all questions? I know I could loop through the questions, but before adding that overhead I want to check whether there is a built-in way to handle this. Or am I misunderstanding what this function does?
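
For reference, the loop-based fallback mentioned above could look roughly like the sketch below. It rests on an assumption drawn from the quoted documentation: group only limits where keyword candidates may be formed, while freq and rake are tallied over the whole data frame that is passed in. Under that assumption, per-question scores need one keywords_rake call per question. The toy data frame is a hand-typed stand-in for the annotated x (so no model download is needed), rake_by_question is a hypothetical helper that is not part of udpipe, and n_min = 1 is set only because the toy data is tiny.

```r
library(udpipe)

# Hypothetical stand-in for the annotated data frame `x` from the question,
# typed by hand so the sketch runs without a downloaded model.
crock   <- data.frame(lemma = c("cook", "with", "crock", "pot"),
                      upos  = c("VERB", "ADP", "NOUN", "NOUN"),
                      stringsAsFactors = FALSE)
skillet <- data.frame(lemma = c("cook", "with", "skillet"),
                      upos  = c("VERB", "ADP", "NOUN"),
                      stringsAsFactors = FALSE)
docs <- list(Q11 = crock, Q12 = crock, Q13 = crock, Q14 = skillet,
             Q21 = crock, Q22 = skillet, Q23 = skillet, Q24 = skillet)
toy <- do.call(rbind, lapply(names(docs), function(id) {
  data.frame(Question = substr(id, 1, 2), doc_id = id, docs[[id]],
             stringsAsFactors = FALSE)
}))

# rake_by_question() is a hypothetical helper, not a udpipe function.
# It runs keywords_rake() once per question and tags each result row
# with the question it came from.
rake_by_question <- function(x, group = "doc_id", ...) {
  pieces <- split(x, x$Question)
  scored <- lapply(names(pieces), function(q) {
    piece <- pieces[[q]]
    res <- keywords_rake(x = piece, term = "lemma", group = group,
                         relevant = piece$upos %in% c("NOUN", "ADJ"), ...)
    res$Question <- rep(q, nrow(res))  # also safe when res has zero rows
    res
  })
  do.call(rbind, scored)
}

stats_by_question <- rake_by_question(toy, n_min = 1)
print(stats_by_question)
```

With the real x, the resulting table could then be joined back on both the term and the Question column instead of on the term alone. Grouping by doc_id inside each question is a choice made for this sketch; any column that marks document or sentence boundaries would serve the same purpose.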
