I am trying to run this code:
library("spacyr")
library("dplyr", warn.conflicts = FALSE)
mytext <- data.frame(text = c("test text", "section 2 sending"),
                     id = c(32, 41))
df2 <- tidyr::separate_rows(mytext, text)
df3 <- data.frame(text = df2$text, id = df2$id)
dflemma <- spacy_parse(structure(df3$text, names = df3$id),
                       lemma = TRUE, pos = FALSE) %>%
  mutate(id = doc_id) %>%
  group_by(id) %>%
  summarize(body = paste(lemma, collapse = " "))
The expected output goes from the long format back to the wide format: one row per id, with the lemmatized words of that id merged into a single string separated by spaces. Here is the expected output:
data.frame(text = c("test text", "section 2 send"),
           id = c(32, 41))
However, the code produces this error:
Error in process_document(x, multithread) : Docnames are duplicated.
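You get this error because separate_rows splits each text into single words, so the same id is repeated over several rows, and spacy_parse requires every document name to be unique. You shouldn't do that split: spacy_parse tokenizes the text itself. Consider the following code: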
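The answer's original code is not reproduced here, so the following is only a minimal sketch of the idea: pass the unsplit texts to spacy_parse, one document per id. It assumes spacyr is installed with a working spaCy backend and English model; the as.character() call on the ids is my addition to make the document names explicit.

```r
library("spacyr")
library("dplyr", warn.conflicts = FALSE)

mytext <- data.frame(text = c("test text", "section 2 sending"),
                     id = c(32, 41))

# One document per id: the names are unique, so spacy_parse accepts them.
docs <- structure(mytext$text, names = as.character(mytext$id))

dflemma <- spacy_parse(docs, lemma = TRUE, pos = FALSE) %>%
  mutate(id = doc_id) %>%                          # doc_id carries the original id
  group_by(id) %>%
  summarize(body = paste(lemma, collapse = " "))   # rejoin lemmas per document
```

Here dflemma should have one row per id, with the lemmas joined by spaces in body. The exact lemmas depend on the spaCy model in use.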
Update
If you have to do the separation, then you need to further modify your id column to ensure that each observation in it is unique. Later you can change those ids back at the group_by stage. Consider the following code.
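The answer's original code is not reproduced here, so this is a hedged sketch of one way to do it, again assuming a working spacyr/spaCy installation. The column name doc_name and the "_" separator are hypothetical choices of mine: each word gets a name like "32_1", and the suffix is stripped again before grouping.

```r
library("spacyr")
library("dplyr", warn.conflicts = FALSE)

mytext <- data.frame(text = c("test text", "section 2 sending"),
                     id = c(32, 41))
df2 <- tidyr::separate_rows(mytext, text)

# Make every document name unique by appending the word's position within its id.
df3 <- df2 %>%
  group_by(id) %>%
  mutate(doc_name = paste(id, row_number(), sep = "_")) %>%
  ungroup()

dflemma <- spacy_parse(structure(df3$text, names = df3$doc_name),
                       lemma = TRUE, pos = FALSE) %>%
  mutate(id = sub("_.*$", "", doc_id)) %>%         # strip the suffix to recover the id
  group_by(id) %>%
  summarize(body = paste(lemma, collapse = " "))
```

Note that with this approach each word is parsed as its own one-word document, so spaCy sees no sentence context, which can change the lemmas compared with parsing the whole text at once.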