I want to extract the main keywords from the column 'title' for each group (the first column, 'group').
Desired result in column 'desired title':
Reproducible data:
myData <-
structure(list(group = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2,
2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3), title = c("mentoring aug 8th 2018",
"mentoring aug 9th 2017", "mentoring aug 9th 2018", "mentoring august 31",
"mentoring blue care", "mentoring cara casual", "mentoring CDP",
"mentoring cell douglas", "mentoring centurion", "mentoring CESO",
"mentoring charlotte", "medication safety focus", "medication safety focus month",
"medication safety for nurses 2017", "medication safety formulations errors",
"medication safety foundations care", "medication safety general",
"communication surgical safety", "communication tips", "communication tips for nurses",
"communication under fire", "communication webinar", "communication welling",
"communication wellness")), row.names = c(NA, -24L), class = c("tbl_df",
"tbl", "data.frame"))
I've looked into record linkage solutions, but those are mainly meant for grouping the full titles rather than extracting keywords from them. Any suggestions would be great.
I concatenated all titles by group, and tokenized them:
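A minimal base R sketch of this step, assuming plain whitespace tokenization and a small stand-in for `myData`. The names `titles`, `concatenated`, and `token_counts` are illustrative only and are not necessarily what was used originally:

```r
# Small stand-in for myData (a subset of the titles from the question)
titles <- data.frame(
  group = c(1, 1, 2, 2, 3, 3),
  title = c("mentoring aug 8th 2018", "mentoring blue care",
            "medication safety focus", "medication safety general",
            "communication tips", "communication webinar"),
  stringsAsFactors = FALSE
)

# Concatenate all titles within each group into a single string
concatenated <- aggregate(title ~ group, data = titles,
                          FUN = paste, collapse = " ")

# Tokenize on whitespace (assumption) and count each token per group
token_counts <- do.call(rbind, lapply(seq_len(nrow(concatenated)), function(i) {
  tokens <- strsplit(tolower(concatenated$title[i]), "\\s+")[[1]]
  counts <- table(tokens)
  data.frame(group = concatenated$group[i],
             token = names(counts),
             n = as.integer(counts),
             stringsAsFactors = FALSE)
}))
```

Tokens with a high count within a group (here "mentoring", "medication", "safety", "communication") are the natural keyword candidates.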
Below is the resulting dataframe:
I'm happy with the result:
To apply the algorithm to my real data of about 100,000 rows, I wrapped it in a function that handles the problem group by group:
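A group-by-group function could look roughly like the sketch below. It rests on an assumption that may not match the original rule: a token counts as a keyword only if it appears in every title of the group. The function name `extract_keywords` and the sample data are hypothetical.

```r
# Small stand-in for myData (a subset of the titles from the question)
titles <- data.frame(
  group = c(1, 1, 2, 2, 3, 3),
  title = c("mentoring aug 8th 2018", "mentoring blue care",
            "medication safety focus", "medication safety general",
            "communication tips", "communication webinar"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the tokens shared by all titles of one group
extract_keywords <- function(group_titles) {
  token_list <- strsplit(tolower(group_titles), "\\s+")
  # Count in how many titles each token occurs (once per title at most)
  doc_freq <- table(unlist(lapply(token_list, unique)))
  common <- names(doc_freq)[doc_freq == length(group_titles)]
  # Preserve the order in which the tokens appear in the first title
  first <- unique(token_list[[1]])
  paste(first[first %in% common], collapse = " ")
}

# Apply the helper to each group separately
keywords <- vapply(split(titles$title, titles$group),
                   extract_keywords, character(1))
print(keywords)
# "mentoring", "medication safety", "communication" for groups 1, 2, 3
```

Because each group is processed independently via `split()`, memory use stays bounded by the largest group, which should help on data with many rows. Requiring a token in every title is strict, so a lower threshold (for example, a share of the titles) may suit noisier data.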