Visualizing word concordance throughout time (dplyr)


I am trying to use word concordances to visualize how mentions of a specific term (MoM) change over time in my dataset. Specifically, I am interested in how mentions of the term change relative to a variable called "treatment_implementation", which is coded 0 for before the policy and 1 for after policy implementation.

To do this, I ran the following:

word_concordances <- kwic(toks, pattern = c("mom")) %>%
  as.data.frame() %>%
  dplyr::select(-to, -from, -pattern)

# Keep distinct rows only
word_concordances <-
  distinct(word_concordances, post, .keep_all = TRUE)

which gives us the following df:

dput(word_concordances[1:10,c(1,2,3,4)])

data output:

structure(list(docname = c("25", "38", "98", "119", "119", "119", "119", "122", "125", "125"), pre = c("grabu find biz course ITE", "thanks people pop", "complain certain companies employees ceca", "Ministry Manpower", "arrested investigation ongoing Modus Operandi", "pass illegally detailed analyses months", "persons employers contravene EFMA matter", "must act act bit whole", "finally enforcement lazy", "thank LMW whatever name Without"), keyword = c("MOM", "mom", "MOM", "MOM", "MOM", "MOM", "MOM", "MOM", "MOM", "MOM"), post = c("making good money levy want", "hawkers hawkers parents discourage us", "proof work pass approved system", "mounted enforcement operation locations islandwide", "began investigations upon obtaining information", "uncovered potential syndicate suspected setting", "momfmmdmomgovsg information kept strictly confidential", "sacked acting blur donkey showing", "want thank LMW whatever name", "likely sleeping")), row.names = c(NA, 10L), class = "data.frame")

However, the "treatment_implementation" indicator is stored in the main df (a data example follows), and I am not sure how to attach that variable to my word concordances.

dput(main_df[1:10,c(1,2,3,6)])

data output:

structure(list(id = 1:10, username = c("106gunner", "CPTMiller", "matey1982", "Why so serious", "Joe Maya", "Toomin", "wadtheEel", "Witch King", "106gunner", "roronoa_zoro"), post = c("Was reported in SCMP news source underneath link", "Government already said ft or CECA create new good jobs for Singaporean", "gunner said Was reported in SCMP news source underneath linkClick to expand arent u stating the obvious", "lightboxclose Close lightboxnext Next lightboxprevious Previous lightboxerror The requested content cannot be loaded Please try again later lightboxstartslideshow Start slideshow lightboxstopslideshow Stop slideshow lightboxfullscreen Full screen lightboxthumbnails Thumbnails lightboxdownload Download lightboxshare Share lightboxzoom Zoom lightboxnewwindow New window lightboxtogglesidebar Toggle sidebar", "From personal experience i lost my job to jhk", "ceca ftw", "edmw say yes but govt say no Who to believe", "I will welcome ceca if pap have ceca candidates in the Parliament", "matey said arent u stating the obvious Click to expand Surprised SCMP news also reported", "wadtheEel said edmw say yes but govt say no Who to believe Click to expand I believe the govt Every year we can only produce ish IT uni graduates Got lots of IT jobs opening not enough if only hire them Posted from PCWX using SMGN"), treatment_implementation = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))

Answer by Ken Benoit (best answer)

In your example, there are no matching docnames, but for your full data I'm assuming this is not the case. So this should work:

# create a common variable, docname
main_df <- dplyr::mutate(main_df, docname = as.character(id))

# merge the treatment variable into the word concordances
word_concordances <- 
    dplyr::left_join(word_concordances, 
                     dplyr::select(main_df, c("docname", "treatment_implementation")),
                     by = "docname")
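
Once the join has run, it may help to confirm that every concordance row found a matching document and then to summarise mentions by period. The following is only a minimal sketch on made-up toy data: the values in both data frames, and the names `joined`, `unmatched`, `mentions_by_period` and `n_mentions`, are assumptions for illustration and do not come from the original post.

```r
library(dplyr)

# Toy stand-ins for the real objects (hypothetical values for illustration)
word_concordances <- data.frame(
  docname = c("2", "2", "3", "5"),
  keyword = c("MOM", "mom", "MOM", "MOM"),
  stringsAsFactors = FALSE
)
main_df <- data.frame(
  id = 1:5,
  treatment_implementation = c(0, 0, 1, 1, 1)
)

# Same approach as the answer: build a character key, then attach the flag
main_df <- mutate(main_df, docname = as.character(id))
joined <- left_join(
  word_concordances,
  select(main_df, docname, treatment_implementation),
  by = "docname"
)

# Concordance rows whose docname has no match in main_df (ideally zero rows)
unmatched <- anti_join(word_concordances, main_df, by = "docname")

# Number of keyword hits before (0) and after (1) the policy
mentions_by_period <- count(joined, treatment_implementation, name = "n_mentions")
print(mentions_by_period)
```

If `unmatched` has rows in the real data, the document names produced by `kwic()` probably do not correspond to `id`, and the key would need to be built differently. The `mentions_by_period` table can then be passed to whatever plotting function you prefer.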