match columns and keep all duplicated elements in a data frame column [R]

2.3k views Asked by At

I have two data frames; DF1 has 3 columns and DF2 has one column.DF1 has all elements contained in DF2 but then most of them are duplicated as shown below.

DF1=
***freetext***,         ***specific***,    ***ICDcode*** 

Jaundice,hepatitisA,B,C   Hepatitis A       B15
Jaundice,hepatitisA,B,C   Hepatitis B       B16
Jaundice,hepatitisA,B,C   Hepatitis C       B17.1
Jaundice,hepatitisA,B,C   Jaundice          R17
lobar Pneumonia           Lobar pneumonia   J18.1
Lobar Pneumonia ,scabies  Lobar pneumonia   J18.1
scabiess                  scabies            G10     

DF2=          
Jaundice,hepatitisA,B,C  
scabiess                            
Lobar Pneumonia ,scabies
lobar Pneumonia

I wish to have a match between the two data frames such that whenever a match occurs there should be a resultant data frame taking the form of a DF1.for example Jaundice,hepatitisA,B,C should appear 4 times instead of appearing once in a column. In other words duplicates should be maintained as shown below ;

Resultant data frame should appear like this.

    column1                  column2             column3
Jaundice,hepatitisA,B,C   Hepatitis A       B15
Jaundice,hepatitisA,B,C   Hepatitis B       B16
Jaundice,hepatitisA,B,C   Hepatitis C       B17.1
Jaundice,hepatitisA,B,C   Jaundice          R17

so,how am i supposed to loop through DF2 to find a match in DF1(first column) and then produce a data frame of matches with all other corresponding rows as shown above?

here is my script but it doesn't seem to produce my desired results

   newMatches<- data.frame()
for(i 1:nrow(DF1){ for(j in 1:nrow(DF2[,1]{grep(j, i, ignore.case=F, value=T)->newMatches}} 
#it doesn't produce other columns of DF1

any help and or suggestion may be very much appreciated.am slightly novice in R

1

There are 1 answers

7
konvas On

As far as I understand, you want to filter the rows of DF1, keeping only the ones for which the first column exists in DF2. Is that right? The easiest way to achieve this would be

DF1[DF1[, 1] %in% DF2[, 1], ]

Edit

Here is the full code to reproduce the example:

DF1 <- structure(list(
    freetext = structure(c(1L, 1L, 1L, 1L, 2L, 3L, 4L),
        .Label = c("Jaundice,hepatitisA,B,C", "lobar Pneumonia", 
        "Lobar Pneumonia ,scabies", "scabiess"), class = "factor"),
    specific = structure(c(1L, 2L, 3L, 4L, 5L, 5L, 6L),  
        .Label = c("Hepatitis A", "Hepatitis B", "Hepatitis C", "Jaundice",
        "Lobar pneumonia", "scabies"), class = "factor"), 
    ICDcode = structure(c(1L, 2L, 3L, 6L, 5L, 5L, 4L), 
        .Label = c("B15", "B16", "B17.1", "G10", "J18.1", "R17"),
        class = "factor")),
   .Names = c("freetext", "specific", "ICDcode"),
   row.names = c(NA, -7L), class = "data.frame")

DF2 <- structure(list(
    freetext = structure(c(1L, 4L, 3L, 2L), 
        .Label = c("Jaundice,hepatitisA,B,C", 
         "lobar Pneumonia", "Lobar Pneumonia ,scabies", "scabiess"),
         class = "factor")),
    .Names = "freetext", row.names = c(NA, -4L), class = "data.frame")

result <- DF1[DF1[, 1] %in% DF2[, 1], ]

Printing result gives the following output

                  freetext        specific ICDcode
1  Jaundice,hepatitisA,B,C     Hepatitis A     B15
2  Jaundice,hepatitisA,B,C     Hepatitis B     B16
3  Jaundice,hepatitisA,B,C     Hepatitis C   B17.1
4  Jaundice,hepatitisA,B,C        Jaundice     R17
5          lobar Pneumonia Lobar pneumonia   J18.1
6 Lobar Pneumonia ,scabies Lobar pneumonia   J18.1
7                 scabiess         scabies     G10