R programming: Combining Two Data Frames


Folks,

I would like to concatenate (or merge, if you will) two data frames, df1 and df2. My goal is simply to make a new data frame whose columns are the union of the columns of df1 and df2.

Example

product=c("p1","p1","p1","p1","p1","p1","p1","p1","p2","p2","p2","p2","p2","p2","p2","p2","p3","p3","p3","p3","p3","p3","p3","p3","p4","p4","p4","p4","p4","p4","p4","p4")
skew=c("b","b","b","b","a","a","a","a","b","b","b","b","a","a","a","a","b","b","b","b","a","a","a","a","b","b","b","b","a","a","a","a")
version=c(0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2)
color=c("C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2")
price=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32)

df1 = data.frame(product, skew, version)
df2 = data.frame(product, skew, color, price)
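The example vectors follow a regular pattern, so the same data can be rebuilt more compactly with rep(). This is only a sketch for anyone reproducing the question; as.numeric() keeps price a double, like the literal price vector above.

```r
# Compact construction of the example vectors (same values as the literal c(...) calls)
product <- rep(c("p1", "p2", "p3", "p4"), each = 8)    # each product 8 times in a row
skew    <- rep(rep(c("b", "a"), each = 4), times = 4)  # b,b,b,b,a,a,a,a within each product
version <- rep(c(0.1, 0.1, 0.2, 0.2), times = 8)       # version pattern repeats every 4 rows
color   <- rep(c("C1", "C2"), times = 16)              # alternating colors
price   <- as.numeric(1:32)                            # double, like the literal vector

df1 <- data.frame(product, skew, version)
df2 <- data.frame(product, skew, color, price)
```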

The result I want is shown at the end of this post.

I have tried a few options:

# Option 1: use cbind
df <- cbind(df1,df2)

This returns a data frame with duplicated columns "product" and "skew".

# Option 2: use data.frame
df <- data.frame(df1,df2)

This gave me pretty much what I want, except that it had extra copies of "product" and "skew". They are suffixed with ".1", though, so at least the column names are not duplicated.
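Both options keep the shared columns twice; they differ only in how the duplicates are named. A quick check, as a sketch that rebuilds the example data compactly with rep() (same values as the literal vectors):

```r
# Example data from the question, built compactly
product <- rep(c("p1", "p2", "p3", "p4"), each = 8)
skew    <- rep(rep(c("b", "a"), each = 4), times = 4)
version <- rep(c(0.1, 0.1, 0.2, 0.2), times = 8)
color   <- rep(c("C1", "C2"), times = 16)
price   <- as.numeric(1:32)
df1 <- data.frame(product, skew, version)
df2 <- data.frame(product, skew, color, price)

# Option 1: cbind() keeps both copies of product and skew under the same names
names(cbind(df1, df2))
# "product" "skew" "version" "product" "skew" "color" "price"

# Option 2: data.frame() keeps both copies too, but makes the names unique
names(data.frame(df1, df2))
# "product" "skew" "version" "product.1" "skew.1" "color" "price"
```

In both cases the result has 7 columns rather than the 5 wanted.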

# Option 3: use merge, which seems to be the way to go
df <- merge(df1,df2) 

I think I am missing something with merge, because this actually combined every row of df1 with every row of df2 that has the same product and skew, making a total of 128 observations out of the 32 provided. I guess that's how merge works. I have read "?merge" and tried a few options but could not get it to produce what I want.
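To see where the 128 rows come from, here is a small check (a sketch, with the example data rebuilt compactly via rep()). merge() joins on the columns both data frames share, and each (product, skew) key appears 4 times in each data frame, so each of the 8 keys yields 4 * 4 matched rows:

```r
# Example data from the question, built compactly
product <- rep(c("p1", "p2", "p3", "p4"), each = 8)
skew    <- rep(rep(c("b", "a"), each = 4), times = 4)
version <- rep(c(0.1, 0.1, 0.2, 0.2), times = 8)
color   <- rep(c("C1", "C2"), times = 16)
price   <- as.numeric(1:32)
df1 <- data.frame(product, skew, version)
df2 <- data.frame(product, skew, color, price)

# By default merge() joins on the shared columns: "product" and "skew"
intersect(names(df1), names(df2))

# Count how often each (product, skew) key occurs in each data frame
n1 <- table(paste(df1$product, df1$skew))
n2 <- table(paste(df2$product, df2$skew))

# Every matching pair of rows is kept, so each key contributes n1 * n2 rows
# (n2 is aligned to n1 by key name): 8 keys * 4 * 4 = 128
expected_rows <- sum(n1 * n2[names(n1)])
expected_rows
nrow(merge(df1, df2))
```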

So my question is:

What is the best way to get my desired data frame from df1 and df2 above?

Thanks in advance for your help!
Riad

Desired result:

     product skew  version color price
1       p1    b     0.1    C1     1
2       p1    b     0.1    C2     2
3       p1    b     0.2    C1     3
4       p1    b     0.2    C2     4
5       p1    a     0.1    C1     5
6       p1    a     0.1    C2     6
7       p1    a     0.2    C1     7
8       p1    a     0.2    C2     8
9       p2    b     0.1    C1     9
10      p2    b     0.1    C2    10
11      p2    b     0.2    C1    11
12      p2    b     0.2    C2    12
13      p2    a     0.1    C1    13
14      p2    a     0.1    C2    14
15      p2    a     0.2    C1    15
16      p2    a     0.2    C2    16
17      p3    b     0.1    C1    17
18      p3    b     0.1    C2    18
19      p3    b     0.2    C1    19
20      p3    b     0.2    C2    20
21      p3    a     0.1    C1    21
22      p3    a     0.1    C2    22
23      p3    a     0.2    C1    23
24      p3    a     0.2    C2    24
25      p4    b     0.1    C1    25
26      p4    b     0.1    C2    26
27      p4    b     0.2    C1    27
28      p4    b     0.2    C2    28
29      p4    a     0.1    C1    29
30      p4    a     0.1    C2    30
31      p4    a     0.2    C1    31
32      p4    a     0.2    C2    32

There are 2 answers

Jonas (accepted answer)

merge() does not work the way you want because your columns "product" and "skew" are not unique identifiers: each combination occurs several times, so merge() pairs every matching row of df1 with every matching row of df2. You can either include a third column as an id:

product=c("p1","p1","p1","p1","p1","p1","p1","p1","p2","p2","p2","p2","p2","p2","p2","p2","p3","p3","p3","p3","p3","p3","p3","p3","p4","p4","p4","p4","p4","p4","p4","p4")
skew=c("b","b","b","b","a","a","a","a","b","b","b","b","a","a","a","a","b","b","b","b","a","a","a","a","b","b","b","b","a","a","a","a")
version=c(0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2)
color=c("C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2")
price=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32)
id = 1:32

df1 = data.frame(product, skew, id, version)
df2 = data.frame(product, skew, id, color, price)
merge(df1, df2)
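Note that this result keeps the helper id column and, because merge() sorts on the key columns by default (sort = TRUE), lists the "a" rows before the "b" rows within each product. A sketch that avoids typing the id by hand: number the repeated (product, skew) pairs with ave(), merge on that number as well, then drop the helpers and restore the original row order. The helper column names seq and row are made up for this example, and the approach assumes the k-th occurrence of a (product, skew) pair in df1 belongs with the k-th occurrence in df2.

```r
# Example data from the question, built compactly
product <- rep(c("p1", "p2", "p3", "p4"), each = 8)
skew    <- rep(rep(c("b", "a"), each = 4), times = 4)
version <- rep(c(0.1, 0.1, 0.2, 0.2), times = 8)
color   <- rep(c("C1", "C2"), times = 16)
price   <- as.numeric(1:32)
df1 <- data.frame(product, skew, version)
df2 <- data.frame(product, skew, color, price)

# Hypothetical helper "seq": number the rows within each (product, skew) group 1, 2, 3, 4
df1$seq <- ave(seq_len(nrow(df1)), df1$product, df1$skew, FUN = seq_along)
df2$seq <- ave(seq_len(nrow(df2)), df2$product, df2$skew, FUN = seq_along)
# Hypothetical helper "row": remember df1's original order
df1$row <- seq_len(nrow(df1))

# (product, skew, seq) is now unique in both data frames, so the join is one-to-one
df <- merge(df1, df2, by = c("product", "skew", "seq"))

# Restore the original order and keep only the wanted columns
df <- df[order(df$row), c("product", "skew", "version", "color", "price")]
rownames(df) <- NULL
df
```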

Or you can combine your data frames manually (this relies on the rows of df1 and df2 being in the same order):

cbind(df1, df2[, c("color", "price")])
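Because cbind() simply places columns side by side, it silently gives wrong results if the two data frames are not in the same row order. A slightly more defensive sketch checks that the shared columns agree row by row before binding, and picks the extra columns by name so their positions do not matter:

```r
# Example data from the question, built compactly
product <- rep(c("p1", "p2", "p3", "p4"), each = 8)
skew    <- rep(rep(c("b", "a"), each = 4), times = 4)
version <- rep(c(0.1, 0.1, 0.2, 0.2), times = 8)
color   <- rep(c("C1", "C2"), times = 16)
price   <- as.numeric(1:32)
df1 <- data.frame(product, skew, version)
df2 <- data.frame(product, skew, color, price)

# Stop with an error unless row i of df1 and row i of df2 have the same key
stopifnot(
  identical(df1$product, df2$product),
  identical(df1$skew, df2$skew)
)

# Append only the columns df1 lacks, selected by name rather than position
df <- cbind(df1, df2[, c("color", "price")])
df
```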

Neal Fultz

You can use union(), but it will mess up the column names, so you have to restore them afterwards:

df_c <- union(df1, df2)
names(df_c) <- union(names(df1), names(df2))
df_c <- as.data.frame(df_c)
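A related sketch takes the union of the column names rather than of the column values: keep all of df1 and add only the columns of df2 whose names df1 does not already have. Unlike the union() trick, it does not depend on identical columns being deduplicated, but like the manual cbind() approach it still assumes the rows of df1 and df2 line up.

```r
# Example data from the question, built compactly
product <- rep(c("p1", "p2", "p3", "p4"), each = 8)
skew    <- rep(rep(c("b", "a"), each = 4), times = 4)
version <- rep(c(0.1, 0.1, 0.2, 0.2), times = 8)
color   <- rep(c("C1", "C2"), times = 16)
price   <- as.numeric(1:32)
df1 <- data.frame(product, skew, version)
df2 <- data.frame(product, skew, color, price)

# Columns of df2 whose names do not already appear in df1
extra <- setdiff(names(df2), names(df1))
extra  # "color" "price"

# Keep every column of df1 and append only the new columns from df2
df_c <- cbind(df1, df2[extra])
df_c
```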