Why is my merge returning more rows than I expected?

Question

Why is my merge returning more rows than I expected?

55 views Asked by iGada At 22 March 2024 at 13:16

First, I'm very sorry for posting such a question. However, I'm a little bit confused about getting unexpected output. I want to merge two data frames.

x <- data.frame(id=c(1,1,1,2,2,2,3,3,3),
                vr1=c(1,2,3,4,5,6,7,8,9))
y <- data.frame(id=c(1,1,1,2,2,2,3,3,3),
                vr2=c(1,2,3,4,5,6,7,8,9))
z <- merge(x, y, by="id")

I'm expecting the output would be the following:

  id vr1 vr2
1  1   1   1
2  1   2   2
3  1   3   3
4  2   4   4
5  2   5   5
6  2   6   6
7  3   7   7
8  3   8   8
9  3   9   9

However, I got the following. What is my mistake?

   id vr1 vr2
1   1   1   1
2   1   1   2
3   1   1   3
4   1   2   1
5   1   2   2
6   1   2   3
7   1   3   1
8   1   3   2
9   1   3   3
10  2   4   4
11  2   4   5
12  2   4   6
13  2   5   4
14  2   5   5
15  2   5   6
16  2   6   4
17  2   6   5
18  2   6   6
19  3   7   7
20  3   7   8
21  3   7   9
22  3   8   7
23  3   8   8
24  3   8   9
25  3   9   7
26  3   9   8
27  3   9   9

Original Q&A

There are 3 answers

proffarnsworth On 22 March 2024 at 15:31

I think there is a little ambiguity in the question as it is posed. Do you only want to bind columns if the ID is the same and the values are different? Like if you had

x <- data.frame(id=c(1,1,1),
                vr1=c(1,1,1))
y <- data.frame(id=c(1,1,1),
                vr2=c(1,1,1)

Should it output

or

id vr1 vr2 
1   1   1

My solution is to just rename the column in y and then use cbind. Rename it back after.

colnames(y)[2] <- "vr1"
df <- cbind(x, y[,"vr1", drop = FALSE])
colnames(df)[3] <- "vr2"
df

> df
  id vr1 vr2
1  1   1   1
2  1   2   2
3  1   3   3
4  2   4   4
5  2   5   5
6  2   6   6
7  3   7   7
8  3   8   8
9  3   9   9

Elin On 22 March 2024 at 15:38

There are a few issues going on.

By default merge() only returns those rows that are in both. In this case you only have 1,2,3 as ID in the two data frames so only rows with those values are returned.
Each x is matched with a y with the same ID. So since you have e.g. 3 ID=1 rows in x that will be matched with the one row that has ID=1 in the y. (x is the left data frame, y is the right).

**G. Grothendieck** · Accepted Answer · 2024-03-22T13:32:26+00:00

Each id of 1 in x matches each id 1 in y so the merge generates 9 rows for id 1.

If there are the same number of rows as in the question and the id values correspond as in the question the cbind(x, y) will give the desired result shown.

cbind(x, y)
##   id vr1 id vr2
## 1  1   1  1   1
## 2  1   2  1   2
## 3  1   3  1   3
## 4  2   4  2   4
## 5  2   5  2   5
## 6  2   6  2   6
## 7  3   7  3   7
## 8  3   8  3   8
## 9  3   9  3   9

Another possibility is that you want to match the ith id 1 in x to just the ith id 1 in y. (For other examples you may need to use the all= argument of merge but for this example all values of that argument give the same result.) In this case

x2 <- transform(x, seq = ave(id, id, FUN = seq))
y2 <- transform(y, seq = ave(id, id, FUN = seq))

merge(x2, y2, by = c("id", "seq"))
##   id seq vr1 vr2
## 1  1   1   1   1
## 2  1   2   2   2
## 3  1   3   3   3
## 4  2   1   4   4
## 5  2   2   5   5
## 6  2   3   6   6
## 7  3   1   7   7
## 8  3   2   8   8
## 9  3   3   9   9

TechQA.

Why is my merge returning more rows than I expected?

There are 3 answers

Related Questions in R

Related Questions in DATAFRAME

Related Questions in MERGE

Related Questions in INNER-JOIN

Popular Questions

Trending Questions