dplyr semi_join Error: `x` and `y` must share the same src, set `copy` = TRUE (may be slow)

I am using dplyr 1.0.6 and R 4.1.0, and I wrote two functions that I call as follows:

AllCustomersList <- loadAllCustomersData()

CouldJoinByNationalID <- matchCustomersByNationalCode(AllCustomersList = AllCustomersList)

loadAllCustomersData() returns a list of two data frames. matchCustomersByNationalCode() then tries to run a semi_join on those two data frames as follows:

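matchCustomersByNationalCode <- function(AllCustomersList) {
  FDCustomers <- AllCustomersList$FDCustomers
  Customers <- AllCustomersList$Customers

  # Keep only the FDCustomers rows whose NationalID appears in Customers$NationalCode;
  # the value of the last expression is returned, so piping into return() is not needed.
  semi_join(x = FDCustomers, y = Customers, by = c("NationalID" = "NationalCode"), na_matches = "never") %>%
    pull(NationalID)
}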

Actually, this is just a wrapper around semi_join for the sake of naming. But it throws an error that says:

Error: `x` and `y` must share the same src, set `copy` = TRUE (may be slow).

Run `rlang::last_error()` to see where the error occurred.

Called from: signal_abort(cnd)

Could anyone help with this?

There are 2 answers

Ali Sadeghi Aghili On BEST ANSWER

Thanks to walter and Martin Gal, I tried to make a reproducible example, and it worked. I then checked the class of both objects, and both were reported as data.frame. Even so, converting them to data.frame again inside the match function made the error go away. It is still odd to me, but the problem is solved.
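The workaround described above might look roughly like the sketch below. The sample data is hypothetical and only stands in for whatever loadAllCustomersData() returns, and the use of as.data.frame() is an assumption, since the answer does not say which conversion was used.

```r
library(dplyr)

# Hypothetical sample data standing in for the result of loadAllCustomersData()
AllCustomersList <- list(
  FDCustomers = data.frame(NationalID = c("001", "002", NA, "004")),
  Customers   = data.frame(NationalCode = c("002", "004", NA, "005"))
)

matchCustomersByNationalCode <- function(AllCustomersList) {
  # Assumption: coerce both inputs to plain data frames before joining,
  # so that x and y are certain to share the same (local) source
  FDCustomers <- as.data.frame(AllCustomersList$FDCustomers)
  Customers   <- as.data.frame(AllCustomersList$Customers)

  FDCustomers %>%
    semi_join(Customers, by = c("NationalID" = "NationalCode"), na_matches = "never") %>%
    pull(NationalID)
}

CouldJoinByNationalID <- matchCustomersByNationalCode(AllCustomersList)
print(CouldJoinByNationalID)
```

If the error persists, printing class(FDCustomers) and class(Customers) just before the join, or calling dplyr::same_src(FDCustomers, Customers), may help show which of the two objects is not a local data frame.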

GG-Delta On

To resolve this error, you can follow the approach described in the documentation (https://dplyr.tidyverse.org/reference/mutate-joins.html): when the two inputs to the join do not share the same source, add the copy argument and set it to TRUE. The mock example below assumes two data frames (d_a, d_b) that each have the two columns used for the join. Note that copy = TRUE is included:

d_a %>%
  left_join(d_b,
            by = c("T1_ID_LOC", "TIME"),
            copy = TRUE)  # copy d_b into the same src as d_a if they differ
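Applied to the semi_join from the question, the same idea could be sketched as follows. The two data frames here are hypothetical stand-ins for FDCustomers and Customers. With two local data frames, copy = TRUE has no visible effect; it only matters when y lives in a different source than x (for example, a database table).

```r
library(dplyr)

# Hypothetical stand-ins for the two data frames in the question
FDCustomers <- data.frame(NationalID = c("001", "002", NA, "004"))
Customers   <- data.frame(NationalCode = c("002", "004", NA, "005"))

matched_ids <- FDCustomers %>%
  semi_join(Customers,
            by = c("NationalID" = "NationalCode"),
            copy = TRUE,            # copy y into x's source when the sources differ
            na_matches = "never") %>%  # do not treat NA as equal to NA
  pull(NationalID)

print(matched_ids)
```

As the error message itself warns, copying can be slow when y is large, so this is best treated as a convenience rather than a default.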