I have a dataset (MN_Census) that has information for all census tracts for the following years: 1990, 2000, 2010, and 2020. The variable ID that identifies the census tract is "GISJOIN". My dataset looks like this:
| GISJOIN | GEOGYEAR | STATE | STATEA | COUNTY | COUNTYA | TRACTA | CL8AA | CL9AA | DATAYEAR |
|---|---|---|---|---|---|---|---|---|---|
| G2700010770100 | 2010 | Minnesota | 27 | Aitkin County | 1 | 770100 | 1954.45 | 0 | 1990 |
| G2700010790502 | 2010 | Minnesota | 27 | Aitkin County | 1 | 790502 | 2045.99 | 0 | 1990 |
| G2700010770100 | 2010 | Minnesota | 27 | Aitkin County | 1 | 770100 | 2361.32 | 0 | 2000 |
| G2700010790502 | 2010 | Minnesota | 27 | Aitkin County | 1 | 790502 | 2862.97 | 0 | 2000 |
| G2700010770100 | 2010 | Minnesota | 27 | Aitkin County | 1 | 770100 | 2327 | 0 | 2010 |
| G2700010790502 | 2010 | Minnesota | 27 | Aitkin County | 1 | 790502 | 3262 | 0 | 2010 |
| G2700010770100 | 2010 | Minnesota | 27 | Aitkin County | 1 | 770100 | 2244 | NA | 2020 |
| G2700010790502 | 2010 | Minnesota | 27 | Aitkin County | 1 | 790502 | 3120 | NA | 2020 |
I'm trying to use either join/merge/rbind to add foreach census tract, the years in between. This is, I want to add the following dataset foreach of the census tract:
| YearTransaction |
|---|
| 1990 |
| 1991 |
| 1992 |
| 1993 |
| 1994 |
| 1995 |
| 1996 |
| 1997 |
| 1998 |
| 1999 |
| 2000 |
| 2001 |
| 2002 |
| 2003 |
| 2004 |
So at the end, my desired dataset will have all years (from 1990 to 2020) foreach one of the census tract. I'm trying using "multidplyr" to make a partition by census tract ID (which is represented by the variable "GISJOIN"), and join each one of these partitions with the Years dataset. I'm using the following code:
library("parallel")
library("modeest")
library("multidplyr")
library("doParallel")
library("dtplyr")
detectCores()
cluster <- new_cluster(4)
base <- MN_Census %>% group_by(GISJOIN) %>% partition(cluster)
system.time(final <- dplyr::left_join(Years,base, by=c("YearTransaction"="DATAYEAR"),copy=TRUE) %>% collect())
final <- final %>% dplyr::ungroup()
However, I'm not getting the desired results. What I'm getting is a dataaset with all years (which is good), but these years are not duplicated foreach census tract (GISJOIN):

The desired dataset should look like this:
| DATAYEAR | GISJOIN | GEOGYEAR | STATE | STATEA | COUNTY | COUNTYA | TRACTA | CL8AA | CL9AA |
|---|---|---|---|---|---|---|---|---|---|
| 1990 | G2700010770100 | 2010 | Minnesota | 27 | Aitkin County | 1 | 770100 | 1954.45 | 0 |
| 1991 | G2700010770100 | ||||||||
| … | G2700010770100 | ||||||||
| 2000 | G2700010770100 | 2010 | Minnesota | 27 | Aitkin County | 1 | 770100 | 2361.32 | 0 |
| 2001 | G2700010770100 | ||||||||
| … | G2700010770100 | ||||||||
| 2010 | G2700010770100 | 2010 | Minnesota | 27 | Aitkin County | 1 | 770100 | 2327 | 0 |
| 2011 | G2700010770100 | ||||||||
| … | G2700010770100 | ||||||||
| 2020 | G2700010770100 | 2010 | Minnesota | 27 | Aitkin County | 1 | 770100 | 2244 | NA |
| 1990 | G2700010790502 | 2010 | Minnesota | 27 | Aitkin County | 1 | 790502 | 2045.99 | 0 |
| 1991 | G2700010790502 | ||||||||
| … | G2700010790502 | ||||||||
| 2000 | G2700010790502 | 2010 | Minnesota | 27 | Aitkin County | 1 | 790502 | 2862.97 | 0 |
| 2001 | G2700010790502 | ||||||||
| … | G2700010790502 | ||||||||
| 2010 | G2700010790502 | 2010 | Minnesota | 27 | Aitkin County | 1 | 790502 | 3262 | 0 |
| 2011 | G2700010790502 | ||||||||
| … | G2700010790502 | ||||||||
| 2020 | G2700010790502 | 2010 | Minnesota | 27 | Aitkin County | 1 | 790502 | 3120 | NA |
How can I fix this using multidplyr or any kind of parallel processing? (My original dataset is huge). Any help using dplyr is really appreciated it!
Update: