Creating dyad-pair averages in R

Asked by SK5123 on 09 November 2023 at 10:06

I want to compute the pairwise average price of commodities produced by countries. My data looks like this:

df <- data.frame(country    = c("US; UK; FI", "CN; IT; US; GR", "UK; US"),
                 product_id = c(1, 2, 3),
                 price      = c(300, 500, 200))

I want to transform the data to get the average price for each dyad (pair of countries). Something like this:

Ctr_1 Ctr_2 Avg_Price
US    UK    250
US    FI    300
US    CN    500
US    IT    500
UK    FI    300
UK    US    250
CN    IT    500
CN    US    500
CN    GR    500
IT    CN    500
IT    US    500
IT    GR    500
GR    CN    500
GR    IT    500
GR    US    500

I tried changing the data to long form.

library(data.table)

setDT(df)

df1 <- df[, .(country = unlist(strsplit(country, "; "))), by = .(product_id)]

But I didn't know how to proceed from here. Any help would be really appreciated. In fact, there is also a year variable, and the idea is to aggregate pairwise per year to create a panel dataset.

There are 2 answers

jblood94 On 09 November 2023 at 12:53

From long format, a non-equi join on country will give the pairs for each product_id. However, non-equi joins don't work with character columns, so we first get a country index. After the join, get the average price with a grouping operation:

df1 <- df[
  ,.(country = unlist(strsplit(country, "; ")), price = price),
  by = .(product_id)
]

df1[,ctr_id := match(country, unique(country))][
  df1,
  on = .(product_id = product_id, ctr_id > ctr_id),
  .(Ctr_1 = i.country, Ctr_2 = x.country, price = price),
  nomatch = 0
][,.(Avg_Price = mean(price)), .(Ctr_1, Ctr_2)]
#>    Ctr_1 Ctr_2 Avg_Price
#> 1:    US    UK       250
#> 2:    US    FI       300
#> 3:    UK    FI       300
#> 4:    CN    IT       500
#> 5:    CN    GR       500
#> 6:    IT    GR       500
#> 7:    US    CN       500
#> 8:    US    IT       500
#> 9:    US    GR       500
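To make the role of the country index more concrete, here is a minimal base R sketch of what `match(country, unique(country))` produces for the question's data, and why a strict `>` comparison on that index keeps each unordered pair only once. The vector `country` below is typed in by hand for illustration; it is not taken from the answer's `df1`.

```r
# Countries in long format, in the order they appear in the question's data
country <- c("US", "UK", "FI", "CN", "IT", "US", "GR", "UK", "US")

# Each country gets the position of its first appearance as an integer id
ctr_id <- match(country, unique(country))
ctr_id
#> [1] 1 2 3 4 5 1 6 2 1

# Within one product, comparing ids with ">" keeps each unordered pair once
ids   <- ctr_id[1:3]                                  # product 1: US, UK, FI
pairs <- which(outer(ids, ids, ">"), arr.ind = TRUE)  # index pairs with id_row > id_col
nrow(pairs)
#> [1] 3
```

Three countries give three pairs, which matches the three rows for product 1 (US-UK, US-FI, UK-FI) in the output above.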

Alternatively, we can get the combinations while doing the strsplit:

library(RcppAlgos)

df[
  ,{
    m <- comboGeneral(sort(strsplit(country, "; ")[[1]]), 2)
    .(Ctr_1 = m[,1], Ctr_2 = m[,2], price = price)
  }, product_id
][,.(Avg_Price = mean(price)), .(Ctr_1, Ctr_2)]
#>    Ctr_1 Ctr_2 Avg_Price
#> 1:    FI    UK       300
#> 2:    FI    US       300
#> 3:    UK    US       250
#> 4:    CN    GR       500
#> 5:    CN    IT       500
#> 6:    CN    US       500
#> 7:    GR    IT       500
#> 8:    GR    US       500
#> 9:    IT    US       500
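The question also mentions a year variable. The following is only a sketch of how the combination approach might extend to a panel: it assumes a hypothetical `year` column and made-up rows (`df_year` is not the asker's data), uses base R's `combn()` in place of `comboGeneral()`, and assumes every product lists at least two countries, since `combn()` errors otherwise.

```r
library(data.table)

# Hypothetical panel data: same layout as the question plus a `year` column
df_year <- data.table(
  country    = c("US; UK; FI", "CN; IT; US; GR", "UK; US", "UK; US"),
  product_id = c(1, 2, 3, 4),
  year       = c(2020, 2020, 2020, 2021),
  price      = c(300, 500, 200, 400)
)

# One row per product, year and unordered country pair; sort() makes
# "UK; US" and "US; UK" map to the same pair
pairs <- df_year[
  , {
    m <- t(combn(sort(strsplit(country, "; ")[[1]]), 2))
    .(Ctr_1 = m[, 1], Ctr_2 = m[, 2], price = price)
  },
  by = .(product_id, year)
]

# Average within each pair and year to get the panel
panel <- pairs[, .(Avg_Price = mean(price)), by = .(year, Ctr_1, Ctr_2)]
panel[Ctr_1 == "UK" & Ctr_2 == "US"]
```

With these made-up rows, the UK-US pair averages 250 in 2020 (products 1 and 3) and 400 in 2021 (product 4). Adding `year` to both grouping steps is the only change from the pooled version.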

mt1022 On 09 November 2023 at 13:07 (accepted answer)

df1 <- df[, .(country = strsplit(country, '; ')[[1]]), by = .(product_id, price)]

# join product_id and price of c1 (CJ for cross-join)
df2 <- df1[CJ(country, c2 = country),
           on = .(country), allow.cartesian = TRUE][country < c2]  # keep each unordered pair once

# join product_id and price of c2, then get average
res <- df1[df2, on = .(country = c2, product_id), nomatch = 0][
  , .(avg_price = mean(price)), by = .(c1 = country, c2 = i.country)]

res
#    c1 c2 avg_price
# 1: GR CN       500
# 2: IT CN       500
# 3: US CN       500
# 4: UK FI       300
# 5: US FI       300
# 6: IT GR       500
# 7: US GR       500
# 8: US IT       500
# 9: US UK       250
