Subgroup summation with missing value in R


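I am trying to calculate a simple Herfindahl index (HI) per market, i.e. marketShare(1)^2 + marketShare(2)^2 + ..., where rows with a missing name are ignored in the sum of squares but their volume is still used to calculate the market shares. I tried several ways, but the rows with a missing name are still included in my final HI calculation.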

My data

df = data.frame(ID = c(1L, 1L, 1L, 2L, 2L, 2L, 4L, 4L, 4L), 
            ID_name = c("AA", "AA", "", "BB", "BB", "", "", "DD", "DD"), 
            Volume = c(10L, 20L, 30L, 50L, 50L, 40L, 20L, 30L, 10L))

I tried

df %>%
  mutate(Hasparent_org_id = ifelse(is.na(ID_name), 0, 1)) %>%
  group_by(ID) %>%
  summarise(sum_TRx = sum(Volume), HHI = sum(((Volume/sum(Volume))^2) * Hasparent_org_id))
And I get this:

         ID MKT_Vol       HHI
      (int)   (int)     (dbl)
    1     1      60 0.3888889
    2     2     140 0.3367347
    3     4      60 0.3888889
    

But I want to get this:

bf = data.frame(ID = c(1L, 2L,  4L), 
            Volume = c(60L, 140L, 60L),
            HHI = c(0.14,0.25, 0.26 ))

Basically, the volume of the entries with a missing name should be included when calculating the market shares, but those entries should not be included in the HI sum itself.

There is 1 answer

Mark On BEST ANSWER

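The issue is in your is.na check. There are no NA values in ID_name; the missing entries are empty strings "" instead. is.na("") is FALSE, so Hasparent_org_id ends up as 1 for every row and nothing is excluded from the HHI sum.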

library(dplyr)

df %>%
  # weight 0 for rows with an empty ID_name, 1 otherwise
  mutate(Hasparent_org_id = ifelse(ID_name == "", 0, 1)) %>%
  group_by(ID) %>%
  summarise(sum_TRx = sum(Volume),
            HHI = sum(((Volume / sum(Volume))^2) * Hasparent_org_id))

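This change in the check seems to address your issue: rows with an empty ID_name still count toward each group's total volume, but they contribute nothing to the HHI.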
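
If the real data might contain genuine NA values as well as empty strings, a slightly more defensive variant can treat both as "no name". The following is only a sketch under that assumption: the column name has_name and the result name hhi are made up for illustration, and df is rebuilt from the question so the block runs on its own.

```r
library(dplyr)

# Sample data copied from the question
df <- data.frame(
  ID      = c(1L, 1L, 1L, 2L, 2L, 2L, 4L, 4L, 4L),
  ID_name = c("AA", "AA", "", "BB", "BB", "", "", "DD", "DD"),
  Volume  = c(10L, 20L, 30L, 50L, 50L, 40L, 20L, 30L, 10L)
)

# An empty string is not NA, which is why the original check excluded nothing
is.na("")

hhi <- df %>%
  # has_name (hypothetical helper column): FALSE for NA or "", TRUE otherwise
  mutate(has_name = !is.na(ID_name) & ID_name != "") %>%
  group_by(ID) %>%
  summarise(
    sum_TRx = sum(Volume),                               # every row counts toward the total
    HHI     = sum((Volume / sum(Volume))^2 * has_name)   # only named rows enter the HHI
  )

hhi
```

Worked by hand, the named rows give 5/36 (about 0.139) for ID 1, 5000/19600 (about 0.255) for ID 2 and 10/36 (about 0.278) for ID 4, so the result can be checked against those fractions.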