Getting the median by date using dplyr's summarise() in R


I have a data frame of integer-count observations listed by date and time interval. I want to find the median of these observations by date using the dplyr package. I've already formatted the date column correctly, and used group_by like so:

data.bydate <- group_by(data.raw, date)

When I use summarise() to find the median of each date group, all I get is zeroes. There are NAs in the data, so I've been stripping them with na.rm = TRUE.

data.median <- summarise(data.bydate, median = median(count, na.rm = TRUE))

Is there another way I should be doing this?


There are 3 answers

Max Candocia On

You can do it in a single pipeline, like this:

data.raw %>% group_by(date) %>% summarise(median = median(count, na.rm = TRUE))

Andres Felipe Velez Gruezo On

Here is an example of how I did this using dplyr, starting from the already grouped data:

data.median <- data.bydate %>% summarise(median = median(count, na.rm = TRUE))

soni On

It's possible that each group simply contains a lot of zeros. If more than half of the non-missing values in a group are zero, the median of that group is zero. Check the number of unique values in each group to see whether zeros dominate. The code below shows the number of distinct values and the total number of rows for the count variable in each group. Note that n() takes no arguments.

summarise(data.bydate, unique_code = n_distinct(count), total_count = n())
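
To make the check above concrete, here is a minimal sketch on made-up data. The toy data frame and the column names median_count, n_obs and prop_zero are my own assumptions for illustration and do not come from the original question. It reports the median next to the share of zeros in each date group, so you can see whether a zero median is just what the data say.

```r
library(dplyr)

# Hypothetical toy data: the first date is dominated by zeros, the second is not
data.raw <- data.frame(
  date  = as.Date(c(rep("2015-01-01", 5), rep("2015-01-02", 5))),
  count = c(0, 0, 0, 4, NA, 1, 2, 3, 0, NA)
)

data.check <- data.raw %>%
  group_by(date) %>%
  summarise(
    median_count = median(count, na.rm = TRUE),   # median ignoring NAs
    n_obs        = sum(!is.na(count)),            # non-missing observations
    prop_zero    = mean(count == 0, na.rm = TRUE) # share of zeros among non-missing values
  )

print(data.check)
```

If prop_zero is above 0.5 for a date, a median of zero for that date is expected and not a sign that summarise() is misbehaving.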