How to create a function to get summary statistics as columns?


I have three workflows to get the mean, standard deviation, and variance of each column. Would it be possible to simplify this into one function that returns a single table with all the summaries as columns?

Mean

library(dplyr)
library(tibble)  # for rownames_to_column()

iris %>% 
  select(-Species) %>% 
  summarise_all(., mean, na.rm = TRUE) %>% 
  t() %>% 
  as.data.frame() %>% 
  rownames_to_column("Name") %>% 
  rename(Mean = V1)

Standard Deviation

iris %>% 
  select(-Species) %>% 
  summarise_all(., sd, na.rm = TRUE) %>% 
  t() %>% 
  as.data.frame() %>% 
  rownames_to_column("Name") %>% 
  rename(SD = V1)

Variance

iris %>% 
  select(-Species) %>% 
  summarise_all(., var, na.rm = TRUE) %>% 
  t() %>% 
  as.data.frame() %>% 
  rownames_to_column("Name") %>% 
  rename(Variance = V1)

There are 2 answers

akrun (best answer)

We could reshape to 'long' format with pivot_longer(), then group by the column name and create the three summary columns in a single summarise() call:

library(dplyr)
library(tidyr)
iris %>% 
   select(where(is.numeric)) %>% 
   pivot_longer(cols = everything(), names_to = "Name") %>% 
   group_by(Name) %>% 
   summarise(Mean = mean(value, na.rm = TRUE),
            SD = sd(value, na.rm = TRUE), 
            Variance = var(value, na.rm = TRUE))

Output:

# A tibble: 4 × 4
  Name          Mean    SD Variance
  <chr>        <dbl> <dbl>    <dbl>
1 Petal.Length  3.76 1.77     3.12 
2 Petal.Width   1.20 0.762    0.581
3 Sepal.Length  5.84 0.828    0.686
4 Sepal.Width   3.06 0.436    0.190
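Since the question asks for a single reusable function, the pipeline above can be wrapped as follows. This is a minimal sketch: the name `summary_table` is hypothetical, and it assumes dplyr (>= 1.0, for `where()`) and tidyr are installed.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: one row per numeric column of `data`,
# with its mean, standard deviation and variance as columns.
summary_table <- function(data) {
  data %>%
    select(where(is.numeric)) %>%                            # drop non-numeric columns
    pivot_longer(cols = everything(), names_to = "Name") %>% # long format: Name, value
    group_by(Name) %>%
    summarise(Mean = mean(value, na.rm = TRUE),
              SD = sd(value, na.rm = TRUE),
              Variance = var(value, na.rm = TRUE))
}

summary_table(iris)
summary_table(mtcars)
```

Because `where(is.numeric)` silently drops columns such as `Species`, the same call works on other data frames without a manual `select(-...)`.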
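Onyambu

Alternatively, apply all three functions at once with summarise_all(), then split the resulting column names (e.g. Sepal.Length_mean) back into a Name column and one column per statistic:

library(dplyr)
library(tidyr)

iris %>% 
  select(-Species) %>% 
  summarise_all(list(mean = mean, sd = sd, var = var), na.rm = TRUE) %>%
  pivot_longer(everything(), names_sep = '_', names_to = c('Name', '.value'))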

# A tibble: 4 x 4
  Name          mean    sd   var
  <chr>        <dbl> <dbl> <dbl>
1 Sepal.Length  5.84 0.828 0.686
2 Sepal.Width   3.06 0.436 0.190
3 Petal.Length  3.76 1.77  3.12 
4 Petal.Width   1.20 0.762 0.581
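The same idea can be turned into a function that accepts any named list of statistics. The sketch below swaps the superseded `summarise_all()` for `across()`; the name `summarise_columns` and its `funs` argument are hypothetical, and it assumes dplyr >= 1.0, tidyr >= 1.0, and that the statistic names contain no underscore.

```r
library(dplyr)
library(tidyr)

# Hypothetical generalisation: `funs` is a named list of summary functions
# that accept `na.rm`; each name becomes one column of the result.
summarise_columns <- function(data, funs = list(mean = mean, sd = sd, var = var)) {
  # Wrap each function so NAs are dropped, instead of passing `na.rm`
  # through the `...` of across(), which newer dplyr versions deprecate.
  funs <- lapply(funs, function(f) {
    force(f)
    function(x) f(x, na.rm = TRUE)
  })
  data %>%
    summarise(across(where(is.numeric), funs, .names = "{.col}_{.fn}")) %>%
    # Split e.g. "Sepal.Length_mean" at the last underscore into Name + statistic
    pivot_longer(everything(),
                 names_to = c("Name", ".value"),
                 names_pattern = "^(.*)_([^_]+)$")
}

summarise_columns(iris)
summarise_columns(iris, list(min = min, median = median, max = max))
```

Splitting at the last underscore (rather than with `names_sep = '_'`) keeps column names that themselves contain underscores intact.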