How can I build a custom function to get the frequencies of one factor depending on another factor?


I have a dataset with tons of factors and I want to get the relative frequencies of each factor based on another factor. For example, let's use mtcars:

mtcars$am <- as.factor(mtcars$am)
mtcars$cyl <- as.factor(mtcars$cyl)

I want to get the frequencies where am == 1, based on the values of cyl. In this case, I should get three relative frequencies because cyl has three levels (4, 6, and 8). I have this code working:

mtcars %>%
  select(am, cyl) %>%
  table(.) %>% 
  prop.table(., 1) %>% 
  round(., digits = 2) %>% 
  data.frame() %>% 
  filter(am == 1) %>% 
  t() %>% 
  data.frame() %>% 
  slice(3)

# # A tibble: 1 x 3
#       X1     X2     X3
#   <fctr> <fctr> <fctr>
# 1   0.62   0.23   0.15

If you run it, you'll get the three frequencies above. Of course, I built this code so I know that X1 corresponds to the frequency where cyl == 4, X2 is cyl == 6, and X3 is cyl == 8.

Now, I want to do this with tons of factors (other binary factors like am). So, I want to build a custom function, bind all the frequencies later as rows, and create a nice table with these frequencies. Right now, I have this:

pull_freq <- function(mydata, var1, var2) {
  require(tidyverse)
  var1 <- enquo(var1)
  var2 <- enquo(var2)
  mydata %>%
    select(!!var1, !!var2) %>%
    table(.) %>% 
    prop.table(., 1) %>% 
    round(., digits = 2) %>% 
    data.frame() %>% 
    filter(!!var1 == 1) %>% 
    t() %>% 
    data.frame() %>% 
    slice(3)
}

pull_freq(mtcars, am, cyl)

# A tibble: 1 x 0

But as you can see, when I run this function I get an empty result (a tibble with zero columns). Any idea why? How can I get this function to work? Thank you!

There are 3 answers

Answer by CPak (best answer)

custom function

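myfun <- function(df, col1, col2, col3) {
  # col1, col2: unquoted column names; col3: the name of col2 as a string
  require(dplyr)
  require(tidyr)
  col1 <- enquo(col1)
  col2 <- enquo(col2)
  df %>%
    count(!!col1, !!col2) %>%
    group_by(!!col1) %>%
    # share of each col2 level within each col1 level, rounded to 2 digits
    mutate(n = round(n / sum(n), digits = 2)) %>%
    ungroup() %>%
    # parentheses make sure the unquoting happens before the comparison
    filter((!!col1) == 1) %>%
    # pivot_wider() replaces the deprecated spread_() (needs tidyr >= 1.0.0)
    pivot_wider(names_from = all_of(col3), values_from = n)
}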

Output

myfun(mtcars, am, cyl, "cyl")

# am    `4`   `6`   `8`
#  1  0.62  0.23  0.15
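
The fourth argument of myfun() above repeats the second one as a string. As a minimal sketch, assuming rlang >= 0.4.0 and tidyr >= 1.0.0 are installed, the embrace operator `{{ }}` can forward the unquoted column names directly, so the extra string is not needed. The name freq_when_one is hypothetical and not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: relative frequency of each level of `by`
# among the rows where `flag` equals 1.
freq_when_one <- function(data, flag, by) {
  data %>%
    count({{ flag }}, {{ by }}) %>%
    group_by({{ flag }}) %>%
    mutate(n = round(n / sum(n), 2)) %>%  # proportions within each flag level
    ungroup() %>%
    filter({{ flag }} == 1) %>%
    pivot_wider(names_from = {{ by }}, values_from = n)
}

res <- freq_when_one(mtcars, am, cyl)
res
```

This should give one row with the am column followed by one column per cyl level, the same shape as the output above.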

Answer by Rui Barradas

Maybe I'm completely off, but is this it?

data(mtcars)

agg <- aggregate(mtcars$cyl, list(mtcars$cyl, mtcars$am), FUN = length)
names(agg) <- c("cyl", "am", "count")

agg$freq <- ave(agg$count, agg$am, FUN = function(x) x/sum(x))
agg <- t(agg[-3])
agg

Note that I have not coerced cyl and am to factors with as.factor. Transposing a data frame returns a matrix, and a matrix can hold elements of only one type, so with factor columns every value would be coerced to character and the freq values would no longer be numeric.
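
For the original goal of stacking one row of frequencies per binary variable, a base R sketch along the same lines could use table() and prop.table(). The helper name freq_when_one and the choice of am and vs as the binary columns are assumptions made for illustration only.

```r
# Hypothetical helper: row-wise proportions of `by` for the rows where
# `flag` equals 1. Both `flag` and `by` are column names given as strings.
freq_when_one <- function(data, flag, by) {
  tab <- table(data[[flag]], data[[by]])
  # margin 1 makes each row sum to 1; keep only the row labelled "1"
  round(prop.table(tab, 1)["1", ], 2)
}

# One row per binary variable, one column per level of cyl
binary_vars <- c("am", "vs")
freq_table <- t(sapply(binary_vars, freq_when_one, data = mtcars, by = "cyl"))
freq_table
```

Because table() tabulates all levels of cyl for both rows, a level that never occurs together with flag == 1 shows up as 0 rather than being dropped, and the result stays a numeric matrix.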

Answer by TheRimalaya

How about this:

library(tidyverse)
getFreq <- function(data, group_var, value_var) {
    data %>%
        group_by(across(all_of(group_var))) %>%  # group_by_() is deprecated; needs dplyr >= 1.0.0
        do({
            table(.[[value_var]]) %>%
                prop.table() %>%
                as_tibble()
        }) %>%
        spread(Var1, n)
}

getFreq(mtcars, "am", "cyl") %>% print()

You can do the filtering afterwards or include it inside the function.