How can I build a custom function to get the frequencies of one factor depending on another factor?


I have a dataset with tons of factors and I want to get the relative frequencies of each factor based on another factor. For example, let's use mtcars:

mtcars$am <- as.factor(mtcars$am)
mtcars$cyl <- as.factor(mtcars$cyl)

I want to get the frequencies where am == 1, based on the values of cyl. In this case, I should get three relative frequencies because cyl has three levels (4, 6, and 8). I have this code working:

library(dplyr)

mtcars %>%
  select(am, cyl) %>%
  table(.) %>% 
  prop.table(., 1) %>% 
  round(., digits = 2) %>% 
  data.frame() %>% 
  filter(am == 1) %>% 
  t() %>% 
  data.frame() %>% 
  slice(3)  # keep only the Freq row, which gives the 1 x 3 result below

# # A tibble: 1 x 3
#       X1     X2     X3
#   <fctr> <fctr> <fctr>
# 1   0.62   0.23   0.15

If you run it, you'll get the three frequencies above. Of course, I built this code so I know that X1 corresponds to the frequency where cyl == 4, X2 is cyl == 6, and X3 is cyl == 8.

Now, I want to do this with tons of factors (other binary factors like am). So, I want to build a custom function, bind all the frequencies later as rows, and create a nice table with these frequencies. Right now, I have this:

pull_freq <- function(mydata, var1, var2){      
  var1 <- enquo(var1)
  var2 <- enquo(var2)
  mydata %>%
    select(!!var1, !!var2) %>%
    table(.) %>% 
    prop.table(., 1) %>% 
    round(., digits = 2) %>% 
    data.frame() %>% 
    filter(!!var1 == 1) %>% 
    t() %>% 
    data.frame() %>% 
    slice(3)
}

pull_freq(mtcars, am, cyl)

# A tibble: 1 x 0

But as you can see, when I run this function, the result is empty. Any idea why? How can I get this function to work? Thank you!


There are 3 answers


custom function

library(dplyr)
library(tidyr)

myfun <- function(df, col1, col2, col3) {
  col1 <- enquo(col1)
  col2 <- enquo(col2)
  df %>% 
    count(!!col1, !!col2) %>% 
    group_by(!!col1) %>%
    # share of each col2 level within each level of col1
    mutate(n = round(n / sum(n), digits = 2)) %>%
    ungroup() %>%
    filter((!!col1) == 1) %>%
    # col3 is the name of col2, passed as a string
    pivot_wider(names_from = all_of(col3), values_from = n)
}

myfun(mtcars, am, cyl, "cyl")

# am    `4`   `6`   `8`
#  1  0.62  0.23  0.15
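
The filter step in this answer is written differently from the question's `filter(!!var1 == 1)`, and operator precedence is a likely reason the original returned nothing. In R, `!` binds more loosely than `==`, so `!!var1 == 1` is parsed as `!!(var1 == 1)`: the comparison is applied to the quosure itself, not to the unquoted column. The sketch below only inspects the parse tree in base R; `x` is a placeholder standing in for `var1`.

```r
# Inspect how R parses the filter condition; x stands in for var1
expr <- quote(!!x == 1)

outer_op <- as.character(expr[[1]])            # outermost call
inner_op <- as.character(expr[[2]][[2]][[1]])  # call underneath both negations

print(outer_op)  # "!"
print(inner_op)  # "=="

# With explicit parentheses, == becomes the outermost call
fixed <- quote((!!x) == 1)
fixed_op <- as.character(fixed[[1]])
print(fixed_op)  # "=="
```

Under that reading, writing the condition as `filter((!!var1) == 1)` in the question's function should give the intended comparison.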
Rui Barradas On

Maybe I'm completely off, but is this it?


agg <- aggregate(mtcars$cyl, list(mtcars$cyl, mtcars$am), FUN = length)
names(agg) <- c("cyl", "am", "count")

agg$freq <- ave(agg$count, agg$am, FUN = function(x) x/sum(x))
agg <- t(agg[-3])

Note that I have not coerced cyl and am to factors with as.factor. Transposing a data frame produces a matrix, and a matrix can only hold elements of a single type. With factor columns present, every value would be converted to character, so the freq values would no longer be numeric.
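
A small illustration of that coercion, using two made-up two-row data frames (the names `with_factor` and `all_numeric` and the values in them are only for demonstration):

```r
# Same values, but am is a factor in the first data frame only
with_factor <- data.frame(am = factor(c(0, 1)), freq = c(0.38, 0.62))
all_numeric <- data.frame(am = c(0, 1), freq = c(0.38, 0.62))

# t() converts a data frame to a matrix before transposing
m_factor  <- t(with_factor)
m_numeric <- t(all_numeric)

print(typeof(m_factor))   # "character"
print(typeof(m_numeric))  # "double"
```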

TheRimalaya On

How about this:

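getFreq <- function(data, group_var, value_var) {
    data %>%
        group_by(across(all_of(group_var))) %>%
        do({
            # relative frequencies of value_var within the current group
            table(.[[value_var]]) %>%
                prop.table() %>%
                as.data.frame(responseName = "n")
        }) %>%
        spread(Var1, n)
}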

getFreq(mtcars, "am", "cyl") %>% print()

You can do any filtering afterwards, or include it inside the function.
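
For the question's end goal (one row of frequencies per binary factor), here is a minimal base R sketch with the filtering done inside the function. It is not the approach of any answer above; `freq_for_level`, `binary_vars`, and the `level` argument are hypothetical names, and it assumes every binary factor is coded 0/1, as `am` and `vs` are in `mtcars`.

```r
# Relative frequencies of by_var among rows where binary_var equals level
freq_for_level <- function(data, binary_var, by_var, level = "1") {
  tab <- table(data[[binary_var]], data[[by_var]])
  # margin = 1: proportions are computed within each level of binary_var
  round(prop.table(tab, margin = 1)[level, ], digits = 2)
}

binary_vars <- c("am", "vs")

# One row per binary factor, one column per level of cyl
freq_table <- t(sapply(binary_vars, freq_for_level, data = mtcars, by_var = "cyl"))
print(freq_table)
```

Because the result keeps the levels of `cyl` as column names, there is no need to remember which of X1, X2, X3 corresponds to which level.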