R Data transformation: dataframe to matrix filled with 0 and 1

133 views Asked by At

I am looking to transform a long dataformat to a wide one, but the values of the "signatories" column are not exactly the same as the new column names should be.

I have a dataframe with bill numbers and the names of their signatories (senators). If a senator has signed a bill, their is a row in the dataframe with the bill number and the name of the senator in questions. All the senators whose names are not listed next to the bill number have not signed the bill.

I want to construct a matrix with rows as bill numbers and columns as senator names, filled with 1s and 0s only. 1 indicates that a given senator has signed a bill, 0 indicates they have not.

ex <- data.frame(billno = c(715851, 715851, 715851,715852, 715852, 715852, 715852, 715852, 715852), signatories = c("Ben", "Lisa", "Roger", "Louise", "Macy", "John", "Jake", "James", "Ben"))

Senatornames <- c("Ben", "Lisa", "Roger", "Louise", "Macy", "John", "Jake", "James", "Julian", "Ayn")

#current output is this 
#  billno signatories
#1 715851         Ben
#2 715851        Lisa
#3 715851       Roger
#4 715852      Louise
#5 715852        Macy
#6 715852        John
#7 715852        Jake
#8 715852       James
#9 715852         Ben

#I want something like this. How do I arrive at this result? 
#  billno Ben   Lisa    Roger   Louise   Macy   John   Jake   James    Ayn
#1 715851  1      1       1       0        0      0       0      0      0
#2 715852  1      0       0       1        1      1       1      1      0

Thanks so much!!

2

There are 2 answers

0
Duck On BEST ANSWER

Next code can bring you close to what you want. I have used the vector you provided in order to create a full_join() so that all values can appear. Here the code:

library(tidyverse)
#Dataframe
df <- data.frame(signatories=Senatornames,stringsAsFactors = F)
#Code
ex1 <- ex %>% mutate(Value=1) %>%
  full_join(df) %>%
  fill(billno) %>%
  pivot_wider(names_from = signatories,values_from=Value) %>%
  replace(is.na(.),0)

Output:

# A tibble: 2 x 11
  billno   Ben  Lisa Roger Louise  Macy  John  Jake James Julian   Ayn
   <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>
1 715851     1     1     1      0     0     0     0     0      0     0
2 715852     1     0     0      1     1     1     1     1      0     0
0
Allan Cameron On

You could just do:

table(ex)
#>        signatories
#> billno   Ben Jake James John Lisa Louise Macy Roger
#>   715851   1    0     0    0    1      0    0     1
#>   715852   1    1     1    1    0      1    1     0

Or if you want to do it in tidyverse style and be left with a data frame, the whole thing can be done with:

tidyr::pivot_wider(as.data.frame(table(ex)), 
                   names_from  = "signatories", 
                   values_from = "Freq")
#> # A tibble: 2 x 9
#>   billno   Ben  Jake James  John  Lisa Louise  Macy Roger
#>   <fct>  <int> <int> <int> <int> <int>  <int> <int> <int>
#> 1 715851     1     0     0     0     1      0     0     1
#> 2 715852     1     1     1     1     0      1     1     0