Transposing data and sequence mining the most common patterns in rows

I have a data frame that looks like this:

              SFOpID Number MAGroupID
1 0032A00002cgs3XQAQ      1        99
2 0032A00002cgs3XQAQ      1        79
3 003F000001vyUGKIA2      2         8
4 0032A00002btWE6QAM      3        97
5 0032A00002btWE6QAM      3        86
6 0032A00002btWE6QAM      3        35

I need to transpose it so that it looks like this:

              SFOpID Number MAGroupID
1 0032A00002cgs3XQAQ      1        99  79
3 003F000001vyUGKIA2      2         8

Then I need to generate counts for the five most common sequences. For example, 12 people (SFOpID) have the 97 86 35 sequence, but only 4 people have the 99 79 sequence. I think this may be possible with the arulesSequences package (an extension of arules), doing something like the following:

library(arulesSequences)  # provides read_baskets() and the zaki.txt example file

x <- read_baskets(con  = system.file("misc", "zaki.txt", package = "arulesSequences"),
                  info = c("sequenceID", "eventID", "SIZE"))
as(x, "data.frame")

The goal is to have output that looks like this:

       items sequenceID eventID SIZE
 1      {C,D}          1      10    2
 2    {A,B,C}          1      15    3
 3    {A,B,F}          1      20    3
 4  {A,C,D,F}          1      25    4
 5    {A,B,F}          2      15    3

The only difference is that each entry in items would be a sequence such as {99, 79} or {97, 86, 35}.

There is 1 answer

Nar

You can use group_by followed by nest to collect the values for each person into one list. The list can then be converted to text. Here is an example:

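 library(dplyr)
 library(tidyr)  # nest() comes from tidyr, not dplyr
 code <- read.csv("code.csv", stringsAsFactors = FALSE)

 # Assumes columns 2:4 are SFOpID, Number and MAGroupID.
 # Group by person so that nest() collects that person's MAGroupID values.
 output <- code[, 2:4] %>%
   group_by(SFOpID, Number) %>%
   nest()
 # Collapse each nested table into text such as "99 79"
 output$data <- vapply(output$data, function(d) paste(d$MAGroupID, collapse = " "), character(1))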
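
The snippet above stops at one row per person, so the counting step from the question is still open. Here is a minimal base R sketch of one way to do it, assuming the order of rows within each SFOpID is the order of the sequence. The data frame `df` is rebuilt from the sample in the question, and the names `seqs`, `top5`, `sequence` and `n_people` are made up for illustration. This is plain frequency counting of whole sequences, not the pattern mining that arulesSequences performs.

```r
# Sample data copied from the question
df <- data.frame(
  SFOpID = c("0032A00002cgs3XQAQ", "0032A00002cgs3XQAQ", "003F000001vyUGKIA2",
             "0032A00002btWE6QAM", "0032A00002btWE6QAM", "0032A00002btWE6QAM"),
  Number = c(1, 1, 2, 3, 3, 3),
  MAGroupID = c(99, 79, 8, 97, 86, 35),
  stringsAsFactors = FALSE
)

# One row per person: collapse that person's MAGroupID values,
# in their original row order, into a single string
seqs <- aggregate(MAGroupID ~ SFOpID + Number, data = df,
                  FUN = function(v) paste(v, collapse = " "))
names(seqs)[names(seqs) == "MAGroupID"] <- "sequence"

# Count how many people share each sequence, most common first,
# and keep at most five of them
counts <- sort(table(seqs$sequence), decreasing = TRUE)
top5 <- head(as.data.frame(counts, stringsAsFactors = FALSE), 5)
names(top5) <- c("sequence", "n_people")
print(top5)
```

With the six sample rows every sequence occurs only once, so the ranking only becomes meaningful on the full data set.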