Order Data Preprocessing before Sequential Pattern Mining


I have two questions about sequential pattern mining, based on the code below.

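library(dplyr)            # select, group_by, summarize, as_tibble, %>%
library(arulesSequences)  # read_baskets, cspade, inspect
library(stringr)          # str_count

df <- data.frame(member_no = c('1','1','1','2','3','4','5','4','3','2','3','1','2','2','4'),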
                 year_month = c('2020_Apr','2021_Mar','2021_Mar','2022_Jan','2023_May','2022_Dec','2019_Nov','2022_Feb','2021_Aug','2021_Aug','2020_Jan','2021_Mar','2021_Dec','2021_Jul','2023_Apr'),
                 product = c('A','B','B','B','C','C','A','B','B','B','A','B','B','B','C'))


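# collapse all products bought by a member in the same month into one itemset string
dataset <- df |> 
  select(member_no, year_month, product) |> 
  group_by(member_no, year_month) |> 
  summarize(itemset = paste(as.character(product), collapse = ','), .groups = 'drop')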


write.table(dataset, 'data.txt', sep = ',', quote = FALSE, row.names = FALSE, col.names = FALSE)

transaction <- read_baskets('data.txt', sep = ',', info = c('sequenceID', 'eventID'))
inspect(transaction)

freq.s <- cspade(transaction, parameter = list(support = 0.001), 
                 control = list(verbose = TRUE))
inspect(head(freq.s, 1000, by = 'support'))

df1 <- freq.s

# output all results
df1 <- as(df1, "data.frame") %>% as_tibble()
df1$pattern <- (str_count(df1$sequence, ",") + 1)
df1 <- df1[order(-df1$support),] # descending

Q1: View(dataset) shows that member_no 1 bought product B three times in a single month (2021_Mar). Should I count it only once, so the sequence is <{A}, {B}>, or keep the duplicates, so the sequence is <{A}, {B,B,B}>?
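
To make the two options in Q1 concrete, here is a minimal base-R sketch using only member 1's rows from the data above. It assumes an itemset is treated as a set, so a repeated item adds no information; the names `df_m1`, `with_dups`, and `no_dups` are made up for this illustration and are not part of the original code.

```r
# Member 1's rows, copied from the df in the question
df_m1 <- data.frame(
  member_no  = c('1', '1', '1', '1'),
  year_month = c('2020_Apr', '2021_Mar', '2021_Mar', '2021_Mar'),
  product    = c('A', 'B', 'B', 'B')
)

# Option 1: collapse as-is, so the 2021_Mar event keeps the duplicates ("B,B,B")
with_dups <- aggregate(product ~ member_no + year_month, data = df_m1,
                       FUN = paste, collapse = ',')

# Option 2: drop duplicate (member, month, product) rows first, so the event is "B"
no_dups <- aggregate(product ~ member_no + year_month, data = unique(df_m1),
                     FUN = paste, collapse = ',')

print(with_dups$product)
print(no_dups$product)
```

With dplyr, the equivalent of option 2 would be adding `distinct()` before `group_by()` in the pipeline above.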

Q2: Why would sum(df1$support) be greater than 1?
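
For Q2, a small base-R sketch may help frame the question. It assumes support means the fraction of sequences (here, members) that contain a pattern, and it only covers single-item patterns, computed by hand rather than with cspade. Because one member can contain several patterns, the supports are not shares of a whole and need not sum to 1.

```r
# member_no and product columns, copied from the df in the question
member_no <- c('1','1','1','2','3','4','5','4','3','2','3','1','2','2','4')
product   <- c('A','B','B','B','C','C','A','B','B','B','A','B','B','B','C')

n_members <- length(unique(member_no))

# For each product, the share of distinct members who bought it at least once
support <- sapply(split(member_no, product),
                  function(m) length(unique(m)) / n_members)

print(support)       # A = 0.6, B = 0.8, C = 0.4
print(sum(support))  # 1.8
```

Members 1, 3, and 4 each appear in more than one of these counts, which is why the total exceeds 1 even before longer patterns are added.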

Thank you guys!
