I am trying to transform a data frame into a list of lists of data frames


My data looks as follows (text format):

Time ; Source ; Action
YYYY-MM-DD ; Card Biology ; Reading succesful 
YYYY-MM-DD ; Student 1 ; initialised 
YYYY-MM-DD ; Book 1 ; first check out
YYYY-MM-DD ; Book 2 ; Second check out
YYYY-MM-DD ; Book 3 ; check in
YYYY-MM-DD ; Student 2 ; initialised 
YYYY-MM-DD ; Book 1 ; first check out
YYYY-MM-DD ; Book 2 ; Second check out
YYYY-MM-DD ; Book 3 ; check in
YYYY-MM-DD ; Card Mathematics; Reading succesful 
YYYY-MM-DD ; Student 2 ; initialised 
YYYY-MM-DD ; Book 1 ; first check out
YYYY-MM-DD ; Book 2 ; Second check out
YYYY-MM-DD ; Book 3 ; check in
YYYY-MM-DD ; Card Biology ; Reading succesful 
YYYY-MM-DD ; Student 1 ; initialised 
YYYY-MM-DD ; Book 1 ; first check out
YYYY-MM-DD ; Book 2 ; Second check out
YYYY-MM-DD ; Book 3 ; check in
YYYY-MM-DD ; Card Biology ; Reading succesful 
YYYY-MM-DD ; Student 1 ; initialised 
YYYY-MM-DD ; Book 1 ; first check out
YYYY-MM-DD ; Book 2 ; Second check out
YYYY-MM-DD ; Book 3 ; check in
YYYY-MM-DD ; Card History ; Reading succesful 
YYYY-MM-DD ; Student 1 ; initialised 
YYYY-MM-DD ; Book 1 ; first check out
YYYY-MM-DD ; Book 2 ; Second check out
YYYY-MM-DD ; Book 3 ; check in
YYYY-MM-DD ; Book 1 ; first check out
YYYY-MM-DD ; Book 2 ; Second check out
YYYY-MM-DD ; Book 3 ; check in
YYYY-MM-DD ; Card Biology ; Reading succesful 
YYYY-MM-DD ; Student 1 ; initialised 
YYYY-MM-DD ; Book 1 ; first check out
YYYY-MM-DD ; Book 2 ; Second check out
YYYY-MM-DD ; Book 3 ; check in

I would like to first split this data into a list of cards, starting a new element every time the Action column contains "Reading succesful".

I then want to take that list, test, and split each of its elements into sublists, starting a new sublist every time the Action column contains "initialised".

Ideally I would like to name each sublist after the corresponding student, but this is not extremely important.

I first split the data with test <- split(i, cumsum(i$Action == "Reading succesful")).

I then name the list elements after their cards with

names(test) <- sapply(test, `[[`, 1, 2)  # row 1, column 2 (Source) is the card name
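The first split works because comparing Action to "Reading succesful" gives TRUE on every card row, and cumsum() turns that into a running card id. A minimal sketch on a hypothetical six-row miniature of the data (the dates and rows are made up); it reads each card name as the Source of the element's first row, and a single assignment is enough, no loop required:

```r
# Hypothetical miniature of the question's data, same column names
df <- data.frame(
  Time   = "YYYY-MM-DD",
  Source = c("Card Biology", "Student 1", "Book 1",
             "Card History", "Student 1", "Book 3"),
  Action = c("Reading succesful", "initialised", "first check out",
             "Reading succesful", "initialised", "check in"),
  stringsAsFactors = FALSE
)

# TRUE on every card row; cumsum() turns it into a running card id: 1 1 1 2 2 2
card_id <- cumsum(df$Action == "Reading succesful")

# One data frame per card
test <- split(df, card_id)

# Name every element after the Source of its first row (the card row)
names(test) <- vapply(test, function(x) x$Source[1], character(1),
                      USE.NAMES = FALSE)
print(names(test))
```

In the full data "Card Biology" occurs several times, so the names repeat and test[["Card Biology"]] only returns the first match; appending the block number (paste(Source[1], grp)) or calling make.unique() keeps them distinct.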

I have tried looping through the list elements but have not been able to make it work.

I would like to end up with the following structure:

test$
  [[card xxx]]$
    [[Student 1]], [[Student 2]], etc.

  [[card yyy]]$
    [[Student 1]], [[Student 2]], etc.

Each student element should be a matrix containing its list of books and, ideally, a copy of the Time and Action of the card row and of the student's own "initialised" row.
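A base R sketch of that nested structure, including the optional copies of the card's and the student's Time and Action. Assumptions: split_cards() is a hypothetical helper, the dates are made up, each student element is a data frame rather than a matrix (as.matrix() would convert it), and repeated card names are made distinct with make.unique().

```r
# Hypothetical miniature of the question's data, with made-up dates
df <- data.frame(
  Time   = sprintf("2024-01-%02d", 1:9),
  Source = c("Card Biology", "Student 1", "Book 1", "Book 2",
             "Student 2", "Book 1",
             "Card Biology", "Student 1", "Book 3"),
  Action = c("Reading succesful", "initialised", "first check out",
             "Second check out", "initialised", "first check out",
             "Reading succesful", "initialised", "check in"),
  stringsAsFactors = FALSE
)

split_cards <- function(df) {
  # Running card id: increments at every "Reading succesful" row
  card_id <- cumsum(df$Action == "Reading succesful")
  cards <- split(df, card_id)
  # Card names repeat, so make.unique() turns repeats into "Card Biology.1", ...
  names(cards) <- make.unique(vapply(cards, function(x) x$Source[1],
                                     character(1), USE.NAMES = FALSE))

  lapply(cards, function(card) {
    card_row <- card[1, ]
    rest <- card[-1, , drop = FALSE]
    # Running student id inside this card; 0 marks rows before any student
    student_id <- cumsum(rest$Action == "initialised")
    keep <- student_id > 0
    students <- split(rest[keep, , drop = FALSE], student_id[keep])
    student_names <- vapply(students, function(s) s$Source[1],
                            character(1), USE.NAMES = FALSE)

    students <- lapply(students, function(s) {
      books <- s[-1, , drop = FALSE]  # rows after the "initialised" row
      n <- nrow(books)
      # Copy the student's and the card's Time/Action onto every book row
      books$StudentTime   <- rep(s$Time[1], n)
      books$StudentAction <- rep(s$Action[1], n)
      books$CardTime      <- rep(card_row$Time, n)
      books$CardAction    <- rep(card_row$Action, n)
      rownames(books) <- NULL
      books
    })
    names(students) <- student_names
    students
  })
}

res <- split_cards(df)
res[["Card Biology"]][["Student 1"]]
```

rep(..., n) is used instead of a bare scalar so that a student with no book rows yields an empty data frame rather than an assignment error.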


Answer by Andre Wildberg

One approach is to compute the names first and store them as helper columns (grp for the card, grp2 for the student):

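library(dplyr)  # the .by argument needs dplyr >= 1.1.0

df_nms <- df %>% 
  # card block id: increments at every "Reading succesful" row
  mutate(grp = cumsum(Action == "Reading succesful")) %>% 
  # per card: count students, and relabel grp as "<card name> <block id>"
  mutate(grp2 = cumsum(Action == "initialised"), 
         grp = paste(Source[1], grp), .by = grp) %>% 
  # drop the card rows (no student seen yet)
  filter(grp2 != 0) %>% 
  # label each student block with the student's name
  mutate(grp2 = paste(Source[1]), .by = c(grp, grp2)) %>% 
  # drop the "initialised" rows themselves
  filter(Action != "initialised")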
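On a dplyr version older than 1.1.0, where .by is not available, the same helper columns can be sketched in base R with cumsum() and ave(). The miniature df and the names card_block and student_block are hypothetical, and the rows are assumed to be in file order.

```r
# Hypothetical miniature of the question's data (already trimmed)
df <- data.frame(
  Time   = "YYYY-MM-DD",
  Source = c("Card Biology", "Student 1", "Book 1", "Student 2", "Book 1",
             "Card Mathematics", "Student 2", "Book 3"),
  Action = c("Reading succesful", "initialised", "first check out",
             "initialised", "first check out",
             "Reading succesful", "initialised", "check in"),
  stringsAsFactors = FALSE
)

# Card block id and "<card name> <block id>" label, like grp above
card_block <- cumsum(df$Action == "Reading succesful")
card_name  <- ave(df$Source, card_block, FUN = function(s) s[1])
grp <- paste(card_name, card_block)

# Student counter restarting in every card block (0 = the card row itself)
student_block <- ave(as.integer(df$Action == "initialised"), card_block,
                     FUN = cumsum)

# Student name = first Source in each (card block, student block) pair
grp2 <- ave(df$Source, card_block, student_block, FUN = function(s) s[1])

# Drop the card rows and the "initialised" rows, as the two filter() calls do
keep <- student_block != 0 & df$Action != "initialised"
df_nms <- data.frame(df[keep, ], grp = grp[keep], grp2 = grp2[keep],
                     stringsAsFactors = FALSE)
rownames(df_nms) <- NULL
df_nms
```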

Then do the splitting based on those columns:

# \(x) is shorthand for function(x) (R >= 4.1)
lapply(split(df_nms, df_nms$grp), \(x) {
  res <- split(x, x$grp2)
  lapply(res, \(y) y[, 1:3])  # keep only Time, Source, Action
})
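Once split, a single student's rows are reached with two [[ lookups: card label first, then student name. The hand-built df_nms below is a hypothetical stand-in for the result of the dplyr step, and function(x) is written out for R versions before 4.1.

```r
# Hypothetical stand-in for df_nms (helper columns grp and grp2 already present)
df_nms <- data.frame(
  Time   = "YYYY-MM-DD",
  Source = c("Book 1", "Book 2", "Book 1"),
  Action = c("first check out", "Second check out", "first check out"),
  grp    = c("Card Biology 1", "Card Biology 1", "Card Mathematics 2"),
  grp2   = c("Student 1", "Student 1", "Student 2"),
  stringsAsFactors = FALSE
)

# Outer split by card label, inner split by student, keeping the first 3 columns
res <- lapply(split(df_nms, df_nms$grp), function(x) {
  lapply(split(x, x$grp2), function(y) y[, 1:3])
})

# Drill down: card label first, then student name
res[["Card Mathematics 2"]][["Student 2"]]
```

split() orders the groups by their sorted labels, so the cards come out alphabetically rather than in file order; the numeric suffix in grp still records the original position.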

output

$`Card Biology 1`
$`Card Biology 1`$`Student 1`
        Time Source           Action
1 YYYY-MM-DD Book 1  first check out
2 YYYY-MM-DD Book 2 Second check out
3 YYYY-MM-DD Book 3         check in

$`Card Biology 1`$`Student 2`
        Time Source           Action
4 YYYY-MM-DD Book 1  first check out
5 YYYY-MM-DD Book 2 Second check out
6 YYYY-MM-DD Book 3         check in


$`Card Biology 3`
$`Card Biology 3`$`Student 1`
         Time Source           Action
10 YYYY-MM-DD Book 1  first check out
11 YYYY-MM-DD Book 2 Second check out
12 YYYY-MM-DD Book 3         check in
...

Data

Save the sample data from the question verbatim as tmp.txt, then read it in:
# trimws() removes the spaces around each ';' separator
df <- data.frame(lapply(read.csv("tmp.txt", sep = ";", header = TRUE), trimws))
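To reproduce the example without a file, the data can also be read from a string. This sketch inlines only the first rows and uses read.csv()'s strip.white = TRUE as an alternative to the lapply(..., trimws) step.

```r
# First rows of the sample data, inlined so no file is needed
txt <- "Time ; Source ; Action
YYYY-MM-DD ; Card Biology ; Reading succesful
YYYY-MM-DD ; Student 1 ; initialised
YYYY-MM-DD ; Book 1 ; first check out"

# strip.white = TRUE trims the spaces around each ';' in unquoted fields
df <- read.csv(text = txt, sep = ";", strip.white = TRUE,
               stringsAsFactors = FALSE)
str(df)
```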