Is there a way to take a numeric variable, like the number 7, and turn it into a string of seven 1s?


I want to run a Kaplan-Meier analysis on survival data, but my survival variable ranges from 0 to 7, and KM needs a binary (0/1) status variable. Some background in case it's helpful: this is seedling survival data. I planted 7 acorns per subplot (there are 3 treatment habitats represented in this data) and recorded survival multiple times over a 2.5-year period. The variables are time, survival and habitat; the data below is a small dataset similar to my real data.

surva<-c(5,0,3, 2,0,0, 0,1,2, 3,0,1)
time<-c(1,2,3, 1,2,3, 1,2,3, 1,2,3)
habitat<-c("grassa", "grassb", "grassc", "grassa", "grassb", "grassc", "grassa", "grassb", "grassc", "grassa", "grassb", "grassc")


I want to turn my survival counts into 1s (one row per seedling alive in each habitat subplot), which means I also need to duplicate the corresponding time and habitat values for each new row. I appreciate any insight.


There are 2 answers

PGSA (accepted answer)

I think maybe this is helpful:

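library(tidyverse)
df <- data.frame(surva, time, habitat)  # combine the vectors from the question
df |> 
  # one string per row: `surva` 1s followed by (7 - surva) 0s
  mutate(surva = vapply(surva, \(x) paste0(c(rep(1, times = x), rep(0, times = 7 - x)), collapse = ""), character(1))) |>
  separate_longer_position(surva, width = 1) |>  # one row per character
  mutate(surva = as.integer(surva))               # back to numeric 0/1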

gives:

   surva time habitat
1      1    1  grassa
2      1    1  grassa
3      1    1  grassa
4      1    1  grassa
5      1    1  grassa
6      0    1  grassa
7      0    1  grassa
8      0    2  grassb
9      0    2  grassb
10     0    2  grassb
11     0    2  grassb
12     0    2  grassb
13     0    2  grassb
14     0    2  grassb
15     1    3  grassc
16     1    3  grassc
17     1    3  grassc
18     0    3  grassc
19     0    3  grassc
20     0    3  grassc
21     0    3  grassc
22     1    1  grassa
23     1    1  grassa
24     0    1  grassa
25     0    1  grassa
26     0    1  grassa
27     0    1  grassa
28     0    1  grassa
29     0    2  grassb
30     0    2  grassb
31     0    2  grassb
32     0    2  grassb
33     0    2  grassb
34     0    2  grassb
35     0    2  grassb
36     0    3  grassc
37     0    3  grassc
38     0    3  grassc
39     0    3  grassc
40     0    3  grassc
41     0    3  grassc
42     0    3  grassc
43     0    1  grassa
44     0    1  grassa
45     0    1  grassa
46     0    1  grassa
47     0    1  grassa
48     0    1  grassa
49     0    1  grassa
50     1    2  grassb
51     0    2  grassb
52     0    2  grassb
53     0    2  grassb
54     0    2  grassb
55     0    2  grassb
56     0    2  grassb
57     1    3  grassc
58     1    3  grassc
59     0    3  grassc
60     0    3  grassc
61     0    3  grassc
62     0    3  grassc
63     0    3  grassc
64     1    1  grassa
65     1    1  grassa
66     1    1  grassa
67     0    1  grassa
68     0    1  grassa
69     0    1  grassa
70     0    1  grassa
71     0    2  grassb
72     0    2  grassb
73     0    2  grassb
74     0    2  grassb
75     0    2  grassb
76     0    2  grassb
77     0    2  grassb
78     1    3  grassc
79     0    3  grassc
80     0    3  grassc
81     0    3  grassc
82     0    3  grassc
83     0    3  grassc
84     0    3  grassc

Is that the sort of thing you were after?
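The same expansion can also be done in base R without the tidyverse. This is a minimal sketch, assuming every subplot started with 7 seedlings. `n_planted`, `long` and `alive` are illustrative names that do not come from the original. Each row is repeated 7 times, and within each block of 7 the first `surva` entries are marked 1 (alive) and the rest 0.

```r
# Example data from the question
surva   <- c(5, 0, 3, 2, 0, 0, 0, 1, 2, 3, 0, 1)
time    <- c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3)
habitat <- c("grassa", "grassb", "grassc", "grassa", "grassb", "grassc",
             "grassa", "grassb", "grassc", "grassa", "grassb", "grassc")
df <- data.frame(surva, time, habitat)

n_planted <- 7  # assumption: 7 acorns planted in every subplot

# Repeat each row n_planted times, keeping time and habitat
long <- df[rep(seq_len(nrow(df)), each = n_planted), c("time", "habitat")]

# Position 1..7 within each block; positions <= surva are alive (1), others dead (0)
position <- sequence(rep(n_planted, nrow(df)))
long$alive <- as.integer(position <= rep(df$surva, each = n_planted))
rownames(long) <- NULL

long
```

If you only want the live seedlings, keep `long[long$alive == 1, ]`.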

Nir Graham

It seems like you want to uncount, i.e. have 5 records for every record labelled 5. The exception is records labelled 0: uncounting would drop them entirely, and you don't want to lose them. So the answer is to divide and conquer: first set aside the zero rows, then uncount the positive rows, and finally recombine the two.


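library(tidyverse)

in_1 <- data.frame(
  surva   = c(5, 0, 3, 2, 0, 0, 0, 1, 2, 3, 0, 1),
  time    = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3),
  habitat = c("grassa", "grassb", "grassc", "grassa", "grassb", "grassc",
              "grassa", "grassb", "grassc", "grassa", "grassb", "grassc")
)

# Rows with no survivors: keep them as-is (uncount would drop them)
(recs_eq_0 <- filter(in_1, surva == 0))

# Rows with at least one survivor
(recs_gt_0_prep <- filter(in_1, surva > 0))

# Repeat each row `surva` times, then mark every copy as alive (1)
(recs_gt_0 <- uncount(recs_gt_0_prep, weights = surva) |> mutate(surva = 1L))

# Recombine the zero rows and the expanded rows
(fin <- bind_rows(recs_eq_0, recs_gt_0))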
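If you also want the dead seedlings as 0 rows, so that every subplot ends up with 7 rows as in the accepted answer, the same `uncount()` idea can be applied twice: once with the number alive and once with the number dead. This is a minimal sketch, assuming 7 acorns were planted in every subplot. `n_planted`, `alive`, `dead` and `long` are illustrative names.

```r
library(dplyr)
library(tidyr)

in_1 <- data.frame(
  surva   = c(5, 0, 3, 2, 0, 0, 0, 1, 2, 3, 0, 1),
  time    = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3),
  habitat = c("grassa", "grassb", "grassc", "grassa", "grassb", "grassc",
              "grassa", "grassb", "grassc", "grassa", "grassb", "grassc")
)
n_planted <- 7  # assumption: 7 acorns per subplot

# One row per live seedling, coded 1
alive <- in_1 |>
  uncount(weights = surva) |>
  mutate(surva = 1L)

# One row per dead seedling (n_planted - surva of them), coded 0
dead <- in_1 |>
  uncount(weights = n_planted - surva) |>
  mutate(surva = 0L)

# Stack both and sort by time and habitat
long <- bind_rows(alive, dead) |>
  arrange(time, habitat)

long
```

Rows labelled 0 need no special handling here, because they contribute 7 dead rows and no live ones.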