Counting data by unique factors in R


I started using R only recently and have run into a problem I cannot solve. I want to fill one column of my data with the number of distinct factor values found in a second column, grouped by a third column. My data looks like this:

 ID_GRI                                         LABEL        Diversity
1       1                                                   0         0
2       1                                  Paduri_de_conifere         0
3       1                                    Pajisti_NAturale         0
4       1                                    Pajisti_NAturale         0
5       1                                    Pajisti_NAturale         0
6       1                                    Pajisti_NAturale         0
7       1                                    Pajisti_NAturale         0
8       2                                                   0         0 
9       2                                  Paduri_de_conifere         0
10      2                                  Paduri_de_conifere         0
11      2                                  Paduri_de_conifere         0
12      2                                    Pajisti_NAturale         0 
13      2                                    Pajisti_NAturale         0
14      2                                    Pajisti_NAturale         0
15      2                                    Pajisti_NAturale         0
16      2                                    Pajisti_NAturale         0
17      2 Zone_de_tranzitie_cu_arbusti_(in_general_defrisate)         0
18      3                                                   0         0
19      3                                  Paduri_de_conifere         0
20      3                                    Pajisti_NAturale         0

The LABEL column is a factor variable, imported from Excel with the fill=TRUE argument because some cells are empty. Now I want to set the Diversity column to the number of unique LABEL values within each ID_GRI (the 0 entries are not counted). It should look like this:

 ID_GRI                                         LABEL         Diversity
1       1                                                   0         2
2       1                                  Paduri_de_conifere         2
3       1                                    Pajisti_NAturale         2
4       1                                    Pajisti_NAturale         2
5       1                                    Pajisti_NAturale         2
6       1                                    Pajisti_NAturale         2
7       1                                    Pajisti_NAturale         2
8       2                                                   0         3
9       2                                  Paduri_de_conifere         3
10      2                                  Paduri_de_conifere         3
11      2                                  Paduri_de_conifere         3
12      2                                    Pajisti_NAturale         3
13      2                                    Pajisti_NAturale         3
14      2                                    Pajisti_NAturale         3
15      2                                    Pajisti_NAturale         3
16      2                                    Pajisti_NAturale         3
17      2 Zone_de_tranzitie_cu_arbusti_(in_general_defrisate)         3
18      3                                                   0         2
19      3                                  Paduri_de_conifere         2
20      3                                    Pajisti_NAturale         2

I have tried using sapply and data.table but it didn't work. Thanks in advance! :)


Answer by akrun:

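Assuming that the LABEL column is of class factor:

# ave() on a character vector returns character, hence the as.numeric() wrapper;
# within each ID_GRI it counts the distinct labels other than "0"
df$Diversity <- with(df, as.numeric(ave(as.character(LABEL),
            ID_GRI, FUN=function(x) length(unique(x[x!=0])))))
df$Diversity
#[1] 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 2 2 2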

Or using data.table:

library(data.table)
# setDT() converts df to a data.table by reference; := writes the per-group count to every row
setDT(df)[, Diversity:=length(unique(LABEL[LABEL!=0])), by=ID_GRI]
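
For anyone who wants to try this without the original spreadsheet, here is a minimal base R sketch. It assumes the 20 rows shown in the question are the whole data set and rebuilds them by hand; the tapply-plus-lookup step is an alternative illustration, not the method used in the answer above, and the name `counts` is made up for the example.

```r
# Rebuild the 20-row sample from the question (assumption: "0" marks the empty cells)
df <- data.frame(
  ID_GRI = rep(1:3, times = c(7, 10, 3)),
  LABEL = factor(c(
    "0", "Paduri_de_conifere", rep("Pajisti_NAturale", 5),
    "0", rep("Paduri_de_conifere", 3), rep("Pajisti_NAturale", 5),
    "Zone_de_tranzitie_cu_arbusti_(in_general_defrisate)",
    "0", "Paduri_de_conifere", "Pajisti_NAturale"
  )),
  Diversity = 0
)

# One count per ID_GRI: distinct labels, ignoring the "0" placeholder
counts <- tapply(as.character(df$LABEL), df$ID_GRI,
                 function(x) length(unique(x[x != "0"])))

# Look up each row's group count by name and drop the array attributes
df$Diversity <- as.vector(counts[as.character(df$ID_GRI)])
df$Diversity
```

With the sample rows this should give 2 for every row of ID_GRI 1, 3 for ID_GRI 2 and 2 for ID_GRI 3, matching the desired output in the question.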

Answer by Jealie:

A sapply one-liner:

sapply(df$ID_GRI, function(x) length(unique(df$LABEL[df$ID_GRI==x]))-1)
#[1] 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 2 2 2
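
Note that subtracting 1 is only correct when every ID_GRI group contains a 0 row, as it does in the sample. A possible variant that filters the placeholder out explicitly is sketched below; `count_labels` and the `toy` data frame are hypothetical names invented for this illustration, and the toy data is not taken from the question.

```r
# Hypothetical helper: for each row, the number of distinct labels in its group,
# excluding the placeholder value
count_labels <- function(id, label, placeholder = "0") {
  label <- as.character(label)  # compare as text, not as factor codes
  sapply(id, function(g) {
    in_group <- label[id == g]
    length(unique(in_group[in_group != placeholder]))
  })
}

# Toy data (assumption, not from the question): group 2 has no "0" row
toy <- data.frame(
  ID_GRI = c(1, 1, 1, 2, 2),
  LABEL  = factor(c("0", "a", "b", "a", "a"))
)
count_labels(toy$ID_GRI, toy$LABEL)
```

Here group 1 has two real labels and group 2 has one, so the result is 2 2 2 1 1, whereas the -1 version would report 0 for group 2.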