Count of data by Sqldf

3.9k views Asked by At

My data looks like:

 ID   category
101          A
101          B
101          C
102          A
103          B
103          C

I would like the result like:

 ID   category Count 
101          A     3
101          B     3
101          C     3
102          A     1
103          B     2
103          C     2

I have tried something like:

data<-sqldf("select *,count(ID) as count from data group by ID")

It is showing the output as:

 ID   category  count
101          C      3
102          A      1
103          C      2
4

There are 4 answers

5
akrun On BEST ANSWER

An option using data.table

library(data.table)
setDT(df1)[, Count:=.N, ID]
#    ID category Count
#1: 101        A     3
#2: 101        B     3
#3: 101        C     3
#4: 102        A     1
#5: 103        B     2
#6: 103        C     2

Or using dplyr

library(dplyr)
df1 %>%
    group_by(ID) %>%
    mutate(Count=n())

Or using base R

 df1$Count <- with(df1, ave(seq_along(ID), ID, FUN=length))
0
G. Grothendieck On

For an all SQL solution left join the output of your code (except omit category) with the original data frame like this:

library(sqldf)
sqldf("select * from data
       left join (select ID, count(ID) as count from data group by ID)
       using (ID)")

giving:

   ID category count
1 101        A     3
2 101        B     3
3 101        C     3
4 102        A     1
5 103        B     2
6 103        C     2

Note: Here is the input data in reproducible form:

Lines <- "ID   category
101          A
101          B
101          C
102          A
103          B
103          C"
data <- read.table(text = Lines, header = TRUE)
0
Colonel Beauvel On

Base R approach:

transform(df, count=table(ID)[as.character(ID)])
#   ID category count
# 1 101        a     3
# 2 101        b     3
# 3 101        c     3
# 4 102        a     1
# 5 103        b     2
# 6 103        c     2
0
mpalanco On

Another two options using the base package:

1.

merge(df, table(df$ID), by.x = "ID", by.y = "Var1")

Output:

   ID category Freq
1 101        A    3
2 101        B    3
3 101        C    3
4 102        A    1
5 103        B    2
6 103        C    2

2.Very similiar to the one proposed by akrun

df$count <- ave(df$ID, df$ID, FUN=length)
df

Output:

   ID category count
1 101        A     3
2 101        B     3
3 101        C     3
4 102        A     1
5 103        B     2
6 103        C     2