I have a list of events with (x,y) coordinates, where x ranges from 1 to 100 and y from 1 to 86. There are (often many) duplicates of each coordinate. I want to populate a matrix (effectively a number grid) with the counts of each coordinate. How do I do this?
Right now, my best attempt is:
s=matrix(data=NA,nrow=n,ncol=k)
for(i in 1:n){
for(j in 1:k){
s[i,j]=nrow(subset(data,x_column==i & y_column==j))
}
}
This works for small (~10,000 rows) data frames, but I'd like to run it for a data frame with nearly 3 million rows, and my method is far too slow.
Edit (data):
n=86;k=100;
x_column y_column
54 30
51 32
65 34
19 46
51 27
45 60
62 31
64 45
16 69
31 33
Thanks guys!
Edit: well, it turns out the program was fast enough for my needs -- my workspace was just bogged down with tons of data, and it was slowing everything I tried to do down. So my method works, but it's good to know alternate ways of populating a matrix. I uploaded the first 10 rows; could someone do a speed test?
Here's one approach, using the data.table and Matrix packages:
If you can guarantee that you'll have at least some coordinates in each possible row and column, you can just use base R's
table()
(though it isn't nearly as fast):