counting islands in R csv


I would like to count islands along the rows of a .csv file. By "islands" I mean runs of consecutive non-blank entries within a row. A run of three or more consecutive non-blank entries counts as 1 island; a run of fewer than three counts as 1 "non-island". I would then like to write the counts to a data frame. Example input:

Name,,,,,,,,,,,,,
Michael,,,1,1,1,,,,,,,,
Peter,,,,1,1,,,,,,,,,
John,,,,,1,,,,,,,,,

Desired dataframe output:

Name,island,nonisland,
Michael,1,0,
Peter,0,1,
John,0,1,

There is 1 answer

Jota (best answer)

You could use rle like this:

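# Run rle() on each row, then count the runs of length 3 or more (the islands).
# Blank cells read in as NA never merge into a run: rle() treats each NA separately.
output <- stack(sapply(apply(df, 1, rle), function(x) sum(x$lengths >= 3)))
names(output) <- c("island", "name")

# Flag every row without an island as a non-island.
output$nonisland <- 0
output$nonisland[output$island == 0] <- 1
output
#  island    name nonisland
#1      1 Michael         0
#2      0   Peter         1
#3      0    John         1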
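
The snippet above appears to assume that the names are the row names of df (that is where stack() gets its name column from) and that blank cells were read in as NA. The answer does not show how df was loaded, so the following is only a sketch of one way to get that shape. The column names c1 to c6 and the tidied CSV text are made up for illustration.

```r
# Hypothetical, tidied version of the example data (same number of fields per row).
csv_text <- "Name,c1,c2,c3,c4,c5,c6
Michael,,,1,1,1,
Peter,,,,1,1,
John,,,,,1,"

# row.names = 1 turns the Name column into row names;
# blank fields in numeric or logical columns are read as NA.
df <- read.csv(text = csv_text, row.names = 1)

rownames(df)
is.na(df["John", "c1"])
```

With a real file, the text = csv_text argument would be replaced by the file path.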

Here rle (run-length encoding) is applied to each row of your data frame, which gives the length of every run of identical consecutive values. For each row you then count how many of those runs have a length of 3 or more.
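
To make the counting step concrete, here is a small sketch of what rle returns for a single row. The vector is a made-up stand-in for Michael's row, assuming blanks were read in as NA.

```r
# One row from the example, with blanks represented as NA (assumed).
row <- c(NA, NA, 1, 1, 1, NA, NA)

runs <- rle(row)

# rle() puts every NA in its own run of length 1,
# so blank cells never add up to a long run.
runs$lengths
runs$values

# Runs of length 3 or more are the islands.
islands <- sum(runs$lengths >= 3)
islands
```

Because each NA forms a run of length 1, only the three consecutive 1s reach the threshold.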

Note that this solution assumes every island is made up of the same value (i.e. all 1s, as in your example), because rle only groups consecutive entries that are equal. If that is not the case, first convert all the non-empty entries to a single value, for example with df[!is.na(df)] <- 1, and then apply rle.
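
A minimal sketch of why that conversion matters, using a made-up data frame whose non-blank entries differ (the column names a to e and the values are hypothetical):

```r
# Hypothetical data: names as row names, blanks as NA, mixed non-blank values.
df <- data.frame(
  a = c(NA, NA, NA),
  b = c(1, NA, NA),
  c = c(2, 5, NA),
  d = c(3, 7, 9),
  e = c(NA, NA, NA),
  row.names = c("Michael", "Peter", "John")
)

# Count runs of length 3 or more in each row.
count_islands <- function(d) {
  apply(d, 1, function(row) sum(rle(row)$lengths >= 3))
}

# Without the conversion, 1, 2, 3 are three separate runs of length 1.
before <- count_islands(df)

# Replace every non-blank entry with 1 so each island is one uniform run.
df[!is.na(df)] <- 1
after <- count_islands(df)

before
after
```

Here Michael's row only registers as an island after the conversion; the other two rows have fewer than three consecutive entries either way.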