I would like to summarize the following sample data into a new data frame with these columns:
Population, Sample Size (N), Percent Completed (%)
Sample Size is the count of all records for each population, which I can get with table() or tapply(). Percent Completed is the percentage of records that have an End_Date (all records without an End_Date are assumed not to have completed). This is where I am lost!
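A minimal sketch of the table()/tapply() route, assuming the key step is to treat `!is.na(End_Date)` as a TRUE/FALSE "completed" flag, so that its mean within each population is the completed share. To stay self-contained it rebuilds only the Population and End_Date columns of the sample data (same values) under the hypothetical name `dat`.

```r
# Rebuild the two columns needed from the sample data.
# `dat` is a hypothetical name chosen to avoid masking base::sample().
pop_codes <- c(1, 1, 1, 1, 1, 2,
               2, 2, 3, 2, 2, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1,
               1, 2, 2, 3, 3, 3, 3, 1, 1, 3, 3, 3, 3)
end_days <- c(NA, 16037, NA, NA, 16036, 16043, 16040,
              16041, 16042, 16042, 16042, 16043, 16043, 16043, 16043, 16043,
              16043, 16045, 16045, 16045, 16045, 16045, NA, 16048, 16048,
              16049, 16049, NA, NA, 16052, 16052, 16052, 16052, 16052,
              16052)
dat <- data.frame(
  Population = factor(pop_codes, levels = 1:3,
                      labels = c("Glommen", "Kaseberga", "Steninge")),
  End_Date = as.Date(end_days, origin = "1970-01-01")
)

# Sample size: number of records per population
n <- table(dat$Population)

# Percent completed: mean of the logical "has an End_Date" flag, times 100
pct <- tapply(!is.na(dat$End_Date), dat$Population, mean) * 100

# Both results are ordered by the factor levels, so they line up row by row
result <- data.frame(
  Population = names(n),
  N = as.vector(n),
  Percent_Completed = as.vector(pct)
)
print(result)
```

On this data that gives N of 13, 7 and 15, with roughly 69.2%, 100% and 86.7% completed for Glommen, Kaseberga and Steninge.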
Sample Data
sample <- structure(list(Population = structure(c(1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 3L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 3L, 3L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 3L), .Label = c("Glommen",
"Kaseberga", "Steninge"), class = "factor"), Start_Date = structure(c(16032,
16032, 16032, 16032, 16032, 16036, 16036, 16036, 16037, 16038,
16038, 16039, 16039, 16039, 16039, 16039, 16039, 16041, 16041,
16041, 16041, 16041, 16041, 16044, 16044, 16045, 16045, 16045,
16045, 16048, 16048, 16048, 16048, 16048, 16048), class = "Date"),
End_Date = structure(c(NA, 16037, NA, NA, 16036, 16043, 16040,
16041, 16042, 16042, 16042, 16043, 16043, 16043, 16043, 16043,
16043, 16045, 16045, 16045, 16045, 16045, NA, 16048, 16048,
16049, 16049, NA, NA, 16052, 16052, 16052, 16052, 16052,
16052), class = "Date")), .Names = c("Population", "Start_Date",
"End_Date"), row.names = c(NA, 35L), class = "data.frame")
You can do this with split/apply/combine:
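A sketch of the split/apply/combine pattern in base R, not necessarily the exact code the answer had in mind: split() the rows by Population, build a one-row summary (N and percent completed) for each piece, then rbind() the pieces back together. It rebuilds only the Population and End_Date columns of the question's data under the hypothetical name `dat`.

```r
# Rebuild the two columns needed from the question's sample data.
# `dat` is a hypothetical name; it avoids masking base::sample().
pop_codes <- c(1, 1, 1, 1, 1, 2,
               2, 2, 3, 2, 2, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1,
               1, 2, 2, 3, 3, 3, 3, 1, 1, 3, 3, 3, 3)
end_days <- c(NA, 16037, NA, NA, 16036, 16043, 16040,
              16041, 16042, 16042, 16042, 16043, 16043, 16043, 16043, 16043,
              16043, 16045, 16045, 16045, 16045, 16045, NA, 16048, 16048,
              16049, 16049, NA, NA, 16052, 16052, 16052, 16052, 16052,
              16052)
dat <- data.frame(
  Population = factor(pop_codes, levels = 1:3,
                      labels = c("Glommen", "Kaseberga", "Steninge")),
  End_Date = as.Date(end_days, origin = "1970-01-01")
)

# Split: one data frame per population
pieces <- split(dat, dat$Population)

# Apply: summarise each piece as a one-row data frame
rows <- lapply(names(pieces), function(p) {
  d <- pieces[[p]]
  data.frame(Population = p,
             N = nrow(d),
             Percent_Completed = 100 * mean(!is.na(d$End_Date)))
})

# Combine: stack the one-row summaries into the final table
result <- do.call(rbind, rows)
print(result)
```

Iterating over names(pieces) keeps the population label available inside the function, and any further per-group statistic can be added as another column of the one-row data frame.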
One word of warning: `sample` is the name of a base function, so you should pick a different name for your data frame.