I have a file of transactional data (trips between locations) that I am looking to summarise using R, which I am relatively new to. Sample Data
Start.Date Start.Area End.Date End.Area
2007-07-12 14:00 New Street 2007-07-12 15:46 Windy Lane
2007-07-12 15:10 High Street 2007-07-12 18:08 New Street
2007-07-12 16:42 Kings Street 2007-07-12 17:47 Windy Lane
My aim is to return the occurances for each day (potentially hour) for an area.
Sample Return, in a new data.frame, would be
Date Area Start.Occurances End.Occurances
2007-07-12 New Street 1 1
2007-07-12 High Street 1 0
2007-07-12 Kings Street 1 0
2007-07-12 Windy Lane 0 2
Ideally I would have carried out the analysis in Excel but it can't cope with the scale of my data. In a spreadsheet I would use countif functions to see how many times the area appears in a given date/time.
I also hope to incorporate days when both Start.Occurances and End.Occurances are zero if possible.
The questions I have seen already regarding Countif functions or Match/Index combination haven't addressed my query so I was hoping somebody out there could help me!
This can be done by first reshaping and then summarising. Here is an example using
reshape2
anddplyr
(data isdat
).Another way: stack 'Start.Date' and 'Start.Area' columns on top of the corresponding 'End' columns, renaming the columns to 'Date' and 'Area' with a new column 'Pos' that specifies if it is a 'Start' or 'End'. Then it is easy to summarise by grouping Area, Date, or both.