I would like to create dummy variables that indicate which product version was in effect based on the date range, i.e. between the release date and the following version release date without doing it manually. I have a few hundred versions with their corresponding release dates and when the next version was released, and I will merge/join this data with a Purchases data frame.
Currently I have two data frames (Version and Purchases) that look like this:
View(Version)
Type Version Release_Date Next_Release
A 1.2.3 2013-11-14 2014-01-11
B 1.3.1 2014-01-11 2014-02-20
A 1.5.1 2014-02-20 2014-03-08
A 1.5.2 2014-03-08 2014-04-06
B 1.5.3 2014-04-06 2014-04-12
A 1.5.4 2014-04-12 2014-04-15
B 1.5.5 2014-05-15 2014-05-20
B 1.6.1 2014-05-20 2014-06-26
A 1.6.2 2014-06-26 2014-07-14
View(Purchases)
TIMESTAMP Amount
2013-11-14 15.44
2013-11-14 13.39
2013-11-14 15.35
2013-11-15 86.43
2014-01-15 12.30
2014-01-17 23.55
I would like to create dummy variables for each version in the data frame 'Purchases' according to the date range which the version was in effect. i.e. if the timestamp of the purchase is within the date range that that version was in effect, that version dummy = 1 otherwise 0.
View(Purchases)
TIMESTAMP Amount Version_1.2.3 Version_1.3.1 ....
2013-11-14 15.44 1 0
2013-11-14 13.39 1 0
2013-11-14 15.35 1 0
2013-11-15 86.43 1 0
2014-01-15 12.30 0 1
2014-01-17 23.55 0 1
....
Thanks you in advance for any advise or help.
The task can be divided in two sub-tasks:
Release_Date
before theTIMESTAMP
of the purchase,For finding the most recent release, the
data.table
package offers two possibilities,a rolling join
or a non-equi join
which produces the same result.
For reshaping, the
dcast()
function is used withlength()
for aggregation:or, in case the columns shall be renamed during reshaping