I am trying to construct a panel-data data frame which consists of both periodic and 'continous' daily data which should be assigned to each other, so that each row of the new data frame has both, the period, the value for the periodic data and the value and day of one of the days within that period, the data looks similar to this:
> dailycds
Date CDS
1 30-06-2015 194
2 01-07-2015 195
3 02-07-2015 198
4 03-07-2015 198
5 04-07-2015 199
6 30-06-2016 165
7 01-07-2016 172
8 02-07-2016 213
9 03-07-2016 123
10 04-07-2016 321
> periodicassets
Period Assets
1 201506 1314
2 201606 2134
And ultimately, I would like it to look like this:
> df
Period Date Assets CDS
1 201506 30-06-2015 1314 194
2 201506 01-07-2015 1314 195
3 201506 02-07-2015 1314 198
4 201506 03-07-2015 1314 198
5 201606 30-06-2016 2134 165
6 201606 01-07-2016 2134 172
7 201606 02-07-2016 2134 213
8 201606 03-07-2016 2134 123
So basically, the idea would be to take certain ranges of rows from the daily data and assign (and merge) them to the periodic data. Unfortunately though, I cannot simply do that by extracting the mm-yyyy parts of the dates, as the period of 201506 also contains data on July up to the third, whereas the forth relates to no period and should be dropped, as each period should only contain a certain numbers of days (in this case 4).
Here is the code to great above sample data:
dailycds = data.frame(Date = c("30-06-2015", "01-07-2015", "02-07-2015","03-07-2015","04-07-2015","30-06-2016", "01-07-2016", "02-07-2016","03-07-2016","04-07-2016"),
CDS = c(194, 195, 198,198,199,165,172,213,123,321))
dailycds
periodicassets = data.frame(Period = c("201506", "201606"),
Assets = c("1314","2134"))
periodicassets
df = data.frame(Period = c("201506", "201506", "201506", "201506", "201606", "201606", "201606", "201606"),
Date = c("30-06-2015", "01-07-2015", "02-07-2015","03-07-2015", "30-06-2016", "01-07-2016", "02-07-2016", "03-07-2016"),
Assets = c("1314", "1314", "1314", "1314", "2134", "2134", "2134", "2134"),
CDS = c(194, 195, 198, 198, 165, 172, 213, 123))
Context and Additional Complications
So as suggested in the given solutions, my previous example was quite specific and probably overly simplified. Thus, to get a little closer to my problem here is some additional context: Ultimately, the periodic data refers to end of month holdings of banks' assets, to which I want to assign the daily CDS data around the time (e.g.) 3 days before the end of the month and 6 days afterwards. So in the panel of course there are multiple banks, for each of which the (same) CDS data has to be assigned to its holdings. (E.g. if I have 2 banks for which I need 3 days before and 6 after the end of the month I have (3+1+6)*2 days.) As pointed out in the comments, I always refer to business/working days in my question, as my time series do not contain any holidays etc..
So, to do justice to the problem, here is a snippet from the original with only one period:
> periodicassets
BankName Period value
2 BPCE 201412 112189.50
4 Credit Agricole 201412 81618.76
Date CDS
<dttm> <chr>
1 2015-01-12 46.869
2 2015-01-09 48.121000000000002
3 2015-01-08 48.625999999999998
4 2015-01-07 48.801000000000002
5 2015-01-06 48.633000000000003
6 2015-01-05 46.670999999999999
7 2015-01-02 45.158000000000001
8 2015-01-01 47.32
9 2014-12-31 47.658000000000001
10 2014-12-30 45.843000000000004
11 2014-12-29 47.588999999999999
12 2014-12-26 47.625999999999998
13 2014-12-25 47.697000000000003
14 2014-12-24 47.414999999999999
15 2014-12-23 48.075000000000003
16 2014-12-22 48.085999999999999
17 2014-12-19 47.496000000000002
18 2014-12-18 46.534999999999997
19 2014-12-17 48.149000000000001
which can be accessed here: periodic assets, dailycds
When looking through the forum, I found a similar questions such as: create an index for aggregating daily data to match periodic data and create an index for aggregating daily data to match periodic data, however, while the first one tries to aggregate the data, the second one is already blessed with the format I want to have (in object xtime).
The key issue of this question is how
Period
is mapped toDate
. From OP's decription I have understood that each period includes the last day of the actual month plus the first three days into the next month, 4 days in total.This can be solved with some date arithmetic and a right join:
There are only 4 rows per period as requested and the result is in line with the expected result
df
:Unused levels have to be dropped to pass the strict checks of
all.equal()
Caveat
The code has been tested to work with the sample data provided. In case of continuous daily as well as periodic data it might be necessary to add code to remove days which do not belong to the 4 days period.
Edit: More realistic sample data
The OP has updated his question and is providing more realistic sample data via dropbox. Now,
dailycds
contains daily data (except for weekends). As already mentioned in the Caveat above, this requiresdailycds
to be filtered for the relevant days.The OP isn't clear how the days to be considered before and after the turn of the month are defined. Here, we assume that 3 days before the end of the month and 6 days afterwards refers to calendar days and not business days.
Edit 2: Using business days instead of calendar dates.
The OP has clarified that he is using buiness days instead of calendar days. This seemingly minor change of the specification has a severe impact on the way the dates to be included are selected.
Now, always the first 6 entries in each month are picked as well as the last 3 entries before the last trading day of the month (ultimo) and the ultimo itself which results in 3 + 1 + 6 = 10 business days to pick.
Note that the result data set contains (3 + 1 + 6) * 2 months * 2 banks = 40 rows.
Data from dropbox
In case the dropbox links break: