Reshape2 dcast() function returning wrong values

938 views Asked by At

I have tick data for the equities in the Dow Jones Industrial Average over the course of the day. Here is a sample of the data:

> head(df)
                        TS  Sym      Ask
1: 2015-03-24 14:00:00.000 YMM5 17956.00
2: 2015-03-24 14:00:00.002 AAPL   126.91
3: 2015-03-24 14:00:00.005  UNH   118.35
4: 2015-03-24 14:00:00.009  XOM    84.64
5: 2015-03-24 14:00:00.014  AXP    81.35
6: 2015-03-24 14:00:00.016   PG    84.19

I am trying to use the dcast() function of reshape2 to transform the data into wide format, so it will look like:

                        TS   AAPL   AXP    PG    UNH   XOM
1: 2015-03-24 14:00:00.000 126.91 81.35 84.19 118.35 84.64

When I try the following set of commands though, here is what I get:

tick <- data.table(read.csv("2015-3-24.csv"))
df<- data.table(TS = tick$DateTime, Sym = tick$Symbol, Ask = tick$Ask, Bid = tick$Bid)
tmp <- dcast(data = df, formula = TS ~ Sym)

head(tmp)


                       TS AAPL AXP BA CAT CSCO CVX DD DIS GE GS HD IBM INTC JNJ JPM KO MCD MMM MRK MSFT NKE PFE PG TRV UNH UTX V VZ WMT XOM YMM5
1 2015-03-24 14:00:00.000    0   0  0   0    0   0  0   0  0  0  0   0    0   0   0  0   0   0   0    0   0   0  0   0   0   0 0  0   0   0    1
2 2015-03-24 14:00:00.002    1   0  0   0    0   0  0   0  0  0  0   0    0   0   0  0   0   0   0    0   0   0  0   0   0   0 0  0   0   0    0
3 2015-03-24 14:00:00.005    0   0  0   0    0   0  0   0  0  0  0   0    0   0   0  0   0   0   0    0   0   0  0   0   1   0 0  0   0   0    0

I know I am getting the formula wrong or something, but no matter what I have though to try I am getting the same result. Thanks in advance.

1

There are 1 answers

1
Arun On BEST ANSWER

Several issues with your post, I'd like to get clarified: (too long for comment, so this is a guess at best as of now.)

  • No reproducible example.
  • You use data.table, but the version is not shown. data.table provides efficient implementations of melt and dcast. In versions <= 1.9.4, you might want to use dcast.data.table. From versions 1.9.5+, you can directly use dcast() without loading reshape2. So I'm not sure if you're using dcast from reshape2 or from the devel version of data.table.
  • Your formula (if not a typo) is wrong for the result you've shown.

You've not shown the entire result from dcast, I believe. If it complains something about

Aggregate function missing, defaulting to 'length'

then your id and measure vars in formula do not uniquely identify cells. With the minimal info provided, I can only guess that this is indeed your issue.