I have a Deedle Frame<DateTime,string>. The columns contain float values and are dense (no missing values).
I need to build the data frame from a string[] and then:
- Build a 2D Matrix with the whole data
- Build a Series<DateTime,Matrix<float,CpuLib>>, collapsing the rows into 1xn matrices
In my case, I am experimenting with FCore by StatFactory, but I may use another linear algebra library in the future.
My concern is that I need to make sure that the order of rows and columns is not changed in the process.
Data Frame Construction
I fetch the data using the code below.
I notice that the order of the columns is different from the initial list of tickers.
Why is that? Will the use of Array.Parallel.map change the order?
    /// get the selected tickers in a DataFrame from a DataContext
    let fetchTickers tickers joinKind =
        let getTicker ticker =
            query {
                for row in db.PriceBarsDay do
                where (row.Ticker = ticker)
                select row }
            |> Seq.map (fun row -> row.DateTime, float row.Close)
            |> dict
        tickers
        |> Array.map (fun ticker -> getTicker ticker) // returns a dict(DateTime, ClosePrice)
        |> Array.map (fun dictionary -> Series(dictionary))
        |> Array.map2 (fun ticker series -> [ticker => series] |> frame ) tickers
        |> Array.reduce (fun accumFrame frame -> accumFrame.Join(frame, joinKind))
Data frame to 2D matrix
To build the matrix I use the code below. Mapping over the array of column names (selectedCols) ensures that the order of columns is not shifted. I run unit tests on the order of rows using Array.map and everything looks fine, but I would like to know:
- Is there a consistency check in the library that would ensure that I do not run into an issue?
- I suppose Array.Parallel.map would also preserve the order of columns?
Here is the code:
    /// Build a matrix
    let buildMatrix selectedCols (frame: Frame<DateTime, String>) =
        let matrix =
            selectedCols
            |> Array.map (fun colname -> frame.GetSeries(colname))
            |> Array.map (fun serie -> Series.values serie)
            |> Array.map (fun aSeq -> Seq.map unbox<float> aSeq)
            |> Array.map (fun aSeq -> Matrix(aSeq) )
            |> Array.reduce (fun acc matrix -> acc .| matrix)
        matrix.T
Data Frame to Time Series of Row Matrices
I build the time series of row matrices with the code below.
- Keeping the data in the Series should ensure that the order of rows is preserved.
- How can I filter the columns and ensure that the column order is exactly as in the array of column names passed on to the function?
Here is the code:
    // Time series of row matrices - it'll be used to run a simulation
    let timeSeriesOfMatrix frame =
        frame
        |> Frame.filterRows (fun day target -> day >= startKalman)
        |> Frame.mapRowValues ( fun row -> row.Values |> Seq.map unbox<float> )
        |> Series.mapValues( fun row -> Matrix(row) )
Many thanks.
PS: I kept all the three scenarios together because I believe that the three examples above would better help other users and myself understand how the library works rather than discussing each single case separately.
To answer the first part, the order changes because you are joining ordered frames (each containing just a single series) and the frame construction preserves the ordering in this case, so the columns end up sorted by key rather than in the order of your tickers array. You can probably replace the last two lines with just Frame.ofColumns instead of using an explicit join (this will always do an outer join, but if you need an inner join, you can then use Frame.dropSparseRows to drop the rows with missing values).

In your second sample, everything looks good. You could save some work by getting the data as float directly.
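As a rough sketch of what "getting the data as float directly" could look like: the typed column accessor returns a Series<DateTime,float>, so no unbox step is needed. This assumes a recent Deedle release, where the accessor is GetColumn<float> (I believe older releases exposed the same thing as GetSeries<float>). The toy frame, the helper name toArray2D, and the use of a plain float[,] instead of the FCore Matrix are my own assumptions for illustration, and the code relies on the columns being dense, as stated in the question.

```fsharp
#r "nuget: Deedle"
open System
open Deedle

// Hypothetical toy data standing in for the price frame from the question.
let days = [ DateTime(2014, 1, 1); DateTime(2014, 1, 2); DateTime(2014, 1, 3) ]
let toyFrame : Frame<DateTime, string> =
    Frame.ofColumns [
        "MSFT" => series (List.zip days [ 1.0; 2.0; 3.0 ])
        "AAPL" => series (List.zip days [ 10.0; 20.0; 30.0 ]) ]

/// Copy the selected columns into a float[,]:
/// one row per date, one column per name, in the order of selectedCols.
/// Assumes dense columns (Series.values skips missing values).
let toArray2D (selectedCols: string[]) (frame: Frame<DateTime, string>) =
    let columns =
        selectedCols
        |> Array.map (fun name ->
            frame.GetColumn<float>(name) |> Series.values |> Array.ofSeq)
    Array2D.init frame.RowCount selectedCols.Length (fun r c -> columns.[c].[r])

// The column order follows the array passed in, not the frame's own order.
let m = toArray2D [| "AAPL"; "MSFT" |] toyFrame
```

Because the columns are looked up by name one at a time, the column order of the result is determined by selectedCols alone. The float[,] can then be handed to whichever linear algebra library is in use.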
The third sample also looks good and you can make it a bit shorter:
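One way the shorter version might look is sketched below. It reads each selected column from the row with GetAs<float>, which also pins the column order to the array you pass in. The toy frame, the helper name rowVectors, and the startDay parameter (standing in for startKalman) are assumptions for illustration. Each row is returned as a float[] here, since I have not checked which constructors the FCore Matrix offers; wrapping the array in Matrix(...) as in your code would be the last step.

```fsharp
#r "nuget: Deedle"
open System
open Deedle

// Hypothetical toy data standing in for the price frame from the question.
let days = [ DateTime(2014, 1, 1); DateTime(2014, 1, 2); DateTime(2014, 1, 3) ]
let toyFrame : Frame<DateTime, string> =
    Frame.ofColumns [
        "MSFT" => series (List.zip days [ 1.0; 2.0; 3.0 ])
        "AAPL" => series (List.zip days [ 10.0; 20.0; 30.0 ]) ]

/// Series of row vectors, keyed by date, keeping only rows on or after startDay.
/// Values in each vector follow the order of selectedCols.
let rowVectors (selectedCols: string[]) (startDay: DateTime) (frame: Frame<DateTime, string>) =
    frame
    |> Frame.filterRows (fun day _ -> day >= startDay)
    |> Frame.mapRowValues (fun row ->
        selectedCols |> Array.map (fun name -> row.GetAs<float>(name)))

// Keep the last two days, with AAPL first and MSFT second.
let rows = rowVectors [| "AAPL"; "MSFT" |] days.[1] toyFrame
```

The result stays a Series keyed by DateTime, so the row order is carried by the series index rather than by position.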