Hi, I have created a dictionary of DataFrames with this code:
import glob
import pandas as pd

path = r"G:\my_dir\*"
dataList = {}
for files in glob.glob(path):
    # parse_dates=True turns the Date index column into a DatetimeIndex
    dataList[files] = pd.read_csv(files, sep=";", index_col="Date", parse_dates=True)
The DataFrames in the dictionary have different sampling times. An example of DataFrame (A) is:
Date Volume Value
2014-01-04 06:00:02 6062 108000.0
2014-01-04 06:06:05 6062 107200.0
2014-01-04 06:12:07 6062 97400.0
2014-01-04 06:18:10 6062 99200.0
2014-01-04 06:24:12 6062 91300.0
2014-01-04 06:30:14 6062 84100.0
2014-01-04 06:36:17 6062 57000.0
An example of DataFrame (B) is:
Date Volume Value
2014-01-04 05:52:50 6062 4.7
2014-01-04 05:58:53 6062 4.7
2014-01-04 06:04:56 6062 4.9
2014-01-04 06:10:58 6062 5.1
2014-01-04 06:17:01 6062 5.2
2014-01-04 06:23:03 6062 5.2
2014-01-04 06:29:05 6062 5.5
2014-01-04 06:35:08 6062 5.5
The DataFrames don't have the same number of rows. I want to merge them into a single DataFrame like this:
Date Volume B A ... Value(DataFrameN)
2014/04/01 05:52:50 6062 4.70 NaN
2014/04/01 05:58:53 6062 4.70 NaN
2014/04/01 06:04:56 6062 4.90 107465.51
2014/04/01 06:10:58 6062 5.10 100652.60
2014/04/01 06:17:01 6062 5.20 98899.57
2014/04/01 06:23:03 6062 5.20 92618.56
2014/04/01 06:29:05 6062 5.50 85301.73
2014/04/01 06:35:08 6062 5.50 61523.06
I have done this easily in MATLAB with the commands:
ts_A=timeseries(ValueA,datenum(DateA));
ts_B=timeseries(ValueB,datenum(DateB));
res_A=resample(ts_A,datenum(DateB));
I have to do this for several sets of CSV files, so I want to automate the process with Python.
Thanks.
You can concat the two DataFrames, interpolate, then reindex on the DataFrame you want.

I assume we have a certain number of DataFrames, where the Date is a DatetimeIndex in all of them. I will use two in this example (df_a and df_b, holding your A and B samples), since you used two in the question, but the code will work for any number.
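A minimal sketch of df_a and df_b, rebuilt from the sample rows in the question. It assumes your files are semicolon-separated (the sep=";" in your read_csv call suggests so) and uses io.StringIO in place of real files; parse_dates=True is what gives each DataFrame the DatetimeIndex the rest of this answer relies on.

```python
import io

import pandas as pd

# Sample rows from the question. Assumption: the real files use this semicolon layout.
CSV_A = """Date;Volume;Value
2014-01-04 06:00:02;6062;108000.0
2014-01-04 06:06:05;6062;107200.0
2014-01-04 06:12:07;6062;97400.0
2014-01-04 06:18:10;6062;99200.0
2014-01-04 06:24:12;6062;91300.0
2014-01-04 06:30:14;6062;84100.0
2014-01-04 06:36:17;6062;57000.0
"""

CSV_B = """Date;Volume;Value
2014-01-04 05:52:50;6062;4.7
2014-01-04 05:58:53;6062;4.7
2014-01-04 06:04:56;6062;4.9
2014-01-04 06:10:58;6062;5.1
2014-01-04 06:17:01;6062;5.2
2014-01-04 06:23:03;6062;5.2
2014-01-04 06:29:05;6062;5.5
2014-01-04 06:35:08;6062;5.5
"""

# index_col="Date" plus parse_dates=True parses the index into datetimes
df_a = pd.read_csv(io.StringIO(CSV_A), sep=";", index_col="Date", parse_dates=True)
df_b = pd.read_csv(io.StringIO(CSV_B), sep=";", index_col="Date", parse_dates=True)

print(isinstance(df_a.index, pd.DatetimeIndex))  # True
```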
I will put these into a dict for the example. You read them directly into a dict, so you don't need to do this step; I just want to show how my example dict is formatted. The dict keys don't matter; any valid dict key will work:
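A sketch of that dict. The make_frame helper is hypothetical and only rebuilds the sample data from the question so the snippet runs on its own; the short keys "A" and "B" stand in for the file paths your loop uses as keys.

```python
import pandas as pd


def make_frame(times, values):
    """Hypothetical helper: one sample DataFrame indexed by Date (all times on 2014-01-04)."""
    index = pd.DatetimeIndex(["2014-01-04 " + t for t in times], name="Date")
    return pd.DataFrame({"Volume": 6062, "Value": values}, index=index)


df_a = make_frame(
    ["06:00:02", "06:06:05", "06:12:07", "06:18:10", "06:24:12", "06:30:14", "06:36:17"],
    [108000.0, 107200.0, 97400.0, 99200.0, 91300.0, 84100.0, 57000.0],
)
df_b = make_frame(
    ["05:52:50", "05:58:53", "06:04:56", "06:10:58", "06:17:01", "06:23:03", "06:29:05", "06:35:08"],
    [4.7, 4.7, 4.9, 5.1, 5.2, 5.2, 5.5, 5.5],
)

# Keys only label where each DataFrame came from (file paths in the question's loop)
dataList = {"A": df_a, "B": df_b}
print(list(dataList))  # ['A', 'B']
```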
This gets us to where you currently are, with my dataList hopefully having the same format as yours.

The first thing you need to do is to combine the DataFrames. I use the dict keys as MultiIndex column names so you can keep track of which instance of a given column came from which DataFrame. You can do that like so:
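A sketch of the combining step, assuming dataList maps names to DataFrames with a DatetimeIndex. Passing the dict straight to pd.concat with axis=1 makes the keys the outer level of a column MultiIndex and outer-joins the time indexes. The make_frame helper is hypothetical (it only rebuilds the sample data), and the sort_index() call is an extra safety step of mine.

```python
import pandas as pd


def make_frame(times, values):
    """Hypothetical helper: one sample DataFrame indexed by Date (all times on 2014-01-04)."""
    index = pd.DatetimeIndex(["2014-01-04 " + t for t in times], name="Date")
    return pd.DataFrame({"Volume": 6062, "Value": values}, index=index)


dataList = {
    "A": make_frame(
        ["06:00:02", "06:06:05", "06:12:07", "06:18:10", "06:24:12", "06:30:14", "06:36:17"],
        [108000.0, 107200.0, 97400.0, 99200.0, 91300.0, 84100.0, 57000.0],
    ),
    "B": make_frame(
        ["05:52:50", "05:58:53", "06:04:56", "06:10:58", "06:17:01", "06:23:03", "06:29:05", "06:35:08"],
        [4.7, 4.7, 4.9, 5.1, 5.2, 5.2, 5.5, 5.5],
    ),
}

# Dict keys become the outer column level; rows are the union of all timestamps
combined = pd.concat(dataList, axis=1).sort_index()

print(combined.columns.tolist())
# [('A', 'Volume'), ('A', 'Value'), ('B', 'Volume'), ('B', 'Value')]
```

With the sample data every timestamp is distinct, so combined has 7 + 8 = 15 rows, and each column is NaN on the rows that come from the other file.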
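This gives you a DataFrame whose columns are (key, column) pairs, with NaN wherever one file has no sample at a timestamp that another file has.

Next, you need to interpolate to fill in the missing values. I interpolate using method='time', so the interpolation is weighted by the actual time between samples instead of treating the rows as equally spaced: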
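A sketch of the interpolation step on the combined frame. DataFrame.interpolate(method="time") requires a DatetimeIndex and fills each column from that column's own valid samples; the make_frame helper is hypothetical and only rebuilds the sample data.

```python
import pandas as pd


def make_frame(times, values):
    """Hypothetical helper: one sample DataFrame indexed by Date (all times on 2014-01-04)."""
    index = pd.DatetimeIndex(["2014-01-04 " + t for t in times], name="Date")
    return pd.DataFrame({"Volume": 6062, "Value": values}, index=index)


dataList = {
    "A": make_frame(
        ["06:00:02", "06:06:05", "06:12:07", "06:18:10", "06:24:12", "06:30:14", "06:36:17"],
        [108000.0, 107200.0, 97400.0, 99200.0, 91300.0, 84100.0, 57000.0],
    ),
    "B": make_frame(
        ["05:52:50", "05:58:53", "06:04:56", "06:10:58", "06:17:01", "06:23:03", "06:29:05", "06:35:08"],
        [4.7, 4.7, 4.9, 5.1, 5.2, 5.2, 5.5, 5.5],
    ),
}
combined = pd.concat(dataList, axis=1).sort_index()

# Linear interpolation weighted by the time elapsed between valid samples
filled = combined.interpolate(method="time")

# A at 06:04:56 lies 294 s into the 363 s gap between 108000.0 and 107200.0
print(round(filled.loc[pd.Timestamp("2014-01-04 06:04:56"), ("A", "Value")], 2))  # 107352.07
```

With the default limit_direction="forward", rows before a column's first sample stay NaN, which matches the two leading NaN values in your expected output.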
This gives you a DataFrame with the gaps filled in by interpolation in time; rows that come before a file's first sample stay NaN.

I think generally it would be best to stop here, since you keep all data from all csv files. But you said you want only the time points from the longest csv. To get that, you need to find the longest DataFrame, then keep only the rows corresponding to its index. Finding the longest DataFrame is easy: you just find the one with the maximum length. Keeping only the time points in that index is also easy: you just slice using that index (you use the loc method for slicing in this way).
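A sketch of picking the longest DataFrame and slicing with its index. max(..., key=len) compares row counts and returns the first DataFrame if several tie; every label in that index exists in the combined frame, so .loc returns exactly those rows. The make_frame helper is hypothetical and only rebuilds the sample data.

```python
import pandas as pd


def make_frame(times, values):
    """Hypothetical helper: one sample DataFrame indexed by Date (all times on 2014-01-04)."""
    index = pd.DatetimeIndex(["2014-01-04 " + t for t in times], name="Date")
    return pd.DataFrame({"Volume": 6062, "Value": values}, index=index)


dataList = {
    "A": make_frame(
        ["06:00:02", "06:06:05", "06:12:07", "06:18:10", "06:24:12", "06:30:14", "06:36:17"],
        [108000.0, 107200.0, 97400.0, 99200.0, 91300.0, 84100.0, 57000.0],
    ),
    "B": make_frame(
        ["05:52:50", "05:58:53", "06:04:56", "06:10:58", "06:17:01", "06:23:03", "06:29:05", "06:35:08"],
        [4.7, 4.7, 4.9, 5.1, 5.2, 5.2, 5.5, 5.5],
    ),
}
filled = pd.concat(dataList, axis=1).sort_index().interpolate(method="time")

# len() of a DataFrame is its number of rows, so this picks B (8 rows) here
longest = max(dataList.values(), key=len)

# Keep only the time points of the longest file
result = filled.loc[longest.index]
print(len(result))  # 8
```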
This gives you the final DataFrame, with one row per time point of the longest file. This can be combined into one line if you want:
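A sketch of the whole pipeline as a single chained expression, split over several lines inside parentheses only for readability. The make_frame helper is hypothetical and only rebuilds the sample data; sort_index() is an extra safety step.

```python
import pandas as pd


def make_frame(times, values):
    """Hypothetical helper: one sample DataFrame indexed by Date (all times on 2014-01-04)."""
    index = pd.DatetimeIndex(["2014-01-04 " + t for t in times], name="Date")
    return pd.DataFrame({"Volume": 6062, "Value": values}, index=index)


dataList = {
    "A": make_frame(
        ["06:00:02", "06:06:05", "06:12:07", "06:18:10", "06:24:12", "06:30:14", "06:36:17"],
        [108000.0, 107200.0, 97400.0, 99200.0, 91300.0, 84100.0, 57000.0],
    ),
    "B": make_frame(
        ["05:52:50", "05:58:53", "06:04:56", "06:10:58", "06:17:01", "06:23:03", "06:29:05", "06:35:08"],
        [4.7, 4.7, 4.9, 5.1, 5.2, 5.2, 5.5, 5.5],
    ),
}

# combine -> order by time -> interpolate in time -> keep the longest file's time points
result = (
    pd.concat(dataList, axis=1)
    .sort_index()
    .interpolate(method="time")
    .loc[max(dataList.values(), key=len).index]
)
```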
Or, perhaps a slightly clearer 4 lines:
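A sketch of the same steps as four separate statements, assuming dataList is the dict your loop builds. The make_frame helper is hypothetical and only rebuilds the sample data so the snippet runs on its own.

```python
import pandas as pd


def make_frame(times, values):
    """Hypothetical helper: one sample DataFrame indexed by Date (all times on 2014-01-04)."""
    index = pd.DatetimeIndex(["2014-01-04 " + t for t in times], name="Date")
    return pd.DataFrame({"Volume": 6062, "Value": values}, index=index)


dataList = {
    "A": make_frame(
        ["06:00:02", "06:06:05", "06:12:07", "06:18:10", "06:24:12", "06:30:14", "06:36:17"],
        [108000.0, 107200.0, 97400.0, 99200.0, 91300.0, 84100.0, 57000.0],
    ),
    "B": make_frame(
        ["05:52:50", "05:58:53", "06:04:56", "06:10:58", "06:17:01", "06:23:03", "06:29:05", "06:35:08"],
        [4.7, 4.7, 4.9, 5.1, 5.2, 5.2, 5.5, 5.5],
    ),
}

longest_index = max(dataList.values(), key=len).index    # time points to keep
combined = pd.concat(dataList, axis=1).sort_index()      # keys -> outer column level
filled = combined.interpolate(method="time")             # fill gaps, weighted by time
result = filled.loc[longest_index]                       # rows of the longest file only
```

Because nothing here depends on the number of files, the same four lines should work unchanged on each dictionary your glob loop produces.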
I am not sure why my final results are different from what you show. I ran your example in MATLAB (R2015a) myself and got the same results as I get here, so I suspect you generated the final data with a different data set than the example.