Sub-title: Dumb it down, pandas: stop trying to be clever.
I have a list (res) of single-column pandas data frames, each containing the same kind of numeric data, but each with a different column name. The row indices have no meaning. I want to put them into a single, very long, single-column data frame.
When I do pd.concat(res) I get one column per input data frame (and loads and loads of NaN cells). I've tried various values for the parameters (*), but none of them does what I'm after.
Edit: Sample data:
res = [
    pd.DataFrame({'A': [1, 2, 3]}),
    pd.DataFrame({'B': [9, 8, 7, 6, 5, 4]}),
    pd.DataFrame({'C': [100, 200, 300, 400]}),
]
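With this sample, the plain concat keeps all three column names and pads with NaN (a quick check, assuming import pandas as pd and the res list above):
wide = pd.concat(res)            # res is the sample list defined just above
print(wide.shape)                # (13, 3): one column per input frame
print(wide.isna().sum().sum())   # 26 of the 39 cells are NaN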
I have an ugly-hack solution: copy every data frame and give each copy the same new column name:
newList = []
for r in res:
    r = r.copy()              # work on a copy so the originals keep their names
    r.columns = ["same"]
    newList.append(r)
pd.concat(newList, ignore_index=True)
Surely that is not the best way to do it??
BTW, pandas: concat data frame with different column name is similar, but my question is even simpler, as I don't want the index maintained. (I also start with a list of N single-column data frames, not a single N-column data frame.)
*: E.g. axis=0 is the default behaviour. axis=1 gives an error. join="inner" is just silly (I only get the index). ignore_index=True renumbers the index, but I still get lots of columns and lots of NaNs.
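For instance, with the sample res above, ignore_index only renumbers the rows; there is still one column per input frame:
wide = pd.concat(res, ignore_index=True)   # res is the sample list from above
print(wide.columns.tolist())               # ['A', 'B', 'C']
print(wide.shape)                          # (13, 3), mostly NaN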
UPDATE for empty lists
I was having problems (with all the given solutions) when the data included an empty data frame, something like:
res = [
    pd.DataFrame({'A': [1, 2, 3]}),
    pd.DataFrame({'B': [9, 8, 7, 6, 5, 4]}),
    pd.DataFrame({'C': []}),
    pd.DataFrame({'D': [100, 200, 300, 400]}),
]
The trick was to force the type by adding .astype('float64'). E.g. (with numpy imported as np):
pd.Series(np.concatenate([df.values.ravel().astype('float64') for df in res]))
or:
pd.concat(res, axis=0).astype('float64').stack().reset_index(drop=True)
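With the four-frame sample above, the result should be one flat float64 Series of 13 values; a quick sanity check on the first variant (a sketch):
flat = pd.Series(np.concatenate([df.values.ravel().astype('float64') for df in res]))
print(len(flat), flat.dtype)   # 13 float64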
I would use a list comprehension.
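For example (a sketch, using the three-frame res from the question; df.iloc[:, 0] pulls each frame's single column out as a Series, so concat no longer tries to align column names):
pd.concat([df.iloc[:, 0] for df in res], ignore_index=True)
That gives one long Series of 13 values; wrap it in pd.DataFrame(...) if a one-column frame is needed, and tack on .astype('float64') to cover the empty-frame case from the update.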
I tested speed for you. Looks like … is the fastest.
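To rerun a comparison like that, here is a minimal timing harness (a sketch: the candidates are the snippets from this page, the labels are mine, and the tiny three-frame sample means the numbers say little about large inputs):
import timeit
import numpy as np
import pandas as pd

res = [
    pd.DataFrame({'A': [1, 2, 3]}),
    pd.DataFrame({'B': [9, 8, 7, 6, 5, 4]}),
    pd.DataFrame({'C': [100, 200, 300, 400]}),
]

candidates = {
    'numpy concatenate': lambda: pd.Series(
        np.concatenate([df.values.ravel().astype('float64') for df in res])),
    'concat then stack': lambda: (
        pd.concat(res, axis=0).astype('float64').stack().reset_index(drop=True)),
    'concat of first columns': lambda: pd.concat(
        [df.iloc[:, 0] for df in res], ignore_index=True),
}

for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=1000)        # total time for 1000 calls
    print(f'{name}: {seconds:.3f} s per 1000 calls')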