Goal: I have two Pandas DataFrames. On each one I want to apply a function that computes a summary statistic for every column (such as sum or count). All of this is embedded in a `for` loop. E.g.:

    Id      V1       V2    
    0       3        2
    1       2        1

    Id      T1       T2    
    0       4        2
    1       5        2

The result (for a count task) is supposed to be:

    Id      V1       V2      T1       T2  
    0       2        2       2        2

My code runs fine so far, but the result I get is:

    Id      V1       V2      T1       T2  
    0       2        2       NaN      NaN
    1       NaN      NaN     2        2

My code:

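    import numpy as np
    import pandas as pd

    df1 = pd.DataFrame({'a': np.random.randn(6),
                        'b': np.random.randn(6),
                        'c': np.random.randn(6)})

    df2 = pd.DataFrame({'d': np.random.randn(6),
                        'e': np.random.randn(6),
                        'f': np.random.randn(6)})

    def mysum(col):
        # Despite the name, this is the "count" task: number of non-NA values
        return col.count()

    lst = [df1, df2]  # the frames to summarize

    rows = []
    for el in lst:
        # Series indexed by column name, e.g. a=6, b=6, c=6
        test = el.apply(lambda cols: mysum(cols))
        # each summary becomes its own row (DataFrame.append no longer exists in pandas >= 2.0)
        rows.append(test.to_frame().T)

    myDf = pd.concat(rows, ignore_index=True)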


Can anyone help me get the result I am aiming for? I also tried `.assign`, but that did not solve my problem either. P.S.: I know that simple things like count or sum can be done quite easily, but my real task is more complicated and this is just a simple example.

2 Answers

Answer by Community

Try this:

    pd.concat([df1, df2], axis=1)

Then apply whatever function you want to the combined DataFrame.
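A minimal sketch of this approach, assuming the two frames share the same row index (here the default 0..5) so that `axis=1` lines them up row by row. `DataFrame.apply` returns a Series indexed by column name, and `.to_frame().T` turns that Series into the single-row result the question asks for.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': np.random.randn(6),
                    'b': np.random.randn(6),
                    'c': np.random.randn(6)})
df2 = pd.DataFrame({'d': np.random.randn(6),
                    'e': np.random.randn(6),
                    'f': np.random.randn(6)})

def mysum(col):
    # Counts the non-NA values in one column (the "count" task from the question)
    return col.count()

# Put the columns side by side; rows are aligned on the shared index
combined = pd.concat([df1, df2], axis=1)

# apply() calls mysum once per column and returns a Series indexed by column name;
# to_frame().T turns that Series into a single-row DataFrame
result = combined.apply(mysum).to_frame().T
print(result)
```

If the frames have different indexes, `concat(axis=1)` pads the missing rows with NaN, which would change statistics such as the count; applying the function to each frame separately avoids that.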

Answer by Quang Hoang

It's hard to say whether the problem comes from concatenating the DataFrames or from `mysum()`. But you can try:

    # stack the per-frame summaries into one Series, then turn it into a single row
    myDf = (pd.concat(el.apply(lambda cols: mysum(cols))
                      for el in [df1, df2])
            .to_frame().T)
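Since the real task is more complicated than a count, here is a sketch of the same per-frame pattern wrapped in a reusable helper. The data, `value_range` and `summarize` are hypothetical illustrations; the only assumption is that the per-column function maps one column to a scalar. Because each frame is summarized on its own, the frames do not need to share an index or even have the same length.

```python
import numpy as np
import pandas as pd

# Hypothetical data with different lengths, to show the frames need not align
df1 = pd.DataFrame({'a': [3.0, 2.0, 5.0], 'b': [2.0, 1.0, np.nan]})
df2 = pd.DataFrame({'d': [4.0, 5.0], 'e': [2.0, 2.0]})

def value_range(col):
    # Hypothetical stand-in for a more complicated per-column statistic;
    # max() and min() skip NaN by default
    return col.max() - col.min()

def summarize(frames, func):
    # Apply func to every column of every frame, then stack the per-frame
    # results (Series indexed by column name) into one single-row DataFrame
    summary = pd.concat([frame.apply(func) for frame in frames])
    return summary.to_frame().T

ranges = summarize([df1, df2], value_range)
counts = summarize([df1, df2], pd.Series.count)
print(ranges)
print(counts)
```

If two frames share a column name, the result will contain duplicate column labels, so rename the columns first if that matters.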