I am looking for an efficient way to apply a function to each row of a DataFrame, perform some operation on it, and repeat the row a number of times given by another column. Currently I do this by iterating over each row, but that takes too long on a large DataFrame.
Sample code is below:
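`import pandas as pd
def my_func(row):
    # Turn the row (a Series) back into a one-row DataFrame
    row = row.to_frame().T
    # Repeat that row as many times as the value in col2
    repeated_row = row.loc[row.index.repeat(row['col2'])]
    return repeated_row
df = pd.DataFrame(data={'col1': list('abc'),
                        'col2': [2, 2, 3]})
df_comb = pd.DataFrame()
# Process one row at a time and append the result to df_comb
for i, row in df.iterrows():
    df_rep = my_func(row)
    df_comb = pd.concat([df_comb, df_rep], axis=0)`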
However, I want a solution that does not use a for loop as above, and I couldn't find an existing answer for this. I imagine there is an equivalent way to use the "apply" function on this df, such as:
df_comp = pd.concat([df.apply(lambda row: my_func(row)), axis=1], axis=0)
But at the moment this syntax does not work properly.
I would much appreciate it if you could point out the correct solution.
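For reference, here is a minimal sketch of the kind of loop-free result being asked for. It assumes the per-row operation is only the repetition itself (or can be done beforehand with ordinary column operations) and that the DataFrame index is unique. The names `df_comb` and `df_comb_reset` are just illustrative. It repeats the index labels with `Index.repeat` and selects them all in a single `.loc` call, rather than building one small frame per row:

```python
import pandas as pd

df = pd.DataFrame({"col1": list("abc"), "col2": [2, 2, 3]})

# Repeat each index label as many times as col2 says, then select all of
# those labels at once. Assumes the index has no duplicate labels.
df_comb = df.loc[df.index.repeat(df["col2"])]

# Optional: replace the repeated labels with a fresh 0..n-1 index.
df_comb_reset = df_comb.reset_index(drop=True)

print(df_comb_reset)
```

Unlike the row-by-row version, this keeps the original column dtypes, because the rows never pass through `iterrows()`, which turns a mixed-type row into an object-dtype Series. If some per-row computation is still needed, one option may be to compute it into a new column first and repeat afterwards.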