Let's say I have an algorithm that runs in a loop. It returns an unknown number of results, and I want to store them all in a DataFrame. For example:
import pandas as pd

df_results = pd.DataFrame(columns=['x', 'x_squared'])
x = 0
x_squared = 1
while x_squared < 100:
    x_squared = x ** 2
    df_iteration = pd.DataFrame(data=[[x, x_squared]], columns=['x', 'x_squared'])
    # DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; pd.concat does the same job
    df_results = pd.concat([df_results, df_iteration], ignore_index=True)
    x += 1
print(df_results)
Output:
     x x_squared
0    0         0
1    1         1
2    2         4
3    3         9
4    4        16
5    5        25
6    6        36
7    7        49
8    8        64
9    9        81
10  10       100
The problem arises when I run a large number of iterations. The mathematical operation itself is quick, but creating a new DataFrame and appending it on every iteration becomes very slow in a big loop.
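A rough timing sketch of where the slowdown comes from: pd.concat (like the DataFrame.append it replaces) returns a new DataFrame and copies every row accumulated so far into it, so growing a frame one row at a time performs about n²/2 row copies in total. The helper name `build_by_concat` and the chosen sizes are illustrative assumptions, and the timings will depend on the machine and pandas version.

```python
import time

import pandas as pd


def build_by_concat(n):
    """Hypothetical helper: grow a DataFrame one row per iteration, as in the loop above."""
    # Start from a one-row frame so both columns keep an integer dtype.
    df = pd.DataFrame({'x': [0], 'x_squared': [0]})
    for x in range(1, n):
        row = pd.DataFrame({'x': [x], 'x_squared': [x ** 2]})
        # concat allocates a new frame and copies all rows accumulated so far.
        df = pd.concat([df, row], ignore_index=True)
    return df


if __name__ == "__main__":
    for n in (250, 500, 1000):
        start = time.perf_counter()
        df = build_by_concat(n)
        elapsed = time.perf_counter() - start
        print(f"n={n:5d}: {len(df)} rows in {elapsed:.3f} s")
```

At these small sizes the fixed cost of building each one-row DataFrame and calling pd.concat tends to dominate, so growth may look close to linear; the quadratic copying term takes over only for much larger n.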
I know this particular example can easily be solved without creating a DataFrame on every iteration. But imagine a complex algorithm that also performs operations on DataFrames, etc. Sometimes I find it easier to build the result DataFrame step by step. What is the best approach for doing that?
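It's much more efficient to build a list of dictionaries and create the DataFrame from it once, after the loop. Each append (or concat) copies the whole existing frame, so growing it row by row takes time quadratic in the number of rows, whereas appending a dict to a Python list is cheap (amortized constant time). Something like this: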