Best way to iteratively construct a Pandas DataFrame

Let's say I have an algorithm that I run in a loop. It returns an unknown number of results, and I want to store them all in a DataFrame. For example:

import pandas as pd

df_results = pd.DataFrame(columns=['x', 'x_squared'])

x = 0
x_squared = 1

while x_squared < 100:
    x_squared = x ** 2

    df_iteration = pd.DataFrame(data=[[x,x_squared]], columns=['x', 'x_squared'])
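    # Note: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0,
    # so this line only runs on older pandas versions.
    df_results = df_results.append(df_iteration, ignore_index=True)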

    x += 1

print(df_results)

Output:

     x  x_squared
0    0          0
1    1          1
2    2          4
3    3          9
4    4         16
5    5         25
6    6         36
7    7         49
8    8         64
9    9         81
10  10        100

The problem appears when I need a large number of iterations. The mathematical operation itself is quick, but creating a DataFrame and appending it on every iteration becomes really slow in a big loop.

I know this particular example can be solved easily without using DataFrames in each iteration. But imagine a complex algorithm that also performs operations on DataFrames, etc. For me, it is sometimes easier to build the result DataFrame step by step. What is the best approach to do so?

There is 1 answer

OD1995 On BEST ANSWER

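It's much more efficient to build a list of dictionaries and create the DataFrame from it once, after the loop. Every call to append copies the whole DataFrame, so the cost grows with each iteration, whereas appending to a Python list is cheap. Something like this: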

import pandas as pd

# Collect one dict per row in a plain list
dictList = []

x = 0
x_squared = 1

while x_squared < 100:
    x_squared = x ** 2

    # The keys become the column names of the final DataFrame
    dictList.append({'x': x, 'x_squared': x_squared})
    x += 1

df = pd.DataFrame(dictList)
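
If each iteration naturally produces a DataFrame rather than a single row (the "complex algorithm" case from the question), the same idea should still apply: keep the pieces in a list and combine them once at the end. The following is only a sketch under that assumption. The function name build_results and its limit parameter are made up for illustration, and the one-row frame stands in for whatever the real algorithm returns.

```python
import pandas as pd

def build_results(limit=100):
    """Collect one small DataFrame per iteration, then concatenate once."""
    frames = []  # plain Python list; appending to it is cheap
    x = 0
    x_squared = 0
    while x_squared < limit:
        x_squared = x ** 2
        # Stand-in for an iteration that yields a DataFrame (hypothetical)
        frames.append(pd.DataFrame({'x': [x], 'x_squared': [x_squared]}))
        x += 1
    # A single concat at the end instead of growing a DataFrame in the loop
    return pd.concat(frames, ignore_index=True)

df_results = build_results()
print(df_results)
```

Note that pd.concat raises a ValueError when given an empty list, so a loop that might produce no results needs a separate check before the concat.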