A for loop inside a for loop not executing after the first run

Question

A for loop inside a for loop not executing after the first run

89 views Asked by Jingy At 08 October 2023 at 20:18

I've a dataset of firms with year from 1960-1989 and month (1-12) that looks like this.

PERMNO.	Year	Month
10057	1960	1
10057	1960	2
10057	1960	3
10057	1960	4
10057	1960	5
10057	1960	6
10057	1960	7
10057	1960	...
10057	1961	1
10057	1961	2
10057	1961	3
10057	1961	4
10057	1961	5
10057	1961	6
10057	1961	7
10057	1961	...
10057	1989	1
10057	1989	2
10057	1989	3
10057	1989	4
10057	1989	5
10057	1989	6
10057	1989	7
10057	1989	...
11857	1960	1
11857	1960	2
11857	1960	3
11857	1960	4
11857	1960	5
11857	1960	6
11857	1960	7
11857	1960	...

What I'm trying to do is for each Month of July for each firm (PERMNO), I want all the rows before it. For example, For firm 10057, for July 1960, I want all rows beginning from Jan 1960-July 1960 and for next for the same firm 10057, for July 1961, I want all rows beginning from Jan 1960-July 1961 etc etc. Something like this:

PERMNO.	Year	Month
10057	1960	1
10057	1960	2
10057	1960	3
10057	1960	4
10057	1960	5
10057	1960	6
10057	1960	7
10057	1960	1
10057	1960	2
10057	1960	3
10057	1960	...
10057	1961	4
10057	1961	5
10057	1961	6
10057	1961	7
10057	1960	1
10057	1960	...
10057	1961	1
10057	1961	...
10057	1962	6
10057	1962	7

and so on for each firm and within each firm, all Julys.

I've written a loop inside a loop. The first firm worked but starting from the second, the loop started to return all rows from the beginning 1960 to the end 1980. Not sure where I did wrong for the loop.

This is my code:

permnolist = CRSP2.drop_duplicates('PERMNO').PERMNO.to_list()
LL = []
for indexer in permnolist:
    df = CRSP2[CRSP2['PERMNO']==indexer]
    july_list = df[df['Month']==7].Month.index.to_list()
    for indexer2 in july_list:
        df2 = df.iloc[:indexer2]
        LL.append(df2)

I made a list of all unique firms. The first loop attempts to get a dataframe of the first firm and the second loop uses the index for the row where Month=July (Yes, I've reset the index). I use iloc to take all rows before July. The first firm worked beautifully but once it looped to the second firm, it started returning all rows for that firm from beginning year 1960 all the way to the end 1980 for many times.

Original Q&A

There are 2 answers

**Sebastian Liebscher** · Answer 1 · 2023-10-09T05:57:29+00:00

Given you input example data you could do something like this to avoid loops at all:

# Filter rows where id=1 and month <= 7
filtered_df = CRSP2[(CRSP2["PERMNO"] == 10057) & (CRSP2["Month"] <= 7)]
print(filtered_df)

to output exactly what you describe as output:

PERMNO. Year    Month
10057   1960    1
10057   1960    2
10057   1960    3
10057   1960    4
10057   1960    5
10057   1960    6
10057   1960    7
10057   1960    1
10057   1960    2
10057   1960    3
10057   1960    ...
10057   1961    4
10057   1961    5
10057   1961    6
10057   1961    7
10057   1960    1
10057   1960    ...
10057   1961    1
10057   1961    ...
10057   1962    6
10057   1962    7

**ziying35** · Answer 2 · 2023-10-09T07:07:00+00:00

ziying35 On 09 October 2023 at 07:07

try this:

result = CRSP2.query('Month<=7').sort_values('PERMNO.')

TechQA.

A for loop inside a for loop not executing after the first run

There are 2 answers

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in DATAFRAME

Related Questions in FOR-IN-LOOP

Popular Questions

Trending Questions