I have the following table for 3 baseball games. Each game has two rows, one for each of the two teams playing in that game. The play_homevisitor column tells you which team a row refers to: 1 means the row is about the home team (named in the hometeam column), and 0 means it is about the visiting team (named in the visteam column). This is my dataset, df:

Game_ID             hometeam    visteam play_homevisitor    Runs_scored
ATL199204090         ATL         SFN           0                13
ATL199204090         ATL         SFN           1                6
ATL199204100         ATL         SFN           0                3
ATL199204100         ATL         SFN           1                6
ATL199204110         ATL         SFN           0                4
ATL199204110         ATL         SFN           1                0

The Runs_scored column holds the number of runs each team scored in that game.

I need to calculate a new column, Runs_allowed, which simply swaps the Runs_scored values between the two rows that share the same Game_ID:

Game_ID        hometeam visteam play_homevisitor  Runs_scored  Runs_allowed
ATL199204090   ATL      SFN     0                 13           6
ATL199204090   ATL      SFN     1                 6            13
ATL199204100   ATL      SFN     0                 3            6
ATL199204100   ATL      SFN     1                 6            3
ATL199204110   ATL      SFN     0                 4            0
ATL199204110   ATL      SFN     1                 0            4
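Another way to state the target: because each game has exactly two rows, the runs a team allowed equal the game's total runs minus the runs it scored. The sketch below uses groupby(...).transform('sum') for this; it is an alternative not taken from the thread, and it silently gives wrong values if a Game_ID ever has more or fewer than two rows.

```python
import pandas as pd

df = pd.DataFrame({
    'Game_ID': ['ATL199204090', 'ATL199204090',
                'ATL199204100', 'ATL199204100',
                'ATL199204110', 'ATL199204110'],
    'hometeam': ['ATL'] * 6,
    'visteam': ['SFN'] * 6,
    'play_homevisitor': [0, 1, 0, 1, 0, 1],
    'Runs_scored': [13, 6, 3, 6, 4, 0],
})

# Total runs in each game, broadcast back onto both rows of that game
game_total = df.groupby('Game_ID')['Runs_scored'].transform('sum')

# Assumes exactly two rows per game: the opponent's runs are total minus our own
df['Runs_allowed'] = game_total - df['Runs_scored']
print(df)
```

The result stays an integer column because no NaN values are introduced along the way.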

I have one possible approach, but I am curious whether there is a different way to do it.

I noticed that the play_homevisitor column always follows the sequence 0, 1, 0, 1, ..., so I came up with this idea:

  1. Create two intermediate columns, downShift and upShift, by shifting Runs_scored by +1 and -1. Looking only at each pair of rows with the same Game_ID, upShift moves the home team's runs up onto the visitor row, and downShift moves the visitor team's runs down onto the home row.
df['downShift'] = df['Runs_scored'].shift(periods=1).fillna(0)
df['upShift'] = df['Runs_scored'].shift(periods=-1).fillna(0)
  2. Then, if play_homevisitor is 0, pick the value from upShift; if play_homevisitor is 1, pick the value from downShift:

df['Runs_allowed'] = df[['play_homevisitor', 'downShift', 'upShift']].apply(lambda x: x['upShift'] if x['play_homevisitor'] == 0 else x['downShift'], axis=1)
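The same shift-based idea can be vectorized by replacing the row-wise apply with numpy.where, which also avoids the two intermediate columns. This is a substitution for illustration, not the original code, and it keeps the original assumption that rows always alternate visitor (0) then home (1) within each game.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Game_ID': ['ATL199204090', 'ATL199204090',
                'ATL199204100', 'ATL199204100',
                'ATL199204110', 'ATL199204110'],
    'hometeam': ['ATL'] * 6,
    'visteam': ['SFN'] * 6,
    'play_homevisitor': [0, 1, 0, 1, 0, 1],
    'Runs_scored': [13, 6, 3, 6, 4, 0],
})

# Next row's runs shifted up onto this row (NaN on the last row)
up_shift = df['Runs_scored'].shift(-1)
# Previous row's runs shifted down onto this row (NaN on the first row)
down_shift = df['Runs_scored'].shift(1)

# Visitor rows (0) take the home team's runs from the next row;
# home rows (1) take the visitor's runs from the previous row.
# With strict 0/1 alternation, neither NaN is ever selected.
df['Runs_allowed'] = np.where(df['play_homevisitor'] == 0, up_shift, down_shift).astype(int)
print(df)
```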

Answers

Answer by Erfan:

You can use groupby in combination with shift twice (once in each direction), then use fillna to combine the two results into the new column:

s1 = df.groupby('Game_ID')['Runs_scored'].shift(-1)
s2 = df.groupby('Game_ID')['Runs_scored'].shift(1)

df['Runs_allowed'] = s1.fillna(s2).astype(int)

print(df)
        Game_ID hometeam visteam  play_homevisitor  Runs_scored  Runs_allowed
0  ATL199204090      ATL     SFN                 0           13             6
1  ATL199204090      ATL     SFN                 1            6            13
2  ATL199204100      ATL     SFN                 0            3             6
3  ATL199204100      ATL     SFN                 1            6             3
4  ATL199204110      ATL     SFN                 0            4             0
5  ATL199204110      ATL     SFN                 1            0             4
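A self-contained version of this answer follows. As a hypothetical check, the rows are deliberately interleaved so that the two rows of a game are not adjacent: because the shift happens per Game_ID group, values never leak from one game into another, unlike a plain shift over the whole frame.

```python
import pandas as pd

df = pd.DataFrame({
    'Game_ID': ['ATL199204090', 'ATL199204090',
                'ATL199204100', 'ATL199204100',
                'ATL199204110', 'ATL199204110'],
    'hometeam': ['ATL'] * 6,
    'visteam': ['SFN'] * 6,
    'play_homevisitor': [0, 1, 0, 1, 0, 1],
    'Runs_scored': [13, 6, 3, 6, 4, 0],
})

# Hypothetical reordering: interleave the games so paired rows are not adjacent
df = df.iloc[[0, 2, 4, 1, 3, 5]].copy()

grouped = df.groupby('Game_ID')['Runs_scored']
# Within each game: the next row's runs (NaN on the game's last row)...
s1 = grouped.shift(-1)
# ...and the previous row's runs (NaN on the game's first row)
s2 = grouped.shift(1)

# With two rows per game, each row has exactly one non-NaN neighbour
df['Runs_allowed'] = s1.fillna(s2).astype(int)
print(df.sort_index())
```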
Answer by jezrael:

If every game has a complete pair of rows (one visitor, one home), sort by Game_ID and play_homevisitor, then swap the values between the two masks:

df = df.sort_values(['Game_ID','play_homevisitor'])

m1 = df['play_homevisitor'] == 0
m2 = df['play_homevisitor'] == 1

s1 = df.loc[m1, 'Runs_scored'].values
s2 = df.loc[m2, 'Runs_scored'].values
df.loc[m2, 'Runs_allowed'], df.loc[m1, 'Runs_allowed'] = s1, s2 

print (df)
        Game_ID hometeam visteam  play_homevisitor  Runs_scored  Runs_allowed
0  ATL199204090      ATL     SFN                 0           13           6.0
1  ATL199204090      ATL     SFN                 1            6          13.0
2  ATL199204100      ATL     SFN                 0            3           6.0
3  ATL199204100      ATL     SFN                 1            6           3.0
4  ATL199204110      ATL     SFN                 0            4           0.0
5  ATL199204110      ATL     SFN                 1            0           4.0
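In the output above, Runs_allowed comes out as float (6.0, 13.0, ...) because the first .loc assignment creates the column with NaN in the rows not yet filled. The sketch below is a hypothetical variant that pre-creates the column as integers and adds an explicit guard for the "complete pairs" assumption this answer relies on.

```python
import pandas as pd

df = pd.DataFrame({
    'Game_ID': ['ATL199204090', 'ATL199204090',
                'ATL199204100', 'ATL199204100',
                'ATL199204110', 'ATL199204110'],
    'hometeam': ['ATL'] * 6,
    'visteam': ['SFN'] * 6,
    'play_homevisitor': [0, 1, 0, 1, 0, 1],
    'Runs_scored': [13, 6, 3, 6, 4, 0],
})
df = df.sort_values(['Game_ID', 'play_homevisitor'])

# Guard: every game must have exactly two rows whose 0/1 flags sum to 1,
# i.e. one visitor row and one home row
counts = df.groupby('Game_ID')['play_homevisitor'].agg(['count', 'sum'])
assert (counts['count'] == 2).all() and (counts['sum'] == 1).all()

m1 = df['play_homevisitor'] == 0
m2 = df['play_homevisitor'] == 1

# After sorting, the k-th visitor row and the k-th home row belong to the same game
visitor_runs = df.loc[m1, 'Runs_scored'].to_numpy()
home_runs = df.loc[m2, 'Runs_scored'].to_numpy()

# Pre-create an integer column so the result is not upcast to float
df['Runs_allowed'] = 0
df.loc[m1, 'Runs_allowed'] = home_runs
df.loc[m2, 'Runs_allowed'] = visitor_runs
print(df)
```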