Loop through rows in pandas dataframe and update check on the column condition

286 views Asked by At
Row_Number ColumnA ColumnB
1 Data 1 A
2 Data A
3 Data B
4 Data 2 B
5 Data C
6 Data D
7 Data E
8 Data 3 C

this is how my data updated on a dataframe from another table, In this "Data 1" having a continuation as "Data A" and "Data B" and having only one data as on column B as "A", but in data frame all the "Data 1", "Data A", "Data B" are created in separate rows. Is there any operations are available to concatenate all the "Row_number" 1,2,3 to be appended on a single line and having only value as "A" in columnB?

Expected output is below:

Row_Number ColumnA ColumnB
1 Data1,Data A, Data B A
2 Data2,Data C, Data D,Data E B
3 Data3 C

Thanks in Advance

I'm new to python and have tried the below code

# variable that stores the values: table_values
import pandas as pd

df=pd.DataFrame(table_values,columns=['ColumnA','ColumnB'])

for index, row in df.iterrows():
  if df.loc[index,'ColumnB'] == '' & df.loc[index,'ColumnA'] != '':
    df.loc[index-1, 'ColumnA'] = df.loc[index-1, 'ColumnA'] + df.loc[index, 'ColumnA']  
print(df)'''
1

There are 1 answers

2
Tom On

You could do a groupby here. The simplest is to just use Column B, since that already corresponds to the rows you want to group. So given this data:

import pandas as pd

df = pd.DataFrame({'ColumnA': ['Data1', 'DataA', 'DataB',
                               'Data2', 'DataC', 'DataD', 'DataE',
                               'Data3'],
                   'ColumnB': ['A', None, None,
                               'B', None, None, None,
                               'C']})

#  ColumnA ColumnB
# 0   Data1       A
# 1   DataA    None
# 2   DataB    None
# 3   Data2       B
# 4   DataC    None
# 5   DataD    None
# 6   DataE    None
# 7   Data3       C

Create something to group with, by filling the missing values:

grouper = df['ColumnB'].ffill()

# 0    A
# 1    A
# 2    A
# 3    B
# 4    B
# 5    B
# 6    B
# 7    C
# Name: ColumnB, dtype: object

Then aggregate to concat strings, following this post:

output = df['ColumnA'].groupby(grouper).apply(lambda x : ', '.join(x)).reset_index()

Final result:

  ColumnB                     ColumnA
0       A         Data1, DataA, DataB
1       B  Data2, DataC, DataD, DataE
2       C                       Data3