How can I iterate through a DataFrame to concatenate strings once an empty cell is reached?


I've extracted some PDF tables using Camelot.
The first column contains merged cells, which are often problematic.

Despite tweaking some of the advanced configuration options, the merged cells in the first column still span multiple rows.

I'd like to iterate through the rows of the first column and do the following:

  1. Start from the top.
  2. When you reach an empty cell, concatenate the strings above it (back to the previous empty cell) in order, separated by a space, and put the result in the first non-empty cell of that run.
   Column  What I have now  What I'd like
   1       A                A B C D
   2       B
   3       C
   4       D
   5
   6       F                F G
   7       G
   8
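Since the title asks about iterating, here is a minimal sketch of the two steps above as a plain Python loop, as an alternative to a vectorised approach. The helper name `merge_blocks` is hypothetical. It assumes empty cells arrive as NaN/None or as empty (or whitespace-only) strings, and, going slightly beyond the steps as written, it also flushes a final run that is not followed by an empty cell.

```python
import pandas as pd


def merge_blocks(values):
    """Hypothetical helper: join each run of non-empty cells with spaces and
    put the result on the run's first row; every other row gets None."""
    out = [None] * len(values)
    start = None  # index of the first non-empty cell in the current run
    parts = []    # strings collected for the current run
    for i, v in enumerate(values):
        if pd.isna(v) or str(v).strip() == "":
            # Empty cell: write the joined run onto its first row, then reset
            if parts:
                out[start] = " ".join(parts)
            start, parts = None, []
        else:
            if start is None:
                start = i
            parts.append(str(v))
    # A trailing run with no empty cell after it still gets written
    if parts:
        out[start] = " ".join(parts)
    return out


df = pd.DataFrame({
    "Column": [1, 2, 3, 4, 5, 6, 7, 8],
    "What I have now": ["A", "B", "C", "D", None, "F", "G", None],
})
df["What I'd like"] = merge_blocks(df["What I have now"].tolist())
print(df)
```

A loop like this is easy to follow and adjust, but for large frames the vectorised `groupby` approach is usually faster.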

Answer by Panda Kim

Code

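# True where the cell is empty (NaN)
cond1 = df['What I have now'].isna()
# Label each run of non-empty cells with the number of empty cells seen so far;
# the empty rows themselves get no label, so groupby leaves them out
g = df['What I have now'].groupby(cond1.cumsum()[~cond1])
# True only on the first row of each run
m = g.cumcount().eq(0)
# Join each run with spaces and keep the result only on the run's first row
df["What I'd like"] = g.transform(' '.join).where(m)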

df:

   Column What I have now What I'd like
0       1               A       A B C D
1       2               B           NaN
2       3               C           NaN
3       4               D           NaN
4       5             NaN           NaN
5       6               F           F G
6       7               G           NaN
7       8             NaN           NaN
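To see why this works, it can help to print the intermediate objects. The sketch below rebuilds the example frame and inspects the grouping key, the group sizes and the first-row mask; the names `col` and `keys` and the print calls are only for illustration. Because `keys` has no labels for the empty rows, pandas aligns it to the column's index and (with the default `dropna=True`) leaves those rows out of every group.

```python
import pandas as pd

df = pd.DataFrame({
    "Column": [1, 2, 3, 4, 5, 6, 7, 8],
    "What I have now": ["A", "B", "C", "D", float("nan"), "F", "G", float("nan")],
})

col = df["What I have now"]
cond1 = col.isna()             # True on the empty rows (index 4 and 7)
keys = cond1.cumsum()[~cond1]  # run label for each non-empty row only
print(keys.tolist())

g = col.groupby(keys)          # rows absent from `keys` belong to no group
print(g.size().tolist())       # number of rows in each run

m = g.cumcount().eq(0)         # True on the first row of each run
print(m[m].index.tolist())     # rows that receive the joined string
```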

Example Code

import pandas as pd
data1 = {'Column': [1, 2, 3, 4, 5, 6, 7, 8], 
         'What I have now': ['A', 'B', 'C', 'D', float('nan'), 'F', 'G', float('nan')]}
df = pd.DataFrame(data1)
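One possible catch with real Camelot output: tables extracted that way may hold empty cells as empty strings rather than NaN, in which case `isna()` finds nothing and everything ends up in a single group. Assuming that is the case, a sketch that normalises blank or whitespace-only strings to NaN before applying the same `groupby` logic (the sample data here is made up to mimic that):

```python
import numpy as np
import pandas as pd

# Made-up Camelot-style data: empty cells are '' or whitespace, not NaN
df = pd.DataFrame({
    "Column": [1, 2, 3, 4, 5, 6, 7, 8],
    "What I have now": ["A", "B", "C", "D", "", "F", "G", "  "],
})

# Treat empty or whitespace-only strings as missing so isna() detects them
s = df["What I have now"].replace(r"^\s*$", np.nan, regex=True)

cond1 = s.isna()
g = s.groupby(cond1.cumsum()[~cond1])
m = g.cumcount().eq(0)
df["What I'd like"] = g.transform(" ".join).where(m)
print(df)
```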