In Pandas DataFrame how can I increment the values per row if it matches a character

50 views Asked by At

This is somewhat similar to this question but a little bit complex.

Assuming I have this table:

data = {'column1': ['A1', 'A1', 'A1', 'A1'],
        'column2': ['A9', 'A1', 'A8', 'A1'],
        'column3': ['D1', 'D1', 'D1', 'D1'],
        'column4': ['A6', 'A2', 'A3', 'A4'],
        'column5': ['H1', 'H1', 'H1', 'H1'],
        'column6': ['A4', '', '', 'A3'],
        'column7': ['A5', '', '', 'A9']}

df = pd.DataFrame(data)

+---------+---------+---------+---------+---------+---------+---------+
| column1 | column2 | column3 | column4 | column5 | column6 | column7 |
+---------+---------+---------+---------+---------+---------+---------+
| A1      | A9      | D1      | A6      | H1      | A4      |  A5     |
| A1      | A1      | D1      | A2      | H1      |         |         |
| A1      | A8      | D1      | A3      | H1      |         |         |
| A1      | A1      | D1      | A4      | H1      | A3      |  A9     |
+---------+---------+---------+---------+---------+---------+---------+

my goal here is to reset the number counterpart of all values containing "A" per row, starting with A1. if "A1" re-occurs on the same row, move to the next cell. Also, values that is not "A" and blanks should be ignored.

+---------+---------+---------+---------+---------+---------+---------+
| column1 | column2 | column3 | column4 | column5 | column6 | column7 |
+---------+---------+---------+---------+---------+---------+---------+
| A1      | A2      | D1      | A3      | H1      | A4      | A5      |
| A1      | A1      | D1      | A2      | H1      |         |         |
| A1      | A2      | D1      | A3      | H1      |         |         |
| A1      | A1      | D1      | A2      | H1      | A3      | A4      |
+---------+---------+---------+---------+---------+---------+---------+
2

There are 2 answers

2
mozway On BEST ANSWER

If I remember well you had a similar question, you can use a similar cumsum logic with a mask:

# identify Ax cells that are not A1
m = df.apply(lambda c: c.str.startswith('A') & c.ne('A1'))

# perform the cumsum, only assign to the above mask
df[m] = m.cumsum(axis=1).add(1).astype(str).radd('A')

Output:

  column1 column2 column3 column4 column5 column6 column7
0      A1      A2      D1      A3      H1      A4      A5
1      A1      A1      D1      A2      H1                
2      A1      A2      D1      A3      H1                
3      A1      A1      D1      A2      H1      A3      A4
1
Tarifazo On

Try this:

import numpy as np
import pandas as pd
df = pd.DataFrame({
    'a': ['A1', 'A1', 'A2', 'B1', 'A8'],
    'b': ['A1', 'A2', 'A4', 'D1', 'A9']
}).T
print(df, '\n')

def f(x):
  return np.insert(
      np.where(
      (x[1:].str[0] != 'A') | (x[1:].str[1:].values == x[:-1].str[1:].values) ,
      x[1:],
      'A' + (x[:-1].str[0]=='A').astype(int).cumsum().astype(str)
      ), 0, x.values[0])

df2 = df.copy()
df2.iloc[:] = np.vstack(df.apply(f, axis=1))
print(df2)

>>    0   1   2   3   4
a  A1  A1  A2  B1  A8
b  A1  A2  A4  D1  A9 

    0   1   2   3   4
a  A1  A1  A2  B1  A3
b  A1  A1  A2  D1  A3