New to pandas: how do I prepend each main category's string to the strings of its subcategories?

I have a table that looks like:

id string
3.ab.3. axz
3.ab.3.a. b
3.ab.3.b. c
3.ab.4. dog
3.ab.4.a. e
3.ab.4.b. f
3.ab.4.b.1 g
3.ab.4.b.2 h

What I expect is that each subcategory's string is prefixed with the string from its main category. For example, 3.ab.3.a. should contain everything from 3.ab.3. followed by its own string, and the main-category row 3.ab.3. should then be dropped:

id string
3.ab.3.a. axzb
3.ab.3.b. axzc
3.ab.4.a. doge
3.ab.4.b.1 dogfg
3.ab.4.b.2 dogfh

I tried using .join and .startswith but couldn't get it to work.

There is 1 answer

Answered by rhug123

Try this:

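# df is the table from the question, with columns 'id' and 'string'.
# Group key: the first three dot-separated parts of each id, e.g. '3.ab.3.a.' -> '3ab3'
s = df.sort_values('id')['id'].str.split('.', n=3).str[:3].str.join('')

# True on the first row of each group in sorted order, i.e. the main-category row
s2 = s.ne(s.shift())

# Keep only the main-category strings, forward-fill them within each group,
# append each row's own string, then drop the main-category rows
df.assign(string=df['string'].where(s2).groupby(s).ffill().add(df['string'])).loc[~s2]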

Output:

           id string
1   3.ab.3.a.   axzb
2   3.ab.3.b.   axzc
4   3.ab.4.a.   doge
5   3.ab.4.b.   dogf
6  3.ab.4.b.1   dogg
7  3.ab.4.b.2   dogh
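
Note that this output keeps the 3.ab.4.b. row and prefixes only the top-level string, while the expected output in the question chains every ancestor (3.ab.4.b.1 becomes dogfg) and drops any row that has children. The following is a minimal sketch of that variant, not the answer's method. It assumes that an id with its trailing dot removed is a parent of any id that extends it by further dot-separated parts; the names `lookup`, `ancestors` and `full_string` are illustrative only.

```python
import pandas as pd

# Sample frame copied from the question
df = pd.DataFrame({
    "id": ["3.ab.3.", "3.ab.3.a.", "3.ab.3.b.", "3.ab.4.",
           "3.ab.4.a.", "3.ab.4.b.", "3.ab.4.b.1", "3.ab.4.b.2"],
    "string": ["axz", "b", "c", "dog", "e", "f", "g", "h"],
})

# Assumption: the trailing dot carries no meaning, so strip it before comparing ids
key = df["id"].str.rstrip(".")
lookup = dict(zip(key, df["string"]))

def ancestors(k):
    """Return every proper dot-prefix of k, shortest first."""
    parts = k.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]

def full_string(k):
    """Concatenate the strings of all ancestors present in the table, then k's own."""
    return "".join(lookup[a] for a in ancestors(k) if a in lookup) + lookup[k]

# A row is a leaf if its id is not a prefix of any other id
parents = {a for k in key for a in ancestors(k)}
is_leaf = ~key.isin(parents)

result = df.loc[is_leaf].assign(string=key[is_leaf].map(full_string))
print(result)
```

This uses plain Python for the prefix lookup, which is easy to follow but not vectorised; for a very large table a different approach may be preferable.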