Python: Remove items that are less than 4 characters


I have a data frame as shown below:

I need to delete items that are shorter than 4 characters from the column CityIds. There may be spaces after the commas, and there are thousands of elements under Items.

CityIds
98765, 98-oki, th6, iuy89, 8.90765
89ol, gh98.0p, klopi, th, loip
98087,PAKJIYT, hju, yu8oi, iupli

For example: I want to remove th6 or show th6 in a separate column.

2

There are 2 answers

0
harriet On

The other answer is obviously cleaner, but here I added a new column for the excluded IDs:

import pandas as pd

d = {'CityIds': ['98765, 98-oki, th6, iuy89, 8.90765',
                 '89ol, gh98.0p, klopi, th, loip',
                 '98087, PAKJIYT, hju, yu8oi, iupli']}
df = pd.DataFrame(data=d)
df['rmvdIDs'] = ''  # new column for the excluded IDs
for i in df.index:
    # Strip all whitespace, then split on commas
    city_ids = "".join(df.loc[i, 'CityIds'].split()).split(',')
    new_ids = [x for x in city_ids if len(x) >= 4]
    # Filter again (instead of a set difference) to keep the original order
    excl_ids = [x for x in city_ids if len(x) < 4]
    # Assign with .loc to avoid chained assignment on a copy
    df.loc[i, 'CityIds'] = ", ".join(new_ids)
    df.loc[i, 'rmvdIDs'] = ", ".join(excl_ids)

print(df)

This prints:

                         CityIds rmvdIDs
0  98765, 98-oki, iuy89, 8.90765     th6
1     89ol, gh98.0p, klopi, loip      th
2   98087, PAKJIYT, yu8oi, iupli     hju
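
The same split-and-filter logic can also be wrapped in a small function and mapped over the column, which avoids indexing row by row. This is only a sketch under the assumption that every cell is a comma-separated string; the helper name `split_ids` and its `min_len` parameter are made up for illustration:

```python
import pandas as pd

def split_ids(row, min_len=4):
    """Return (kept, removed) as comma-joined strings for one CityIds cell."""
    # Split on commas, strip surrounding whitespace, skip empty pieces
    items = [s.strip() for s in row.split(',') if s.strip()]
    kept = [s for s in items if len(s) >= min_len]
    removed = [s for s in items if len(s) < min_len]
    return ", ".join(kept), ", ".join(removed)

df = pd.DataFrame({'CityIds': ['98765, 98-oki, th6, iuy89, 8.90765',
                               '89ol, gh98.0p, klopi, th, loip',
                               '98087,PAKJIYT, hju, yu8oi, iupli']})

# Map the helper over every cell, then unzip the pairs into two columns
kept, removed = zip(*df['CityIds'].map(split_ids))
df['CityIds'] = list(kept)
df['rmvdIDs'] = list(removed)
print(df)
```

Because the pieces are stripped individually, it does not matter whether a comma is followed by a space or not.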

-- Hope this helps

1
RomanPerekhrest On

Extract just the items whose length is 4 or greater and join them back:

df['CityIds'] = df['CityIds'].str.findall(r'([^\s,]{4,})').str.join(', ')

                         CityIds
0  98765, 98-oki, iuy89, 8.90765
1     89ol, gh98.0p, klopi, loip
2   98087, PAKJIYT, yu8oi, iupli
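
If the short items should also be shown in a separate column, as the question mentions, a second pattern can collect them before the column is overwritten. A rough sketch, assuming items are separated only by commas and whitespace; the column name `rmvdIDs` is borrowed from the other answer and the variable `short` is just illustrative:

```python
import pandas as pd

df = pd.DataFrame({'CityIds': ['98765, 98-oki, th6, iuy89, 8.90765',
                               '89ol, gh98.0p, klopi, th, loip',
                               '98087,PAKJIYT, hju, yu8oi, iupli']})

# An item is a run of characters that are neither whitespace nor commas.
# The lookarounds require a separator (or the string edge) on both sides,
# so a 1-3 character match can never be a fragment of a longer item.
short = r'(?<![^\s,])[^\s,]{1,3}(?![^\s,])'

# Collect the short items first, then keep only the items of length >= 4
df['rmvdIDs'] = df['CityIds'].str.findall(short).str.join(', ')
df['CityIds'] = df['CityIds'].str.findall(r'[^\s,]{4,}').str.join(', ')
print(df)
```

The order of the two assignments matters: `rmvdIDs` has to be computed from the original strings, before `CityIds` is replaced.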