I need to drop rows that do not contain a certain value throughout the table

47 views Asked by At

I have the following dataset:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 513250 entries, 0 to 513249
Data columns (total 26 columns):
  # Column Non-Null Count Dtype
--- ------ -------------- -----
  0 Game Title 513196 non-null object
  1 Game Poster 513196 non-null object
  2 Game Release Date 513196 non-null object
  3 Game Developer 512977 non-null object
  4 Genre 513196 non-null object
  5 Platforms 513250 non-null object
  6 Product Rating 425527 non-null object
  7 Overall Metascore 513181 non-null float64
  8 Overall User Rating 513128 non-null object
  9 Reviewer Name 512347 non-null object
  10 Reviewer Type 513250 non-null object
  11 Rating Given By The Reviewer 510449 non-null float64
  12 Review Date 405044 non-null object
  13 Review Text 512804 non-null object
  14 0 513250 non-null object
  15 1 513250 non-null object
  16 2 513250 non-null object
  17 3 513250 non-null object
  18 4 513250 non-null object
  19 5 513250 non-null object
  20 6 513250 non-null object
  21 7 513250 non-null object
  22 8 513250 non-null object
  23 9 513250 non-null object
  24 10 513250 non-null bool
  25 11 513250 non-null object
dtypes: bool(1), float64(2), object(23)

memory usage: 98.4+ MB

Where columns named 0 to 11 contain 'PC' values in several rows. I want to drop the lines that do not contain the PC value by index. But I tried in different ways and I didn't succeed. Columns that do not contain the PC value are set to "False".

How can I do this?

I believe that some way of iterating over the columns will help, but I tried several ways and it didn't work.

I tried:

for index, row in df.iterrows(): 
    if 'PC' not in row.values: 
    df.drop(index, inplace=True)

I tried too:

df = df[df['column'] == 'PC']

# Resetando os índices após a filtragem
df.reset_index(drop=True, inplace=True)

Other solution:

column_pc = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

df = df[df[colum_pc].apply(lambda row: 'PC' in row.values, axis=1)]

df.reset_index(drop=True, inplace=True)
1

There are 1 answers

0
mozway On

Assuming you want to drop all rows that contain at least one "PC" in the first 12 columns, use boolean indexing with all:

out = df[df.iloc[:, :12].ne('PC').all(axis=1)]

How is works:

# select first 12 columns
df.iloc[:, :12]
# True if not "PC" else False
.ne('PC')
# a row is True if all columns are True
.all(axis=1)