Pandas: selecting specific rows and specific columns using .loc and/or .iloc


I have a Pandas dataframe that looks like this:

import pandas as pd

df = pd.DataFrame({
    'id': [1, 17, 19, 17, 22, 3, 0, 3],
    'color': ['Green', 'Blue', 'Orange', 'Yellow', 'White', 'Silver', 'Purple', 'Black'],
    'shape': ['Circle', 'Square', 'Circle', 'Triangle', 'Rectangle', 'Circle', 'Square', 'Triangle'],
    'person': ['Sally', 'Bob', 'Tim', 'Sue', 'Bill', 'Diane', 'Brian', 'Sandy']
})

df

    id      color     shape       person
0   1       Green     Circle      Sally
1   17      Blue      Square      Bob
2   19      Orange    Circle      Tim
3   17      Yellow    Triangle    Sue
4   22      White     Rectangle   Bill
5   3       Silver    Circle      Diane
6   0       Purple    Square      Brian
7   3       Black     Triangle    Sandy

I set the index to color:

df.set_index('color', inplace=True)

          id    shape       person
color           
Green     1     Circle      Sally
Blue      17    Square      Bob
Orange    19    Circle      Tim
Yellow    17    Triangle    Sue
White     22    Rectangle   Bill
Silver    3     Circle      Diane
Purple    0     Square      Brian
Black     3     Triangle    Sandy

I'd like to select only the columns id and person and only the rows at positions 2 and 3 (Orange and Yellow). To do so, I'm using the following:

new_df = df.loc[:, ['id', 'person']][2:4]

new_df

          id    person
color       
Orange    19    Tim
Yellow    17    Sue

It feels like this might not be the most 'elegant' approach. Instead of tacking on [2:4] to slice the rows, is there a way to effectively combine .loc (to get the columns) and .iloc (to get the rows)?

Thanks!

There are 2 answers

Answer by RomanPerekhrest

Alternatively, you can start with df.iloc (specifying slices or arbitrary indices) and filter column names at the end:

df.iloc[[2,3]][['id', 'person']]
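
The same selection can also be written with a single indexer instead of two chained ones. The following is a minimal sketch, not the only way to do it: it rebuilds the frame from the question, and the names cols, by_position and by_label are illustrative. The label-based variant assumes the index labels are unique, which holds for color here.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 17, 19, 17, 22, 3, 0, 3],
    'color': ['Green', 'Blue', 'Orange', 'Yellow', 'White', 'Silver', 'Purple', 'Black'],
    'shape': ['Circle', 'Square', 'Circle', 'Triangle', 'Rectangle', 'Circle', 'Square', 'Triangle'],
    'person': ['Sally', 'Bob', 'Tim', 'Sue', 'Bill', 'Diane', 'Brian', 'Sandy'],
}).set_index('color')

cols = ['id', 'person']

# Option A: stay positional. Translate the column names into integer
# positions so that .iloc can take rows and columns in one call.
by_position = df.iloc[2:4, df.columns.get_indexer(cols)]

# Option B: stay label-based. Translate the row positions into index
# labels so that .loc can take rows and columns in one call.
# Assumption: index labels are unique; duplicated labels would match extra rows.
by_label = df.loc[df.index[2:4], cols]

print(by_position)
```

Both variants avoid the intermediate frame that chained indexing creates, which matters mostly when the result is later assigned to.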

Answer by Timeless

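One option is to skip df.set_index('color', inplace=True), so the frame keeps its default integer index, and use loc this way (label slices in loc include both endpoints, so 2:3 returns rows 2 and 3):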

new_df = df.loc[2:3, ['color', 'id', 'person']].set_index('color')

Output:

print(new_df)

        id person
color            
Orange  19    Tim
Yellow  17    Sue
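
The slice bounds differ between the two indexers, which is easy to trip over when switching from one to the other. As a small sketch on the original frame (before any set_index call, so the index is the default 0..7), the names via_loc and via_iloc are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 17, 19, 17, 22, 3, 0, 3],
    'color': ['Green', 'Blue', 'Orange', 'Yellow', 'White', 'Silver', 'Purple', 'Black'],
    'shape': ['Circle', 'Square', 'Circle', 'Triangle', 'Rectangle', 'Circle', 'Square', 'Triangle'],
    'person': ['Sally', 'Bob', 'Tim', 'Sue', 'Bill', 'Diane', 'Brian', 'Sandy'],
})

cols = ['color', 'id', 'person']

# .loc slices by label and includes the end label: 2:3 yields rows 2 and 3.
via_loc = df.loc[2:3, cols].set_index('color')

# .iloc slices by position and excludes the end position: 2:4 yields rows 2 and 3.
via_iloc = df.iloc[2:4][cols].set_index('color')

print(via_loc.equals(via_iloc))
```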