I want to change a number of values in my pandas DataFrame, where the number of column indices to change may vary from row to row.

I need something faster than a for-loop, because this will be done on a lot of rows and looping turned out to be too slow.

As a simple example, consider this:

df = pd.DataFrame(np.zeros((5,5)))

Now, I want to change some of the values in this dataframe to 1. If, for example, I want to change the values in the second and fifth rows for the first two columns, but in the fourth row I want to change all the values, I want something like this to work:

col_indices = np.array([np.arange(2),np.arange(5),np.arange(2)]) 

row_indices = np.array([1,3,4]) 

df.loc[row_indices, col_indices] = 1

However, this does not work (I suspect because the shape of the selection is ragged and so does not conform to a rectangular dataframe).

Is there any more flexible way of indexing without having to loop over rows etc.?

A solution that works only for range-like arrays (as above) would also work for my current problem, but a general answer would also be nice.

Thanks for any help!

1 Answer

yatu

IIUC, here's one approach. Instead of listing the column indices themselves, define for each row the number of leading columns where you want to insert 1s, along with the rows where you want to insert them:

col_indices = np.array([2,5,2])
row_indices = np.array([1,3,4]) 
arr = df.values

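Then use broadcasting to build a boolean mask with one row per entry in row_indices, and assign it to those rows with advanced indexing. Note that this overwrites each selected row in full, which is fine here because the frame starts out as all zeros: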

arr[row_indices] = np.arange(arr.shape[1]) < col_indices[:,None]

array([[0., 0., 0., 0., 0.],
       [1., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [1., 1., 1., 1., 1.],
       [1., 1., 0., 0., 0.]])
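
For reference, here is a self-contained sketch of the same idea that leaves the other cells in the selected rows untouched, plus a variant for the more general case where each row has an arbitrary list of column positions. The names col_counts, col_lists, result_counts and result_general are made up for this illustration, and the code assumes positional (not label-based) row and column indices. It works on an explicit copy of the data and builds a new DataFrame, so it does not rely on df.values being a writable view.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((5, 5)))
row_indices = np.array([1, 3, 4])          # positional row indices

# Case 1: range-like columns, given as "number of leading columns" per row.
col_counts = np.array([2, 5, 2])
arr = df.to_numpy(copy=True)               # explicit copy of the data
# mask[i, j] is True when column j should be set for the i-th selected row
mask = np.arange(arr.shape[1]) < col_counts[:, None]
r, c = np.nonzero(mask)                    # positions of the True cells
arr[row_indices[r], c] = 1                 # only the masked cells are changed
result_counts = pd.DataFrame(arr, index=df.index, columns=df.columns)

# Case 2: arbitrary column positions per row (a ragged list of arrays).
col_lists = [np.array([0, 1]), np.arange(5), np.array([0, 4])]
lengths = [len(cols) for cols in col_lists]
flat_rows = np.repeat(row_indices, lengths)  # one row index per target cell
flat_cols = np.concatenate(col_lists)        # matching column index per cell
arr2 = df.to_numpy(copy=True)
arr2[flat_rows, flat_cols] = 1
result_general = pd.DataFrame(arr2, index=df.index, columns=df.columns)
```

In both cases the ragged selection is flattened into two equal-length index arrays (one of rows, one of columns), which is the form NumPy's advanced indexing expects, so no Python-level loop over the rows is needed for the assignment itself.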