I have a dataframe with 3 columns; each row gives the probabilities that, in that row, the feature T has the value 1, 2, or 3:

```
import pandas as pd
import numpy as np
np.random.seed(42)
df = pd.DataFrame({"T1" : [0.8,0.5,0.01],"T2":[0.1,0.2,0.89],"T3":[0.1,0.3,0.1]})
```

For row 0, T is 1 with 80% chance, 2 with 10% and 3 with 10%

I want to simulate the value of T for each row and turn the columns T1, T2, T3 into binary features. I have a solution, but it loops over the rows of the dataframe and is really slow (my real dataframe has over 1 million rows):

```
possib = df.columns
for i in range(df.shape[0]):
    probas = df.iloc[i][possib].tolist()
    choix_transp = np.random.choice(possib, 1, p=probas)[0]
    for pos in possib:
        # note: df.iloc[i][pos] = ... is chained indexing and may not
        # write back to df; index row and column in a single call instead
        if pos == choix_transp:
            df.iloc[i, df.columns.get_loc(pos)] = 1
        else:
            df.iloc[i, df.columns.get_loc(pos)] = 0
```

Is there a way to vectorize this code ?

Thank you!

We can use `numpy` for this. Generate a single column of random values and compare it to the row-wise cumulative sum of the dataframe's probabilities; this gives a `DataFrame` of booleans in which the first `True` in each row marks the "bucket" the random value falls into. With `idxmax`, we can get the column label of that bucket, which we can then convert back to binary columns with `pd.get_dummies`.
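The original example code did not survive here, so below is a sketch of the approach described above (variable names `draws`, `bucket`, and `out` are my own):

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame({"T1": [0.8, 0.5, 0.01],
                   "T2": [0.1, 0.2, 0.89],
                   "T3": [0.1, 0.3, 0.1]})

# One uniform draw per row, shaped (n, 1) so it broadcasts
# against the row-wise cumulative sum of the probabilities.
draws = np.random.rand(len(df))[:, None]

# The first column whose cumulative probability exceeds the draw
# is the sampled bucket; idxmax returns that column's label.
bucket = (df.cumsum(axis=1) > draws).idxmax(axis=1)

# One-hot encode the chosen labels; reindex guards against a
# column disappearing when some value is never sampled.
out = pd.get_dummies(bucket).reindex(columns=df.columns, fill_value=0)
print(out)
```

Each row of `out` contains exactly one 1, in the column that was sampled for that row.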

A note: most of the slowdown comes from `pd.get_dummies`; if you use Divakar's method of `pd.DataFrame(result.view('i1'), index=df.index, columns=df.columns)`, it gets a lot faster.
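Reading the note's `result` as a boolean one-hot ndarray, the faster variant might look like this sketch, which stays in numpy and reinterprets the boolean bytes as int8 instead of calling `pd.get_dummies` (the setup data here is made up for illustration):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
n = 100_000
# Hypothetical input: random probability rows summing to 1.
p = np.random.dirichlet([1, 1, 1], size=n)
df = pd.DataFrame(p, columns=["T1", "T2", "T3"])

# Sample one bucket index per row, entirely in numpy.
draws = np.random.rand(len(df))[:, None]
bucket = (df.to_numpy().cumsum(axis=1) > draws).argmax(axis=1)

# Boolean one-hot array: True where the column index matches the
# sampled bucket. Since bools are 1 byte, .view('i1') reinterprets
# them as int8 with no copy, avoiding pd.get_dummies.
result = bucket[:, None] == np.arange(df.shape[1])
out = pd.DataFrame(result.view("i1"), index=df.index, columns=df.columns)
```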