Is it possible to create a new column in a DataFrame where the bins for 'X' are computed separately for each value of another column (or columns)? Example below.
The bins for AR1, PO1 and RU1 are different from one another.
So far I can only get bins computed over all values in 'X' at once.
import pandas as pd
import numpy as np
import string
import random
N = 100
J = [2012,2013,2014]
K = ['A','B','C','D','E','F','G','H']
L = ['h','d','a']
S = ['AR1','PO1','RU1']
np.random.seed(0)
df = pd.DataFrame({
    'X': np.random.uniform(1, 10, N),
    'Y': np.random.uniform(1, 10, N),
    'J': np.random.choice(J, N),
    'R': np.random.choice(L, N),
    'S': np.random.choice(S, N),
})
df['bins_X'] = pd.qcut(df['X'], 10)
print(df.head())
The output I would like to have:
EDIT:
On my real data I get a ValueError saying the bin edges are not unique. Can I solve this with, e.g., rank? How would I add this to the proposed solution?
Simply use pd.qcut within a groupby on S.
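A minimal sketch of that idea, rebuilding only the 'X' and 'S' columns from the question. Using transform and labels=False (so each row gets an integer decile index from 0 to 9) are choices made here for illustration, not necessarily the exact call from the original answer.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
N = 100
df = pd.DataFrame({
    'X': np.random.uniform(1, 10, N),
    'S': np.random.choice(['AR1', 'PO1', 'RU1'], N),
})

# Decile edges are computed separately inside each group of S.
# labels=False returns the bin index (0-9) rather than an Interval.
df['bins_X'] = df.groupby('S')['X'].transform(
    lambda x: pd.qcut(x, 10, labels=False)
)
print(df.head())
```

Because the edges come from each group's own quantiles, the same value of 'X' can land in a different bin depending on whether the row belongs to AR1, PO1 or RU1.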
Leave off the labels parameter if you want each group to keep its own unique edges.
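For the non-unique edges error mentioned in the question's edit, one option is to bin the ranks instead of the raw values. The sketch below uses made-up data with many ties, and a hypothetical helper name qcut_by_rank. Note that rank(method='first') breaks ties by order of appearance, so tied values may end up in different bins.

```python
import numpy as np
import pandas as pd

# Toy data with heavy ties: plain pd.qcut on X would raise a
# ValueError here because several quantile edges coincide.
df = pd.DataFrame({
    'X': [1, 1, 1, 1, 1, 2, 3, 4, 5, 6] * 2,
    'S': ['AR1'] * 10 + ['PO1'] * 10,
})

def qcut_by_rank(x, q=5):
    # Hypothetical helper: ranking with method='first' makes every
    # value distinct, so the quantile edges are always unique.
    return pd.qcut(x.rank(method='first'), q, labels=False)

df['bins_X'] = df.groupby('S')['X'].transform(qcut_by_rank)
print(df)
```

If splitting tied values across bins is not acceptable, pd.qcut also accepts duplicates='drop', which merges the duplicate edges and therefore yields fewer bins than requested.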