import pandas as pd
import numpy as np
import random
import string
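
N = 100
J = [2012, 2013, 2014]                      # years
K = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']  # team labels
L = ['h', 'd', 'a']                         # possible values of R

# Three numeric columns X, Y, Z drawn uniformly from [1, 10)
df = pd.DataFrame(
    np.random.uniform(1, 10, size=(N, 3)),
    columns=list('XYZ')
)

# Random categorical columns: ht and at from K, year J, and R from L
df['ht'] = pd.Series(random.choice(K) for _ in range(N))
df['at'] = pd.Series(random.choice(K) for _ in range(N))
df['J'] = pd.Series(random.choice(J) for _ in range(N))
df['R'] = pd.Series(random.choice(L) for _ in range(N))

# Sum and count of X per ht, with one column per year
df1 = df['X'].groupby([df['ht'], df['J']]).agg(['sum', 'size']).unstack(fill_value=0)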
print(df.head())
I'd like to create a new DataFrame in which column 'X' is grouped into 10 equal-width bins. Then, for each year and each bin, a sum needs to be calculated: 'R' * 'X', counting only the rows where 'R' is 'h'.
EDIT:
Example of the desired end result:
bins      2012  2013  2014  Total_sum_years  Total_number_'h'
0 < 1.5     15     8     5               28                 7
New guess
output
I'm taking a shot here, though I'm still not sure exactly what you want. I first rewrote your DataFrame creation to make it cleaner.
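A minimal sketch of what a cleaner setup could look like, building every column in one go with NumPy's random generator instead of `random.choice` in a loop. The seed and the names `rng`, `years`, `teams` and `results` are my own choices for illustration, not something from the question.

```python
import numpy as np
import pandas as pd

N = 100
years = [2012, 2013, 2014]
teams = list('ABCDEFGH')
results = ['h', 'd', 'a']

# Seeded generator so the frame is reproducible (the seed is an assumption)
rng = np.random.default_rng(0)

# Build all columns at once from a dict of arrays
df = pd.DataFrame({
    'X': rng.uniform(1, 10, size=N),
    'Y': rng.uniform(1, 10, size=N),
    'Z': rng.uniform(1, 10, size=N),
    'ht': rng.choice(teams, size=N),
    'at': rng.choice(teams, size=N),
    'J': rng.choice(years, size=N),
    'R': rng.choice(results, size=N),
})

print(df.head())
```

The column names match the ones in the question, so the rest of the code should work on this frame unchanged.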
I then binned X with pd.cut and grouped by those bins together with J and ht.
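Roughly, that grouping step can be sketched as follows. This is an illustration on a small random frame (`demo`, `rng` and the `X_bin` name are hypothetical), and the choice of `sum` and `size` as aggregates mirrors the `df1` line in the question rather than anything confirmed.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # seed chosen only for reproducibility
N = 100
demo = pd.DataFrame({
    'X': rng.uniform(1, 10, size=N),
    'ht': rng.choice(list('ABCDEFGH'), size=N),
    'J': rng.choice([2012, 2013, 2014], size=N),
})

# 10 equal-width bins spanning the observed range of X
x_bins = pd.cut(demo['X'], bins=10).rename('X_bin')

# Group by bin, year and ht; keep only the combinations that occur
grouped = (
    demo.groupby([x_bins, 'J', 'ht'], observed=True)['X']
        .agg(['sum', 'size'])
)

print(grouped.head())
```

`pd.cut` with an integer `bins` argument gives equal-width intervals; if equal-sized groups are wanted instead, `pd.qcut` would be the thing to try.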
That produced the following. I think this gets you close to what you want.
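To get from a grouping like this to the table layout in the question (bins as rows, years as columns, plus totals), one option is sketched below. It assumes that "'R' * 'X' where 'R' is 'h'" means "sum X over the rows where R equals 'h'", which is my reading and may not be what was intended. The names `demo`, `home` and `table` are made up for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)  # seed chosen only for reproducibility
N = 100
demo = pd.DataFrame({
    'X': rng.uniform(1, 10, size=N),
    'J': rng.choice([2012, 2013, 2014], size=N),
    'R': rng.choice(['h', 'd', 'a'], size=N),
})

# Bin X over the full data first, then keep only the 'h' rows
demo['X_bin'] = pd.cut(demo['X'], bins=10)
home = demo[demo['R'] == 'h']

# Sum of X per bin and year; observed=False keeps empty bins as rows of zeros
table = (
    home.groupby(['X_bin', 'J'], observed=False)['X']
        .sum()
        .unstack(fill_value=0)
)

# Totals across the year columns, and the number of 'h' rows per bin
table['Total_sum_years'] = table.sum(axis=1)
table['Total_number_h'] = home.groupby('X_bin', observed=False).size()

print(table)
```

Binning before filtering means the bin edges are the same no matter which value of R is selected, so tables for 'h', 'd' and 'a' would be directly comparable.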