I am currently working on hypothesis testing on datasets.
While reading about chi-square tests, I found this notebook through Kaggle:
It performs a chi-square hypothesis test on the Titanic dataset.
To test the relationship between passenger class and survival, the author used this code:
1) Get the contingency table (observed values)
PClass_survd = pd.pivot_table(data,index=['Pclass'],columns=['Survived'],aggfunc='size')
2) Compute how class and survival are distributed (marginal proportions; 891 is the number of rows in the dataset)
pct_class = PClass_survd.sum(axis=1)/891
pct_survived = PClass_survd.sum(axis=0)/891
3) Calculate the expected values
pct_class.to_frame()@(pct_survived.to_frame().T)
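To make the steps reproducible without the Titanic file, here is a minimal sketch on a small made-up DataFrame. The data values and the names `observed`, `n`, `expected_prop` and `expected` are my own assumptions, not from the notebook, and I swapped in `pd.crosstab` for the pivot table because it gives the same kind of count table on a two-column frame. As far as I can tell, step 3 yields expected proportions, and multiplying by the row count seems to give the usual row total * column total / n counts:

```python
import numpy as np
import pandas as pd

# Small made-up dataset (hypothetical values, NOT the Titanic data)
data = pd.DataFrame({
    "Pclass":   [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    "Survived": [1, 1, 0, 1, 0, 0, 0, 0, 0, 1],
})
n = len(data)  # 10 here; the notebook hard-codes 891

# Step 1: observed counts (crosstab used here instead of pivot_table)
observed = pd.crosstab(data["Pclass"], data["Survived"])

# Step 2: marginal proportions of each class and each survival outcome
pct_class = observed.sum(axis=1) / n      # one value per Pclass
pct_survived = observed.sum(axis=0) / n   # one value per Survived outcome

# Step 3: (k x 1) column times (1 x m) row = k x m table of products,
# i.e. P(class) * P(survived) for every cell
expected_prop = pct_class.to_frame() @ pct_survived.to_frame().T

# Scale the proportions to counts so they are comparable with `observed`
expected = expected_prop * n

# Same result from the textbook formula: row_total * col_total / n
by_formula = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n

print(observed)
print(expected)
```

On this toy data the two ways of computing the expected counts agree, which is what I would like to have confirmed or corrected.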
I don't understand how the expected values are calculated in step 3. I know that Series.to_frame()
converts a Series to a DataFrame.
Can anyone please explain step 3 in detail, or how expected values are generally calculated from a dataset without using the chi-square function from stats (with an example, if possible)?
Thanks in advance