Understanding the chi-square test on the Titanic dataset


I am currently working on hypothesis testing on datasets.

While reading about chi-square tests, I found this notebook through Kaggle:

https://github.com/viswanathanc/statistics/blob/master/Titanic%20Chi%20Square%20test%20-%20PClass%20vs%20Survied.ipynb

It runs a chi-square hypothesis test on the Titanic dataset.

To test the relationship between passenger class and survival, the author used this code:

1) Get the contingency table (observed values)

PClass_survd = pd.pivot_table(data,index=['Pclass'],columns=['Survived'],aggfunc='size')

2) Compute how class and survival are distributed (the marginal proportions)

pct_class = PClass_survd.sum(axis=1)/891

pct_survived = PClass_survd.sum(axis=0)/891

3) Calculate the expected values

pct_class.to_frame()@(pct_survived.to_frame().T)
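As a reading aid, here is a minimal sketch of what the product in step 3 appears to compute. It multiplies a 3 x 1 column of class proportions by a 1 x 2 row of survival proportions, which is an outer product: each cell is P(class) * P(survived), the share of passengers expected in that cell if class and survival were independent. Multiplying by the total then gives expected counts. The observed counts below are typed in by hand (the figures usually reported for the Kaggle training file), so treat them as an assumption rather than output from the notebook.

```python
import numpy as np
import pandas as pd

# Assumed observed counts (rows: Pclass 1-3, columns: Survived 0/1).
observed = pd.DataFrame(
    [[80, 136], [97, 87], [372, 119]],
    index=pd.Index([1, 2, 3], name="Pclass"),
    columns=pd.Index([0, 1], name="Survived"),
)
n = observed.to_numpy().sum()  # total passengers (891 with these counts)

pct_class = observed.sum(axis=1) / n      # share of passengers per class
pct_survived = observed.sum(axis=0) / n   # share who died / survived

# (3 x 1) @ (1 x 2) -> 3 x 2 table of expected proportions under independence
expected_pct = pct_class.to_frame() @ pct_survived.to_frame().T

# Scale proportions back to counts
expected_counts = expected_pct * n

# Same result written as row_total * column_total / grand_total
check = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
print(expected_counts.round(2))
print(np.allclose(expected_counts.to_numpy(), check))
```

Note that the matrix product alone yields proportions that sum to 1; they only become expected counts after multiplying by the sample size.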

I don't understand how the expected values are calculated in step 3. I know that Series.to_frame() converts a Series to a DataFrame, but I don't follow what the matrix product (@) of the two frames is doing.

Can anyone please explain step 3 in detail, or show how expected values are generally calculated from a dataset without using the chi-square function from scipy.stats (with an example, if possible)?

Thanks in advance
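For anyone landing here, a rough sketch of the general by-hand calculation, without scipy.stats: the expected count for each cell is row total * column total / grand total, and the chi-square statistic is the sum of (observed - expected)^2 / expected over all cells. The counts are again hand-typed assumptions for Pclass vs Survived, and the p-value line relies on the fact that a chi-square distribution with exactly 2 degrees of freedom has survival function exp(-x/2); it would not apply to tables of other sizes.

```python
import math
import numpy as np

# Assumed observed counts (rows: Pclass 1-3, columns: died, survived).
observed = np.array([[80, 136], [97, 87], [372, 119]], dtype=float)

row_totals = observed.sum(axis=1)   # passengers per class
col_totals = observed.sum(axis=0)   # total died, total survived
n = observed.sum()                  # grand total

# Expected counts if class and survival were independent
expected = np.outer(row_totals, col_totals) / n

# Pearson chi-square statistic
chi2 = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom: (rows - 1) * (columns - 1)
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Closed-form p-value, valid only because dof == 2 for this table
p_value = math.exp(-chi2 / 2)

print(expected.round(2))
print(chi2, dof, p_value)
```

A large statistic relative to the degrees of freedom (and so a tiny p-value) suggests the observed table is far from what independence would predict.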
