How to apply Henze-Zirkler's Multivariate Normality Test in Jupyter notebook with rpy2

1.9k views Asked by At

I am interested in Applying Henze-Zirkler's Multivariate Normality Test in python 3x and I was wondering if I may do so in python in Jupyter notebook.

I have fitted a VAR model with my data and the then I would like to test whether the residuals from this fitted VAR model are normally distributed.

How may I do so in Jupyter notebook using python?

3

There are 3 answers

1
rsc05 On BEST ANSWER

This is another answer since I discover this method later. If you do not want to import the library of R into Python. One may call the output of R to python. i.e. one is capable of activating R function through python as follow:

import rpy2.robjects as robjects
from rpy2.robjects import r
from rpy2.robjects.numpy2ri import numpy2ri
from rpy2.robjects.packages import importr
import numpy as np

suppose that resi is a Dataframe in python say

# Create data
resi = pd.DataFrame(np.random.random((108, 2)), columns=['Number1','Number2'])

Then the code is as follow

#Converting the dataframe from python to R

# firt take the values of the dataframe to numpy
resi1=np.array(resi, dtype=float)

# Taking the variable from Python to R
r_resi = numpy2ri(resi1)

# Creating this variable in R (from python)
r.assign("resi", r_resi)

# Calling libraries in R 
r('library("MVN")')

# Calling a function in R (from python)
r("res <- hzTest(resi, qqplot = F)")

# Retrieving information from R to Python
r_result = r("res")

# Printing the output in python
print(r_result)

This will generate the output:

 Henze-Zirkler's Multivariate Normality Test 

--------------------------------------------- 

  data : resi 



  HZ      : 2.841424 

  p-value : 1.032563e-06 



  Result  : Data are not multivariate normal. 

---------------------------------------------

Update per 2021-08-25 There has been some API changes both to the MVN package and ro rpy2. The following works with MVN version 5.9 and rpy2 version 3.4.

"""Interface file to access the R MVN package"""

import numpy as np
import rpy2.robjects.packages as rpackages
from rpy2.robjects import numpy2ri
from rpy2.robjects.packages import importr
from rpy2.robjects.vectors import StrVector


# Install packages, if they are not already installed
packages_to_install_if_needed = ("MVN",)
utils = rpackages.importr("utils")
utils.chooseCRANmirror(ind=1)  # select the first mirror in the list
names_to_install = [x for x in packages_to_install_if_needed if not rpackages.isinstalled(x)]
if len(names_to_install) > 0:
    utils.install_packages(StrVector(names_to_install))

# load the package
mvn = importr("MVN")

# Generate data
np_arr = np.random.multivariate_normal(np.ones(2), np.eye(2), size=100)

# activate automatic conversion from numpy to rpy2 interface objects
numpy2ri.activate()

# perform the work
res = mvn.mvn(np_arr)
print(res)

outputting

$multivariateNormality
           Test        HZ   p value MVN
1 Henze-Zirkler 0.3885607 0.8343017 YES

$univariateNormality
              Test  Variable Statistic   p value Normality
1 Anderson-Darling  Column1     0.2443    0.7569    YES
2 Anderson-Darling  Column2     0.3935    0.3692    YES

$Descriptives
    n      Mean   Std.Dev    Median       Min      Max      25th     75th
1 100 0.9619135 1.0353688 1.0222279 -1.994833 3.679615 0.2696537 1.758255
2 100 0.7664778 0.9134449 0.8121996 -1.568635 2.648268 0.2068718 1.418113
        Skew    Kurtosis
1 -0.2123274 -0.16171832
2 -0.3718904 -0.05279222
0
rsc05 On

There is a package in R that already does this test and it is called MVN

The first thing you have to do is to import MVN into python as described in here

Then go to your jupyter notebook and fit the VAR(1) model to your data as so

# Fit VAR(1) Model

results = Model.fit(1)
results.summary()

Store the residuals as resi

resi=results.resid

Then

# Call function from R
import os
os.environ['R_USER'] = '...\Lib\site-packages\rpy2'
import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
pandas2ri.activate()

from rpy2.robjects.packages import importr

MVN = importr("MVN", lib_loc = "C:/.../R/win-library/3.3")

After importing MVN you can simply do the normality test as so

MVNresult =MVN.hzTest(resi, qqplot = 0)

If you press on

type(MVNresult)

you will find that it is an

rpy2.robjects.methods.RS4

Therefore, in this case you will find this link a very powerful in explaining the details

Then afterwards

tuple(MVNresult.slotnames())

This will show you the observations

('HZ', 'p.value', 'dname', 'dataframe')

Then you may get the values as so

np.array(MVNresult.slots[tuple(MVNresult.slotnames())[i]])[0]

where i stands for 0, 1, 2, 3 as 'HZ', 'p-value',...

So in case the p-value i.e. i=1 is less than 0.05 then residuals (resi) are not multivariate normal at 5% confidence level.

0
Ezequiel Castaño On

There is an open source Python package called Pingouin that provides Henze-Zirkler multivariate normality test and is tested against R's MVM.

https://pingouin-stats.org/generated/pingouin.multivariate_normality.html

Example extracted from the docs:

import pingouin as pg
data = pg.read_dataset('multivariate')
X = data[['Fever', 'Pressure', 'Aches']]
pg.multivariate_normality(X, alpha=.05)
>>> HZResults(hz=0.5400861018514641, pval=0.7173686509624891, normal=True)