non-conformable arrays when passing numpy array to R via rpy2


I am trying to pass a numpy array to the GAMLSS package in R.

import numpy as np
import rpy2.robjects as robjects
from rpy2.robjects import numpy2ri
numpy2ri.activate()
r = robjects.r
r.library("gamlss")
r.library("gamlss.mx")

# np.float was only an alias for the built-in float and has been removed from recent NumPy releases
L = r['data.frame'](np.array(np.random.normal(size=1000),
                             dtype=([('x', float), ('y', float), ('z', float)])))
r.gamlssMX(robjects.Formula('z~1'), data=L)

Running this returns

Error in y0 - f0 : non-conformable arrays

Yet I can pass the data frame to the linear model R function.

lm = r.lm(robjects.Formula('x~y'), data=L)
print(r.summary(lm.rx()))

I have a lot of Python code that reads a binary file, but I would like to use the R package for the fit, hence the need for rpy2.

-- EDIT --

As an example in R:

x <- data.frame(z=c(rnorm(1000), rnorm(1000, mean=4)))
gamlssMX(z~1, K=1, data=x)

There is 1 answer

CT Zhu On

This looks like a bug. If I use the now-deprecated pandas.rpy.common.convert_to_r_dataframe, the fit works fine, but the currently preferred method, pandas2ri.pandas2ri, raises an error:

import numpy as np
import pandas as pd
import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
import pandas.rpy.common as com

robjects.reval("library('gamlss')")
robjects.reval("library('gamlss.mx')")

R = pd.DataFrame({'x': np.random.random(2000)})
A1 = pandas2ri.pandas2ri(R)
A2 = com.convert_to_r_dataframe(R)
robjects.r.assign('B1', A1)
robjects.r.assign('B2', A2)
robjects.reval("m <- gamlssMX(x~1, K=1, data=B1)") #won't work
robjects.reval("m <- gamlssMX(x~1, K=1, data=B2)") #works fine

The only difference between the two cases is the conversion function: com.convert_to_r_dataframe versus pandas2ri.pandas2ri. So the bug appears to be in the current conversion code.

With the newer pandas2ri.pandas2ri, the column of the resulting data frame is an rpy2.robjects.vectors.Array; with the older com.convert_to_r_dataframe, it is an rpy2.robjects.vectors.FloatVector.

In [3]:

robjects.r.B1
Out[3]:
<DataFrame - Python:0x10e868a28 / R:0x10f425238>
[Array]
  x: <class 'rpy2.robjects.vectors.Array'>
  <Array - Python:0x10e868b48 / R:0x10f425400>
[0.051728, 0.149642, 0.884797, ..., 0.485063, 0.733193, 0.134963]
In [4]:

robjects.r.B2
Out[4]:
<DataFrame - Python:0x10e868cf8 / R:0x110e1b918>
[FloatVector]
  x: <class 'rpy2.robjects.vectors.FloatVector'>
  <FloatVector - Python:0x10e868e18 / R:0x10f442400>
[0.051728, 0.149642, 0.884797, ..., 0.485063, 0.733193, 0.134963]

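It looks like gamlss raises an exception when the data column is an Array instead of a FloatVector. A likely explanation is that the Array carries an R dim attribute, and R arithmetic reports "non-conformable arrays" when two operands have mismatched dim attributes.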
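
If the Array column really is the cause, one possible workaround is to build the R data frame by hand so that each column is a FloatVector. The following is only a sketch and has not been run against gamlssMX. The helper names as_float_columns and to_plain_r_dataframe are made up for illustration, while robjects.FloatVector and robjects.DataFrame are the rpy2 classes.

```python
def as_float_columns(table):
    """Return {column name: list of Python floats} for a mapping of 1-D sequences.

    `table` can be a dict of lists/arrays or a pandas DataFrame
    (anything with an .items() method yielding name/values pairs).
    """
    return {name: [float(v) for v in values] for name, values in table.items()}


def to_plain_r_dataframe(table):
    """Build an R data.frame whose columns are FloatVector objects.

    Assumption: a FloatVector built from a plain Python list is an
    ordinary R numeric vector, without the dim attribute of an Array.
    """
    # Imported here so that as_float_columns can be used without R installed.
    import rpy2.robjects as robjects

    columns = as_float_columns(table)
    return robjects.DataFrame(
        {name: robjects.FloatVector(values) for name, values in columns.items()}
    )
```

With the pandas data frame R from the code above, the result could be assigned with robjects.r.assign('B3', to_plain_r_dataframe(R)) and passed to gamlssMX in the same way as B1 and B2.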