rpy2 preserve metadata in FactorVector


I have a Python script that loads an .RData file, reads it, and writes it out to an Excel file. Unfortunately, I have a problem with one table, which contains 11 variables and 144 objects with mixed types (IntVector, FactorVector, FloatVector, FloatVector, etc.).

When the table is written to Excel, the column names and data are preserved, except for the column that is a four-level FactorVector. Instead of returning the metadata associated with the four levels (a,a,a,a,b,b,b,b,c,c,c,c,d,d,d,d, etc.), it returns the integer values associated with each level (1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4, etc.).

I found this on the rpy2 sourceforge website, which pretty much explains my problem.

Since a FactorVector is an IntVector with attached metadata (the levels), getting items Python-style was not changed from what happens when gettings items from a IntVector. A consequence to that is that information about the levels is then lost.

It goes on below this to explain using levels, at which point I get lost as to what exactly I should do or use to keep the metadata levels intact for the FactorVector variable in question.

I presume there is some sort of rpy2.robjects "switch" that will preserve this metadata when it gets translated into Python? What would be the most efficient way to apply this? Thanks!


There is 1 answer

lgautier On BEST ANSWER

The conversion layer customized for pandas DataFrame in rpy2-2.6.0 should take care of converting R factors to pandas factors.
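If that conversion layer is not available, the labels can also be restored by hand, since the integers in a factor are 1-based positions into its levels. Below is a minimal sketch of that mapping in plain Python. The `codes` and `levels` lists are hypothetical stand-ins for what you would get from the factor column (for example `list(factor)` and `list(factor.levels)`, assuming the `levels` attribute described in the rpy2 documentation quoted in the question), and `codes_to_labels` is an illustrative helper, not part of rpy2.

```python
def codes_to_labels(codes, levels):
    """Map R-style 1-based factor codes to their level labels.

    A code of None is treated as a missing value and passed through.
    """
    labels = []
    for code in codes:
        if code is None:
            labels.append(None)
        else:
            # R factor codes start at 1, Python list indices at 0.
            labels.append(levels[code - 1])
    return labels


# Hypothetical data standing in for a four-level factor column.
codes = [1, 1, 2, 2, 3, 3, 4, 4]
levels = ["a", "b", "c", "d"]

labels = codes_to_labels(codes, levels)
print(labels)
```

The resulting list of labels can then be written to the Excel column in place of the integer codes.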