Python equivalent to R poly() function?

2.5k views Asked by At

I'm trying to understand how to replicate the poly() function in R using scikit-learn (or other module).

For example, let's say I have a vector in R:

a <- c(1:10)

And I want to generate 3rd degree polynomial:

polynomial <- poly(a, 3)

I get the following:

              1           2          3
[1,] -0.49543369  0.52223297 -0.4534252
[2,] -0.38533732  0.17407766  0.1511417
[3,] -0.27524094 -0.08703883  0.3778543
[4,] -0.16514456 -0.26111648  0.3346710
[5,] -0.05504819 -0.34815531  0.1295501
[6,]  0.05504819 -0.34815531 -0.1295501
[7,]  0.16514456 -0.26111648 -0.3346710
[8,]  0.27524094 -0.08703883 -0.3778543
[9,]  0.38533732  0.17407766 -0.1511417
[10,]  0.49543369  0.52223297  0.4534252

I'm relatively new to python and I'm trying understand how to utilize the PolynomiaFeatures function in sklearn to replicate this. I've spent time time looking at examples at the PolynomialFeatures documentation but I'm still a bit confused.

Any insight would be greatly appreciated. Thanks!

2

There are 2 answers

4
K. A. Buhr On BEST ANSWER

It turns out that you can replicate the result of R's poly(x,p) function by performing a QR decomposition of a matrix whose columns are the powers of the input vector x from the 0th power (all ones) up to the pth power. The Q matrix, minus the first constant column, gives you the result you want.

So, the following should work:

import numpy as np

def poly(x, p):
    x = np.array(x)
    X = np.transpose(np.vstack((x**k for k in range(p+1))))
    return np.linalg.qr(X)[0][:,1:]

In particular:

In [29]: poly([1,2,3,4,5,6,7,8,9,10], 3)
Out[29]: 
array([[-0.49543369,  0.52223297,  0.45342519],
       [-0.38533732,  0.17407766, -0.15114173],
       [-0.27524094, -0.08703883, -0.37785433],
       [-0.16514456, -0.26111648, -0.33467098],
       [-0.05504819, -0.34815531, -0.12955006],
       [ 0.05504819, -0.34815531,  0.12955006],
       [ 0.16514456, -0.26111648,  0.33467098],
       [ 0.27524094, -0.08703883,  0.37785433],
       [ 0.38533732,  0.17407766,  0.15114173],
       [ 0.49543369,  0.52223297, -0.45342519]])

In [30]: 
0
AChervony On

The answer by K. A. Buhr is full and complete.

The R poly function also calculates interactions of different degrees of the members. That's why I was looking for the R poly equivalent.
sklearn.preprocessing.PolynomialFeatures Seems to provide such, you can do the np.linalg.qr(X)[0][:,1:] step after to get the orthogonal matrix.

Something like this:

import numpy as np
import pprint
import sklearn.preprocessing
PP = pprint.PrettyPrinter(indent=4)

MATRIX = np.array([[ 4,  2],[ 2,  3],[ 7,  4]])
poly = sklearn.preprocessing.PolynomialFeatures(2)
PP.pprint(MATRIX)
X = poly.fit_transform(MATRIX)
PP.pprint(X)

Results in:

array([[4, 2],
       [2, 3],
       [7, 4]])
array([[ 1.,  4.,  2., 16.,  8.,  4.],
       [ 1.,  2.,  3.,  4.,  6.,  9.],
       [ 1.,  7.,  4., 49., 28., 16.]])