Python numpy equivalent of R rep and rep_len functions


I'd like to find the Python equivalent (NumPy is fine) of R's rep and rep_len functions.

Question 1: Regarding the rep_len function, say I run,

rep_len(paste('q',1:4,sep=""), length.out = 7)

then the elements of vector ['q1','q2','q3','q4'] will be recycled to fill up 7 spaces and you'll get the output

[1] "q1" "q2" "q3" "q4" "q1" "q2" "q3"

How do I recycle the elements of a list or a 1-d numpy array to fit a predetermined length? From what I've seen, numpy's repeat function lets you specify a number of repetitions, but it doesn't repeat values to fill a predetermined length.

Question 2: Regarding the rep function, say I run,

rep(2000:2004, each = 3, length.out = 14)

then the output is

[1] 2000 2000 2000 2001 2001 2001 2002 2002 2002 2003 2003 2003 2004 2004

How could I make this (recycling elements of a list or numpy array to fit a predetermined length and list each element consecutively a predetermined number of times) happen using python?

I apologize if this question has been asked before; I'm totally new to stack overflow and pretty new to programming in general.


There are 5 answers

Raw Noob On

Commenting on Psidom's np_rep function, I believe that an additional feature of R's rep function (with the each= parameter) is that it'll recycle elements in the repeated vector until the length specified by length.out is achieved. For example,

rep(2000:2001, each = 4, length.out = 15)

returns

 [1] 2000 2000 2000 2000 2001 2001 2001 2001 2000 2000 2000 2000 2001 2001
[15] 2001

In Python, define np_rep as Psidom has defined it,

import numpy as np

def np_rep(x, repeat, length_out):
    return np.repeat(x, repeat)[:length_out]

and call

np_rep(list(range(2000,2002)), repeat = 4, length_out = 15)

The output is

array([2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001])

So the function doesn't recycle to reach the desired length; it stops once each element of x has been repeated `repeat` times.

I believe the following should work as a version that incorporates recycling:

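import numpy as np

def repeat_recycle(x, repeat, length_out):
    # Lambda form of Psidom's np_rep: repeat each element, then truncate.
    rep = lambda x, length_out, repeat: np.repeat(x, repeat)[:length_out]
    repeated = rep(x, length_out, repeat)
    # Enough elements already, so no recycling is needed.
    if len(x) * repeat >= length_out:
        return repeated
    # Otherwise cycle through the repeated values until length_out is reached.
    n = len(repeated)
    return np.array([repeated[i % n] for i in range(length_out)])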

The call,

repeat_recycle(list(range(2000,2002)), repeat = 4, length_out = 15)

returns

array([2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001, 2000, 2000, 2000,
   2000, 2001, 2001, 2001])

This is the recycled version, which fills all 15 elements.

The function falls back to a lambda form of Psidom's np_rep when length_out doesn't exceed the product of len(x) and repeat.

Psidom On

For rep_len, the closest numpy function is np.tile, except that it doesn't provide a length.out parameter. You can implement that pretty easily with a slice:

import numpy as np

x = ['q1', 'q2', 'q3', 'q4']
def np_rep_len(x, length_out):
    return np.tile(x, length_out // len(x) + 1)[:length_out]

np_rep_len(x, 7)
# array(['q1', 'q2', 'q3', 'q4', 'q1', 'q2', 'q3'], 
#       dtype='<U2')

For rep, the numpy equivalent is numpy.repeat; again, you need to implement length.out yourself with a slice:

def np_rep(x, repeat, length_out):
    return np.repeat(x, repeat)[:length_out]

np_rep(x, 3, 10)
# array(['q1', 'q1', 'q1', 'q2', 'q2', 'q2', 'q3', 'q3', 'q3', 'q4'], 
#       dtype='<U2')
kpie On

You can use a combination of list multiplication and slicing on a plain Python list built with a list comprehension, if you like. (I know you wanted a numpy solution, but I figured this couldn't hurt.)

rep_len(paste('q',1:4,sep=""), length.out = 7)

translates to ->

(["q"+str(k) for k in range(1,5)]*(7//4+1))[:7]
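As a hedged alternative to multiplying and slicing, the same recycling can be sketched with the standard library's itertools, which avoids building the oversized intermediate list. The helper name rep_len below is only an illustration borrowed from the R function; it is not an existing Python function.

```python
from itertools import cycle, islice

def rep_len(items, length_out):
    # Hypothetical helper: cycle through items endlessly and keep
    # only the first length_out values.
    return list(islice(cycle(items), length_out))

labels = ["q" + str(k) for k in range(1, 5)]
recycled = rep_len(labels, 7)
print(recycled)  # ['q1', 'q2', 'q3', 'q4', 'q1', 'q2', 'q3']
```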
user2357112 On

NumPy actually does provide an equivalent of rep_len. It's numpy.resize:

new_arr = numpy.resize(arr, new_len)

Note that the ndarray.resize method (as opposed to the numpy.resize function) pads with zeros instead of repeating elements, so arr.resize(new_len) doesn't do what you want.
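A minimal sketch of that difference, using made-up sample arrays (the variable names are illustrative only):

```python
import numpy as np

arr = np.array(['q1', 'q2', 'q3', 'q4'])

# numpy.resize (the function) recycles the input to reach the new length.
recycled = np.resize(arr, 7)
print(recycled)  # ['q1' 'q2' 'q3' 'q4' 'q1' 'q2' 'q3']

# ndarray.resize (the method) works in place and pads with zeros instead.
nums = np.array([1, 2, 3, 4])
padded = nums.copy()
padded.resize(7, refcheck=False)
print(padded)  # [1 2 3 4 0 0 0]
```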

As for rep, I know of no equivalent. There's numpy.repeat, but it doesn't allow you to limit the length of the output. (There's also numpy.tile for the repeat-the-whole-vector functionality, but again, no length.out equivalent.) You could slice the result, but it would still spend all the time and memory to generate the un-truncated array:

new_arr = numpy.repeat(arr, repetitions)[:new_len]
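If recycling is also wanted, as in R's rep with both each and length.out, one possible sketch is to pass the result of numpy.repeat to numpy.resize, which truncates when the target is shorter and recycles when it is longer. The name rep_each is hypothetical, and this still builds the full repeated array first.

```python
import numpy as np

def rep_each(x, each, length_out):
    # Hypothetical helper: repeat every element `each` times, then let
    # numpy.resize truncate or recycle the result to length_out.
    return np.resize(np.repeat(x, each), length_out)

print(rep_each(np.arange(2000, 2005), 3, 14))
# [2000 2000 2000 2001 2001 2001 2002 2002 2002 2003 2003 2003 2004 2004]
print(rep_each([2000, 2001], 4, 15))
# [2000 2000 2000 2000 2001 2001 2001 2001 2000 2000 2000 2000 2001 2001 2001]
```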
mwrowe On

numpy.repeat() acts like R's rep() function with each=True. When each=False, recycling can be implemented by transposition:

import numpy as np

def np_rep(x, reps=1, each=False, length=0):
    """ implementation of functionality of rep() and rep_len() from R

    Attributes:
        x: numpy array, which will be flattened
        reps: int, number of times x should be repeated
        each: logical; should each element be repeated reps times before the next
        length: int, length desired; if >0, overrides reps argument
    """
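    if length > 0:
        # np.int was removed in NumPy 1.24; the builtin int does the same job
        reps = int(np.ceil(length / x.size))
    x = np.repeat(x, reps)
    if not each:
        # reshape to (n, reps) and transpose so whole copies of x follow one another
        x = x.reshape(-1, reps).T.ravel()
    if length > 0:
        x = x[:length]
    return x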

For example, if we set each=True:

np_rep(np.array(['tinny', 'woody', 'words']), reps=3, each=True)

...we get:

array(['tinny', 'tinny', 'tinny', 'woody', 'woody', 'woody', 'words', 'words', 'words'], 
  dtype='<U5')

But when each=False:

np_rep(np.array(['tinny', 'woody', 'words']), reps=3, each=False)

...the result is:

array(['tinny', 'woody', 'words', 'tinny', 'woody', 'words', 'tinny', 'woody', 'words'], 
  dtype='<U5')

Note that x gets flattened, and the result is flattened as well. To implement the length argument, the minimum number of reps needed is calculated, and then the result is truncated to the desired length.
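To make the length arithmetic concrete, here is a small worked sketch for a target length of 7. It uses np.tile in place of the repeat-and-transpose step, which should give the same ordering in the each=False case; the variable names are illustrative only.

```python
import math
import numpy as np

x = np.array(['tinny', 'woody', 'words'])
length = 7

# Minimum number of whole copies needed to cover the target length.
reps = math.ceil(length / x.size)   # ceil(7 / 3) = 3

# Build the untruncated result (9 elements), then cut it down to 7.
full = np.tile(x, reps)
result = full[:length]
print(result)
# ['tinny' 'woody' 'words' 'tinny' 'woody' 'words' 'tinny']
```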