I'm facing a problem when trying to shuffle a multi-dimensional array with numpy. The problem can be reproduced with the following code:
import numpy as np
s=(300000, 3000)
n=s[0]
print ("Allocate")
A=np.zeros(s)
B=np.zeros(s)
print ("Index")
idx = np.arange(n)
print ("Shuffle")
idx = np.random.shuffle(idx)
print ("Arrange")
B[:,:] = A[idx,:] # THIS REQUIRES A LARGE AMOUNT OF MEMORY
When running this code (Python 2.7 as well as Python 3.6, with numpy 1.13.1 on Win7 64-bit), the last line requires a large amount of memory (~10 GB), which sounds strange to me.
I expected the data to be copied from one array to the other, both being pre-allocated, so I can understand that the copy takes time, but not why it requires memory.
I guess I'm doing something wrong but can't find what... maybe someone can help me?
From the numpy documentation under 'Index arrays': indexing with an index array returns a copy of the original data, not a view as one gets for slices.
In other words, your assumption that your line B[:,:] = A[idx,:] (after correcting the np.random.shuffle line pointed out by @MSeifert) only induces copying of elements from A to B is not correct. Instead, numpy first creates a new array from the indexed A before copying its elements into B.
Why the memory usage changes so much is beyond me. However, looking at your original array shape, s=(300000,3000), this would, for 64-bit numbers, amount to roughly 6.7 GiB (7.2e9 bytes) per array, if I didn't calculate wrong. Thus, with that additional array being created, the extra memory usage actually seems plausible.
EDIT:
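As a quick check of the size estimate and of the temporary array, here is a minimal sketch. It assumes float64 data and uses small stand-in arrays instead of the full (300000, 3000) ones; np.random.permutation is used here only as a convenient way to get a shuffled index.

```python
import numpy as np

# Size of one (300000, 3000) float64 array, computed without allocating it.
rows, cols = 300000, 3000
nbytes = rows * cols * np.dtype(np.float64).itemsize
print(nbytes)           # total bytes for one such array
print(nbytes / 2**30)   # the same size expressed in GiB

# Small stand-in arrays to show that indexing with an index array
# builds a separate temporary before anything is assigned to B.
A = np.arange(12, dtype=np.float64).reshape(4, 3)
idx = np.random.permutation(A.shape[0])  # shuffled row indices
tmp = A[idx, :]                          # new array, same size as A
print(np.shares_memory(tmp, A))          # tmp does not share A's buffer
print(tmp.nbytes == A.nbytes)            # the temporary is as large as A
```

With the full-size arrays, that temporary alone would account for about one extra array's worth of memory on top of A and B.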
Reacting to the OP's comments, I did a few tests concerning the performance of different ways to assign the shuffled rows of A to B. First off, here is a small test showing that B=A[idx,:] indeed creates a new ndarray, not just a view of A:
So indeed, assigning new values to b leaves a unchanged. Then I did a few timing tests to find the fastest way to shuffle the rows of A and get them into B:
The results (min, max, mean) of 7 runs are:
In the end, a simple for-loop does not perform too badly, especially if you want to assign only part of the rows, not the entire array. Surprisingly, numba does not seem to enhance performance.
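For illustration, a minimal sketch of the loop-based copy, assuming A and B are pre-allocated arrays of the same shape and dtype. The helper names and the chunked variant are hypothetical additions for this sketch, not the code used in the timing tests, and no performance claim is attached to them.

```python
import numpy as np

def shuffle_rows_loop(A, B, idx):
    """Copy A[idx[i]] into B[i] one row at a time (hypothetical helper)."""
    for i, j in enumerate(idx):
        # A[j, :] with an integer j is a view, so no large temporary is built.
        B[i, :] = A[j, :]

def shuffle_rows_chunked(A, B, idx, chunk=1024):
    """Same result, copying `chunk` rows per step (hypothetical helper)."""
    for start in range(0, len(idx), chunk):
        stop = start + chunk
        # The temporary created by the index array is at most `chunk` rows.
        B[start:stop, :] = A[idx[start:stop], :]

# Small stand-in data; the real arrays in the question are far larger.
A = np.random.rand(1000, 30)
idx = np.random.permutation(A.shape[0])
B_loop = np.zeros_like(A)
B_chunk = np.zeros_like(A)
shuffle_rows_loop(A, B_loop, idx)
shuffle_rows_chunked(A, B_chunk, idx, chunk=128)
```

Both helpers write into the pre-allocated target, so peak extra memory stays at one row or one chunk rather than a full copy of A.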