In-place shuffle of a contiguous ragged array


I have a ragged array represented as a contiguous block of memory along with its "shape", corresponding to the length of each row, and its "offsets", corresponding to the index of the first element in each row. To illustrate, I have an array conceptually like this:

[[0, 0, 0],
 [1, 1, 1, 1],
 [2],
 [3, 3],
 [4, 4]]

Represented in memory as:

values: [0, 0, 0, 1, 1, 1, 1, 2, 3, 3, 4, 4]
shape: [3, 4, 1, 2, 2]
offsets: [0, 3, 7, 8, 10]
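In code, the layout is essentially the following (a minimal Rust sketch of the fields described above):

/// A ragged 2-D array stored as a single contiguous buffer.
struct Ragged {
    values: Vec<f32>,    // all rows, back to back
    shape: Vec<usize>,   // length of each row
    offsets: Vec<usize>, // index of the first element of each row
}

impl Ragged {
    /// Borrow row `i` as a slice of the contiguous buffer.
    fn row(&self, i: usize) -> &[f32] {
        &self.values[self.offsets[i]..self.offsets[i] + self.shape[i]]
    }
}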

I may have hundreds of millions of rows, each typically holding, say, 3-20 four-byte floats, though with no hard upper bound on the row length.

I wish to shuffle the rows of this array randomly. Since the array is ragged, I can't see how the Fisher-Yates algorithm can be applied in a straightforward manner. I do see how to shuffle by randomly permuting the array's shape, pre-allocating a new array, and then copying over rows according to the permutation that generated the new shape, with some bookkeeping on the indexes. However, I do not necessarily have the RAM required to duplicate the array just for this operation.

Therefore, my question is whether there is a good way to perform this shuffle in-place, or using only limited extra memory? Run-time is also a concern, but shuffling is unlikely to be the main bottleneck.

For illustration purposes, I wrote a quick toy version in Rust here, which implements the shuffle sketched above by allocating a new array.
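Conceptually, that toy version does something like the following (a sketch, not my actual code; it assumes the rand crate and the Ragged struct above):

use rand::seq::SliceRandom;

/// Out-of-place shuffle: permute row indices, then copy rows into fresh
/// buffers. This needs a full duplicate of `values`, which is the problem.
fn shuffle_copy(r: &Ragged) -> Ragged {
    let mut perm: Vec<usize> = (0..r.shape.len()).collect();
    perm.shuffle(&mut rand::thread_rng());

    let mut values = Vec::with_capacity(r.values.len());
    let mut shape = Vec::with_capacity(r.shape.len());
    let mut offsets = Vec::with_capacity(r.offsets.len());
    for &i in &perm {
        offsets.push(values.len());
        shape.push(r.shape[i]);
        values.extend_from_slice(r.row(i));
    }
    Ragged { values, shape, offsets }
}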


There are 2 answers

rici

shape is redundant, since shape[i] is offsets[i+1] - offsets[i] (if you extend offsets by one element containing the length of the values array). But since your data structure has both fields, you could shuffle your array by just in-place shuffling the two descriptor vectors in parallel, using Fisher-Yates. This would be slightly easier if shape and offsets were combined into an array of (offset, length) pairs, which might also improve locality of reference, but it's certainly not critical if you have some need for the separate arrays.

That doesn't preserve the contiguity of the rows in the values list, but if all your array accesses go through offsets, it will not require any other code modification.
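In Rust, a sketch of that descriptor-only shuffle (assuming the rand crate; values never moves):

use rand::Rng;

/// Fisher-Yates over the descriptors only. Afterwards, row i lives at
/// values[offsets[i]..offsets[i] + shape[i]], but rows are no longer
/// laid out contiguously in row order.
fn shuffle_descriptors(shape: &mut [usize], offsets: &mut [usize]) {
    let mut rng = rand::thread_rng();
    for i in (1..shape.len()).rev() {
        let j = rng.gen_range(0..=i);
        shape.swap(i, j);
        offsets.swap(i, j);
    }
}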

It is possible to implement an in-place swap of two variable-length subsequences using a variant of the classic rotate-with-three-reversals algorithm. Given P V Q, a sequence conceptually divided into three variable-length parts, we first reverse P, V, and Q in place independently, producing Pᴿ Vᴿ Qᴿ. Then we reverse the entire sequence in place, yielding Q V P. (Afterwards, you'd need to fix up the offsets array.)
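As a sketch, on a flat slice where p and q are the element counts of P and Q (the offsets fix-up is omitted):

/// Turn the region [P V Q] into [Q V P] in place via four reversals.
/// `p` and `q` are the element counts of P and Q; V is whatever lies between.
fn swap_ends(region: &mut [f32], p: usize, q: usize) {
    let n = region.len();
    region[..p].reverse();      // P -> P reversed
    region[p..n - q].reverse(); // V -> V reversed
    region[n - q..].reverse();  // Q -> Q reversed
    region.reverse();           // whole -> Q V P
}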

That's linear time in the length of the span from P to Q, but as a shuffle algorithm it will add up to quadratic time, which is impractical for "hundreds of millions" of rows.

btilly

As often happens, I started with a complex idea and then simplified it. Here is the simple version, with the complex one below.

What we're going to do is quicksort it into a random arrangement. The key operation is partitioning. That is, we want to take a section of m blocks and randomly partition it into m_l blocks on the left and m_r on the right.

The idea is this. We keep a queue of temporarily copied-out blocks on the left, and another on the right. Each queue holds whole blocks, but we measure its size by the number of elements in it. The partitioning logic looks like this:

while m_l + m_r > 0:   # m_l, m_r = blocks still to be placed on each side
    pick the larger queue (by element count), breaking ties randomly
    if the queue is empty:
        read the next unread block on that side into it
    get a block from the queue
    if random() < m_l / (m_l + m_r):
        # put the block on the left
        until we have enough room to write the block:
            copy blocks at the left read frontier into the left queue
        write the block to the left
        m_l--
    else:
        # put the block on the right
        until we have enough room to write the block:
            copy blocks at the right read frontier into the right queue
        write the block to the right
        m_r--
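Here is a sketch of that partition step in Rust, assuming the rand crate. For simplicity it accumulates the region's new row lengths in temporary vectors and rebuilds shape/offsets at the end, which is O(m) metadata on top of the O(k)-bounded element queues; tighter descriptor bookkeeping is possible, as in the longer version below.

use std::collections::VecDeque;
use rand::Rng;

/// Randomly partition the `m` contiguous blocks starting at block index `b0`
/// so that a random `m_l` of them end up packed at the left of the region
/// and the remaining `m - m_l` at the right.
fn random_partition(
    values: &mut [f32],
    shape: &mut [usize],
    offsets: &mut [usize],
    b0: usize,
    m: usize,
    mut m_l: usize,
    rng: &mut impl Rng,
) {
    let lo = offsets[b0];
    let hi = offsets[b0 + m - 1] + shape[b0 + m - 1];
    let mut m_r = m - m_l;

    let (mut next_l, mut next_r) = (b0, b0 + m); // unread blocks: next_l..next_r
    let (mut w_l, mut w_r) = (lo, hi);           // write cursors, in elements
    let mut q_l: VecDeque<Vec<f32>> = VecDeque::new();
    let mut q_r: VecDeque<Vec<f32>> = VecDeque::new();
    let (mut ql, mut qr) = (0usize, 0usize);     // queue sizes, in elements
    let mut shapes_l: Vec<usize> = Vec::new();   // lengths of blocks written left
    let mut shapes_r: Vec<usize> = Vec::new();   // ... and right, in write order

    while m_l + m_r > 0 {
        // Pick the larger queue by element count, breaking ties randomly.
        let left = if ql != qr { ql > qr } else { rng.gen_bool(0.5) };
        let block = if left {
            if q_l.is_empty() {
                // An unread block must exist here; copy it into the queue.
                let b = values[offsets[next_l]..offsets[next_l] + shape[next_l]].to_vec();
                ql += b.len();
                q_l.push_back(b);
                next_l += 1;
            }
            let b = q_l.pop_front().unwrap();
            ql -= b.len();
            b
        } else {
            if q_r.is_empty() {
                next_r -= 1;
                let b = values[offsets[next_r]..offsets[next_r] + shape[next_r]].to_vec();
                qr += b.len();
                q_r.push_back(b);
            }
            let b = q_r.pop_front().unwrap();
            qr -= b.len();
            b
        };

        // Send the block left with probability m_l / (m_l + m_r).
        if rng.gen_bool(m_l as f64 / (m_l + m_r) as f64) {
            // Evacuate unread blocks that this write would overrun.
            while next_l < next_r && offsets[next_l] < w_l + block.len() {
                let b = values[offsets[next_l]..offsets[next_l] + shape[next_l]].to_vec();
                ql += b.len();
                q_l.push_back(b);
                next_l += 1;
            }
            values[w_l..w_l + block.len()].copy_from_slice(&block);
            w_l += block.len();
            shapes_l.push(block.len());
            m_l -= 1;
        } else {
            while next_l < next_r && offsets[next_r - 1] + shape[next_r - 1] > w_r - block.len() {
                next_r -= 1;
                let b = values[offsets[next_r]..offsets[next_r] + shape[next_r]].to_vec();
                qr += b.len();
                q_r.push_back(b);
            }
            w_r -= block.len();
            values[w_r..w_r + block.len()].copy_from_slice(&block);
            shapes_r.push(block.len());
            m_r -= 1;
        }
    }

    // Rebuild the region's descriptors: left-written blocks in write order,
    // then the right-written blocks, which were laid down from the right end in.
    shapes_r.reverse();
    shapes_l.append(&mut shapes_r);
    let mut off = lo;
    for (s, len) in shapes_l.into_iter().enumerate() {
        shape[b0 + s] = len;
        offsets[b0 + s] = off;
        off += len;
    }
}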

And now we need to recursively repeat until we've quicksorted it into a random order.

Note that, unlike with a regular quicksort, we are constructing our partitions to be exactly the size we expect. So we don't have to recurse. Instead we can:

# pass 1
partition whole array

# pass 2
partition 0..(m//2-1)
partition (m//2)..(m-1)

# pass 3
partition 0..(m//4-1)
partition (m//4)..(m//2-1)
partition (m//2)..((3*m)//4-1)
partition ((3*m)//4)..(m-1)

etc.
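One way to drive those passes without recursion is an explicit worklist of block ranges (depth-first rather than pass-by-pass, but the work done is the same), reusing the random_partition sketch above:

/// Shuffle all blocks: partition each range around its midpoint, then
/// queue up both halves until every range has shrunk to a single block.
fn shuffle_blocks(values: &mut [f32], shape: &mut [usize], offsets: &mut [usize]) {
    let mut rng = rand::thread_rng();
    let mut ranges = vec![(0usize, shape.len())];
    while let Some((start, end)) = ranges.pop() {
        if end - start < 2 {
            continue; // a single block is already "shuffled"
        }
        let mid = (start + end) / 2;
        random_partition(values, shape, offsets, start, end - start, mid - start, &mut rng);
        ranges.push((start, mid));
        ranges.push((mid, end));
    }
}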

The result is time O(n log(m)), where n is the total number of elements and m the number of blocks. And the queues will never hold more than 5k data, where k is the largest block size.


Here is the more complex approach I started with. It runs in time O(n log(n)), and the maximum space needed is O(k), where k is the maximum block size.

First note that shape and offsets are largely redundant, because shape[i] = offsets[i+1] - offsets[i] for all i but the last. So with O(1) extra data (which we already have in values.len()) we can make shape redundant, (ab)use it however we want, and then recalculate it at the end.

So let's start by picking a random permutation of 0..(shape.len()-1) and placing it in shape: entry i records where block i should end up. This permutation can be generated in time O(n) using Fisher-Yates.

Our idea now is to use quicksort to actually get them to the right places.

First, our pivot. For O(n) work in a single pass we can add up the lengths of all blocks which will come before the median block, and also find the length of said median block.
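A sketch of that setup step, (ab)using shape as the destination permutation and recovering true lengths from offsets (assuming the rand crate; setup is a hypothetical helper name):

use rand::seq::SliceRandom;

/// Overwrite `shape` with a random destination permutation, and compute in
/// one pass the pivot data: the total length of the blocks destined to land
/// before the median block, plus the median block's own length.
fn setup(values_len: usize, shape: &mut [usize], offsets: &[usize]) -> (usize, usize) {
    let m = shape.len();
    // shape[i] now means "final position of block i"; real lengths are
    // recomputed from offsets (plus values_len for the last block).
    for (i, s) in shape.iter_mut().enumerate() {
        *s = i;
    }
    shape.shuffle(&mut rand::thread_rng());

    let median = m / 2;
    let (mut before, mut median_len) = (0, 0);
    for i in 0..m {
        let len = if i + 1 < m { offsets[i + 1] - offsets[i] } else { values_len - offsets[i] };
        if shape[i] < median {
            before += len; // these elements land left of the median block
        } else if shape[i] == median {
            median_len = len;
        }
    }
    (before, median_len)
}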

Now quicksort depends on being able to swap things, which we can't do directly (that's your whole problem). The idea instead is to partition from the middle out. As we go, the values, shape and offsets arrays will each have a beginning and an ending section that we haven't gotten to yet, and a piece in the middle that we've rewritten. Where those sections meet, we'll also need queues of blocks that have been copied off of the left and right but not yet written back. And, of course, we'll need a record of where the boundaries are.

So the idea is this.

set up the data structures.
copy a few blocks in the middle to one of the queues - enough to have a place for the median block to go.
record where the median will go
while have not finished partitioning:
    pick the larger queue (break ties by farthest from its end, then randomly)
    if it is empty:
        read a block into it
    figure out where its next block needs to be written
    copy blocks in its way to the appropriate queue
    write the block out

Here, writing a block out means writing its elements to the right place, setting its offsets entry, and setting its shape entry to the block's final location, which later passes still need.

This operation partitions around the median block. Recursively repeat it to sort each side, until every block is in its final position.

And, finally, fix the shape array back to what it was supposed to be.

The time complexity of this is O(n log(n)) for the same reason that quicksort's is. As for space complexity: if k is the largest block size, then any time the queues grow past size 4k, the next block extracted is guaranteed to have room to be written, so they cannot grow any further. This makes the maximum space used O(k).