What is the CNTK randomizationWindow behavior?

Question

Asked by Nathaniel Powell on 04 January 2017 at 23:03

I have a quick question about the randomizationWindow parameter of the reader. The documentation says it controls how much of the data is held in memory, but I'm a little unclear about what effect it has on the randomness of the data. If the training data file starts with one distribution of data and ends with a completely different distribution, will setting a randomization window smaller than the data size cause the data fed to the trainer not to come from a homogeneous distribution? I just wanted to double-check.

There are 2 answers

eldakms On 05 January 2017 at 09:16

To give a bit more detail on randomization/IO:

The corpus is always split into chunks. Chunks make IO efficient, because all sequences of a chunk are read in one go (a chunk is usually 32 or 64 MB).

Randomization then happens in two steps:

1. The order of all chunks is randomized.
2. Given a randomization window of N samples, the randomizer creates a rolling window of M chunks that together hold approximately N samples. All sequences inside this rolling window are randomized. Once all sequences of a chunk have been processed, the randomizer can release that chunk and start loading the next one asynchronously.
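The two steps above can be sketched as a toy model in plain Python. This is not CNTK's implementation: `two_level_shuffle` and its parameters are hypothetical names, chunks are plain lists, and the rolling window is approximated by a simple in-memory buffer that is topped up chunk by chunk.

```python
import random


def two_level_shuffle(chunks, window_samples, seed=0):
    """Toy model: shuffle chunk order, then draw sequences at random
    from a rolling window holding roughly `window_samples` sequences."""
    if window_samples < 1:
        raise ValueError("window_samples must be at least 1")
    rng = random.Random(seed)

    # Step 1: randomize the order in which chunks are visited.
    order = list(range(len(chunks)))
    rng.shuffle(order)

    out = []
    window = []      # sequences currently "in memory"
    next_chunk = 0   # index into the shuffled chunk order
    while True:
        # Step 2a: load whole chunks until the window holds about
        # `window_samples` sequences (or no chunks are left).
        while next_chunk < len(order) and len(window) < window_samples:
            window.extend(chunks[order[next_chunk]])
            next_chunk += 1
        if not window:
            break
        # Step 2b: emit one sequence picked uniformly from the window.
        i = rng.randrange(len(window))
        window[i], window[-1] = window[-1], window[i]
        out.append(window.pop())
    return out


# Five chunks of four sequences each, window of about eight sequences.
chunks = [list(range(i * 4, i * 4 + 4)) for i in range(5)]
print(two_level_shuffle(chunks, window_samples=8))
```

In this model a sequence can only be emitted while its chunk is inside the window, so a smaller window keeps each sequence closer to the other sequences of the chunks loaded around the same time.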

**chrisbasoglu** · Accepted Answer · 2017-01-04T23:06:35+00:00

When randomizationWindow is set to a window smaller than the entire data set, the data is split into chunks of randomizationWindow size and the order of the chunks is randomized. Then, within each chunk, the samples are randomized.
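Taking this description literally, a minimal sketch in plain Python shows what it implies for the question asked. The helper `block_shuffle` is a hypothetical name and this is a simplified model, not the CNTK reader itself.

```python
import random


def block_shuffle(samples, window, seed=0):
    """Split `samples` into window-sized blocks, shuffle the block order,
    then shuffle the samples inside each block."""
    rng = random.Random(seed)
    blocks = [list(samples[i:i + window]) for i in range(0, len(samples), window)]
    rng.shuffle(blocks)          # randomize the order of the blocks
    out = []
    for block in blocks:
        rng.shuffle(block)       # randomize samples within one block
        out.extend(block)
    return out


# A file whose first half comes from distribution "A" and second half from "B".
data = ["A"] * 8 + ["B"] * 8
print(block_shuffle(data, window=4))   # small window
print(block_shuffle(data, window=16))  # window covers the whole data set
```

With `window=4`, every block in this example contains a single label, so each run of four consecutive outputs comes from one distribution only. In this model, samples from the two halves are only mixed together when the window is large enough to span both.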
