I have a collection with ~800,000 documents, and I am trying to fetch all of them, 5,000 at a time.
When running the following code:
    const CHUNK_SIZE = 5000;
    let skip = 0;
    let matches;
    do {
      matches = await dbClient
        .collection(collectionName)
        .find({})
        .skip(skip)
        .limit(CHUNK_SIZE)
        .toArray();
      // ... some processing
      skip += CHUNK_SIZE;
    } while (matches.length);
After about 30 iterations, I start getting documents I already received in a previous iteration.
What am I missing here?
As posted in the comments, you'll have to apply a .sort() to the query. To do so without adding too much performance overhead, it would be easiest to sort on the _id field, which is always indexed. Neither MongoDB nor the Amazon DocumentDB flavor guarantees any implicit result ordering without an explicit sort, so consecutive skip/limit pages can overlap.
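A minimal sketch of what the sorted loop might look like, assuming the same MongoDB Node.js driver cursor methods used in the question (find, sort, skip, limit, toArray). The names fetchInChunks and onChunk are hypothetical helpers introduced here for illustration and are not part of the driver.

```javascript
// Sketch: page through a collection in a stable order by sorting on _id.
// `collection` is assumed to be a driver Collection object, e.g.
// dbClient.collection(collectionName) from the question.
// fetchInChunks and onChunk are hypothetical names, not driver APIs.
async function fetchInChunks(collection, chunkSize, onChunk) {
  let skip = 0;
  let total = 0;
  let matches;
  do {
    matches = await collection
      .find({})
      .sort({ _id: 1 }) // explicit ordering so each page is deterministic
      .skip(skip)
      .limit(chunkSize)
      .toArray();
    if (matches.length) {
      await onChunk(matches); // ... some processing
      total += matches.length;
    }
    skip += chunkSize;
  } while (matches.length);
  return total; // number of documents handed to onChunk
}
```

It could then be called as fetchInChunks(dbClient.collection(collectionName), 5000, processChunk), where processChunk stands in for whatever processing the original loop performs.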
Amazon DocumentDB
Mongo Result Ordering