My mongorestore runs to infinity


I did a mongorestore of a gzipped mongodump:

mongorestore -v --drop --gzip --db bigdata /Volumes/Lacie2TB/backup/mongo20170909/bigdata/

But it kept going. I left it running, because I figured that if I 'just' killed it, my (important) data would be corrupted. Check the percentages:

2017-09-10T14:45:58.385+0200    [########################]  bigdata.logs.sets.log  851.8 GB/85.2 GB  (999.4%)
2017-09-10T14:46:01.382+0200    [########################]  bigdata.logs.sets.log  852.1 GB/85.2 GB  (999.7%)
2017-09-10T14:46:04.381+0200    [########################]  bigdata.logs.sets.log  852.4 GB/85.2 GB  (1000.0%)
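
It would help to know what that 85.2 GB total actually refers to. A quick way to compare it against the compressed and uncompressed sizes of the dump file itself (the filename below is an assumption, based on how mongodump --gzip typically names its output):

$ DUMP=/Volumes/Lacie2TB/backup/mongo20170909/bigdata/logs.sets.log.bson.gz
$ gzip -l "$DUMP"              # quick, but the uncompressed size is stored modulo 4 GiB, so unreliable for a file this big
$ gunzip -c "$DUMP" | wc -c    # exact, but has to decompress the whole file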

And it keeps going!

Note that the other collections have finished. Only this one goes beyond 100%. I do not understand.

This is MongoDB 3.2.7 on Mac OS X.

There is obviously something wrong with the reported amount of data, because the drive does not even have that much disk space:

$ df -h
Filesystem      Size   Used  Avail Capacity   iused     ifree %iused Mounted on
/dev/disk3     477Gi  262Gi  214Gi    56%  68749708  56193210   55%   /

The amount of disk space actually used could be right, because the gzipped backup is about 200 GB. I do not know whether that would end up as roughly the same amount of data in a WiredTiger database with snappy compression.
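
One way to find out would be to ask WiredTiger directly once the collection exists: collection stats report both the logical data size and the compressed size on disk. A minimal sketch using the mongo shell (the scale argument makes the numbers come back in GB; assumes a default local connection):

$ mongo bigdata --quiet --eval 'var s = db.getCollection("logs.sets.log").stats(1024 * 1024 * 1024); print("logical GB: " + s.size + ", on disk GB: " + s.storageSize)'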

However, the log keeps showing inserts:

2017-09-10T16:20:18.986+0200 I COMMAND  [conn9] command bigdata.logs.sets.log command: insert { insert: "logs.sets.log", documents: 20, writeConcern: { getLastError: 1, w: 1 }, ordered: false } ninserted:20 keyUpdates:0 writeConflicts:0 numYields:0 reslen:40 locks:{ Global: { acquireCount: { r: 19, w: 19 } }, Database: { acquireCount: { w: 19 } }, Collection: { acquireCount: { w: 19 } } } protocol:op_query 245ms
2017-09-10T16:20:19.930+0200 I COMMAND  [conn9] command bigdata.logs.sets.log command: insert { insert: "logs.sets.log", documents: 23, writeConcern: { getLastError: 1, w: 1 }, ordered: false } ninserted:23 keyUpdates:0 writeConflicts:0 numYields:0 reslen:40 locks:{ Global: { acquireCount: { r: 19, w: 19 } }, Database: { acquireCount: { w: 19 } }, Collection: { acquireCount: { w: 19 } } } protocol:op_query 190ms
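
These log lines only show that inserts are still happening, not whether the collection is actually growing. Polling the document count alongside df would show that. A rough watch loop, assuming a default local connection:

$ while sleep 60; do mongo bigdata --quiet --eval 'print(db.getCollection("logs.sets.log").count())'; done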

Update

Disk space is still being consumed. This is roughly 2 hours later, and roughly 30 GB later:

$ df -h
Filesystem      Size   Used  Avail Capacity   iused     ifree %iused  Mounted on
/dev/disk3     477Gi  290Gi  186Gi    61%  76211558  48731360   61%   /

The question is: Is there a bug in the progress indicator, or is there some kind of loop that keeps inserting the same documents?

Update

It finished.

2017-09-10T19:35:52.268+0200    [########################]  bigdata.logs.sets.log  1604.0 GB/85.2 GB  (1881.8%)
2017-09-10T19:35:52.268+0200    restoring indexes for collection bigdata.logs.sets.log from metadata
2017-09-10T20:16:51.882+0200    finished restoring bigdata.logs.sets.log (3573548 documents)
2017-09-10T20:16:51.882+0200    done

1604.0 GB/85.2 GB (1881.8%)

Interesting. :)

1 Answer

Answer by Redsandro:

It looks similar to this bug: https://jira.mongodb.org/browse/TOOLS-1579

There seems to be a fix that was backported to 3.5 and 3.4, but it may not have been backported to 3.2. The problem likely has something to do with using gzip and/or snappy compression.
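
If that is the bug at play here, the progress total would be the compressed size of the .bson.gz file, while the restored-bytes counter counts the decompressed BSON actually read, so the final "percentage" is really just the gzip compression ratio. The numbers above fit that reading:

$ echo 'scale=1; 1604.0 / 85.2' | bc
18.8

An 18.8:1 compression ratio (i.e. the reported ~1881.8%, within display rounding) is plausible for highly repetitive log documents.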