nginx / sails.js: incomplete file upload

1.3k views Asked by At

We are developing an app using sails.js. In this app we have an upload controller:
https://github.com/makinacorpus/mnhn_bai/blob/master/api/controllers/Object3DController.js

This controller use skipper under the hood, as explained in the documentation.

Now the problem is that when we upload big files, they are incompletely stored, the uploaded size is never the same and varies from 7mb to 14mb for a 15 mb file.

The architecture is as follow:
haproxy -> nginx -> node.js/sails.

If we replace the nginx reverse proxy by a simple apache + proxypass configuration, the uploads work flawlessly.

If we replace the node.js app by a simple python upload controller (in flask, eg) the upload is also showing the correct length and data.

Of course nginx has been correctly configured for the buffer sizes, client_body_timeout, and client_max_body_size, and as i said, the flask have is just receiving correctly the upload.

The upload with the nginx app result in a 200 response, so it seems the file was uploaded, but in fact, on the disk, the file is incomplete.

On the nginx debug log we can see that:

2014/12/03 01:57:23 [debug] 39583#0: *1 http proxy header:
"POST /admin/edit_object/6 HTTP/1.1^M
Host: xxxxxx.makina-corpus.net^M
X-Real-IP: xxxx^M
X-Forwarded-For: xxxxx^M
X-NginX-Proxy: true^M
X-Forwarded-Proto: http^M
Connection: upgrade^M
Content-Length: 15361775^M
Origin: http://bai.makina-corpus.net^M
User-Agent: Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/534.34 (KHTML, like Gecko) CasperJS/1.1.0-beta3+PhantomJS/1.9.8 Safari/534.34^M
Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryRt4v4f7RkrlzUEX2^M
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8^M
Referer: http://xxxxxxxxxx.makina-corpus.net/admin/edit_object/6^M
Cookie: sails.sid=s%3Akv_Gxxxxxxxx2F5iaDWA^M
Accept-Encoding: gzip^M
Accept-Language: en,*^M
Authorization: Basic xxxx=^M
^M
"
2014/12/03 01:57:23 [debug] 39583#0: *1 http cleanup add: 00000000011CC520
2014/12/03 01:57:23 [debug] 39583#0: *1 init keepalive peer
2014/12/03 01:57:23 [debug] 39583#0: *1 get keepalive peer
2014/12/03 01:57:23 [debug] 39583#0: *1 get rr peer, try: 1
2014/12/03 01:57:23 [debug] 39583#0: *1 get keepalive peer: using connection 0000000001156018
2014/12/03 01:57:23 [debug] 39583#0: *1 http upstream connect: -4
2014/12/03 01:57:23 [debug] 39583#0: *1 http upstream send request
2014/12/03 01:57:23 [debug] 39583#0: *1 chain writer buf fl:0 s:806
2014/12/03 01:57:23 [debug] 39583#0: *1 chain writer buf fl:1 s:15361775
2014/12/03 01:57:23 [debug] 39583#0: *1 chain writer in: 00000000011CC5C0
2014/12/03 01:57:23 [debug] 39583#0: *1 tcp_nopush
2014/12/03 01:57:23 [debug] 39583#0: *1 writev: 806
2014/12/03 01:57:23 [debug] 39583#0: *1 sendfile: @0 15361775
2014/12/03 01:57:23 [debug] 39583#0: *1 sendfile: 2776864, @0 2776864:15361775
2014/12/03 01:57:23 [debug] 39583#0: *1 chain writer out: 00000000011CC5D0
2014/12/03 01:57:23 [debug] 39583#0: *1 event timer add: 35: 60000:1417568303245
2014/12/03 01:57:23 [debug] 39583#0: *1 http run request: "/admin/edit_object/6?"
2014/12/03 01:57:23 [debug] 39583#0: *1 http request empty handler
2014/12/03 01:57:23 [debug] 39583#0: *1 http upstream request: "/admin/edit_object/6?"
2014/12/03 01:57:23 [debug] 39583#0: *1 http upstream send request handler
2014/12/03 01:57:23 [debug] 39583#0: *1 http upstream send request
2014/12/03 01:57:23 [debug] 39583#0: *1 chain writer in: 00000000011CC5D0
2014/12/03 01:57:23 [debug] 39583#0: *1 sendfile: @2776864 12584911
2014/12/03 01:57:23 [debug] 39583#0: *1 sendfile: 2488810, @2776864 2488810:12584911
2014/12/03 01:57:23 [debug] 39583#0: *1 chain writer out: 00000000011CC5D0
2014/12/03 01:57:23 [debug] 39583#0: *1 event timer del: 35: 1417568303245
2014/12/03 01:57:23 [debug] 39583#0: *1 event timer add: 35: 60000:1417568303254
2014/12/03 01:57:23 [debug] 39583#0: *1 http upstream request: "/admin/edit_object/6?"
2014/12/03 01:57:23 [debug] 39583#0: *1 http upstream process header
2014/12/03 01:57:23 [debug] 39583#0: *1 malloc: 00000000011CD000:262144
2014/12/03 01:57:23 [debug] 39583#0: *1 recv: fd:35 369 of 262144
2014/12/03 01:57:23 [debug] 39583#0: *1 http proxy status 200 "200 OK"
2014/12/03 01:57:23 [debug] 39583#0: *1 http proxy header: "X-Powered-By: Sails <sailsjs.org>"
2014/12/03 01:57:23 [debug] 39583#0: *1 http proxy header: "Access-Control-Allow-Origin: "
2014/12/03 01:57:23 [debug] 39583#0: *1 http proxy header: "Access-Control-Allow-Credentials: "
2014/12/03 01:57:23 [debug] 39583#0: *1 http proxy header: "Access-Control-Allow-Methods: "
2014/12/03 01:57:23 [debug] 39583#0: *1 http proxy header: "Access-Control-Allow-Headers: "
2014/12/03 01:57:23 [debug] 39583#0: *1 http proxy header: "Content-Type: application/json; charset=utf-8"
2014/12/03 01:57:23 [debug] 39583#0: *1 http proxy header: "Content-Length: 33"
2014/12/03 01:57:23 [debug] 39583#0: *1 http proxy header: "Vary: Accept-Encoding"
2014/12/03 01:57:23 [debug] 39583#0: *1 http proxy header: "Date: Wed, 03 Dec 2014 00:57:23 GMT"
2014/12/03 01:57:23 [debug] 39583#0: *1 http proxy header: "Connection: keep-alive"
2014/12/03 01:57:23 [debug] 39583#0: *1 http proxy header done
2014/12/03 01:57:23 [debug] 39583#0: *1 uploadprogress error-tracker error: 0
2014/12/03 01:57:23 [debug] 39583#0: *1 xslt filter header
2014/12/03 01:57:23 [debug] 39583#0: *1 HTTP/1.1 200 OK^M
Server: nginx^M
Date: Wed, 03 Dec 2014 00:57:23 GMT^M

The problem seems that skipper never seems to hit the 'finish' event at the upstream level, infinite loop ? The sails stdout

Parser: Done reading textparam through field `gallery`
Parser: Done reading textparam through field `category`
Parser: Done reading textparam through field `copyright`
Parser: Done reading textparam through field `published`
Parser: Done reading textparam through field `filename_3D`
Parser: Done reading textparam through field `filename_flat`
Parser: Done reading textparam through field `preview`
Parser: Done reading textparam through field `preview_animated`
Something is trying to read from Upstream `media_files`...
Passing control to app...
User allowed : admin ( 1 )
RenamerPump:
• dirname => undefined
• field => media_files
• fd => 04cb80ba-dce6-4a1d-9b54-ac8b08ca3e06
100
100
100
100
100
100
100
100
100
100
100

The interesting thing is that the file on the disk after upload contains at the end the headers from another unrelated request:

07bb890: 3130 3130 3130 3130 3130 3130 3130 3130  1010101010101010
07bb8a0: 3130 3130 3130 3130 3130 3130 3130 3130  1010101010101010
07bb8b0: 3130 3130 3130 3130 3130 3130 3130 3130  1010101010101010
07bb8c0: 3130 3130 3130 3130 3130 3130 4745 5420  101010101010GET
07bb8d0: 2f20 4854 5450 2f31 2e31 0d0a 486f 7374  / HTTP/1.1..Host
07bb8e0: xxxx xxxx xxxx xxxx xxxx xxxx 2d63 6f72  : xxx.makina-cor
07bb8f0: 7075 732e 6e65 740d 0a58 2d52 6561 6c2d  pus.net..X-Real-
07bb900: 4950  
07bb910: 3134 0d0a 582d 466f 7277 6172 6465 642d  14..X-Forwarded-
07bb920: 466f                                     For: xxxxxxxxxxx
07bb930: 2e31                              
07bb940: 2e31 340d 0a58 2d4e 6769 6e58 2d50 726f  .14..X-NginX-Pro
07bb950: 7879 3a20 7472 7565 0d0a 582d 466f 7277  xy: true..X-Forw
07bb960: 6172 6465 642d 5072 6f74 6f3a 2068 7474  arded-Proto: htt
07bb970: 700d 0a43 6f6e 6e65 6374 696f 6e3a 2075  p..Connection: u
07bb980: 7067 7261 6465 0d0a 5573 6572 2d41 6765  pgrade..User-Age
07bb990: 6e74 3a20 4d6f 7a69 6c6c 612f 352e 3020  nt: Mozilla/5.0
07bb9a0: 2855 6e6b 6e6f 776e 3b20 4c69 6e75 7820  (Unknown; Linux
07bb9b0: 7838 365f 3634 2920 4170 706c 6557 6562  x86_64) AppleWeb
07bb9c0: 4b69 742f 3533 342e 3334 2028 4b48 544d  Kit/534.34 (KHTM
07bb9d0: 4c2c 206c 696b 6520 4765 636b 6f29 2043  L, like Gecko) C
07bb9e0: 6173 7065 724a 532f 312e 312e 302d 6265  asperJS/1.1.0-be
07bb9f0: 7461 332b 5068 616e 746f 6d4a 532f 312e  ta3+PhantomJS/1.
07bba00: 392e 3820 5361 6661 7269 2f35 3334 2e33  9.8 Safari/534.3
07bba10: 340d 0a41 6363 6570 743a 2074 6578 742f  4..Accept: text/
07bba20: 6874 6d6c 2c61 7070 6c69 6361 7469 6f6e  html,application
07bba30: 2f78 6874 6d6c 2b78 6d6c 2c61 7070 6c69  /xhtml+xml,appli
07bba40: 6361 7469 6f6e 2f78 6d6c 3b71 3d30 2e39  cation/xml;q=0.9
07bba50: 2c2a 2f2a 3b71 3d30 2e38 0d0a 4163 6365  ,*/*;q=0.8..Acce
07bba60: 7074 2d45 6e63 6f64 696e 673a 2067 7a69  pt-Encoding: gzi
07bba70: 700d 0a41 6363 6570 742d 4c61 6e67 7561  p..Accept-Langua
07bba80: 6765 3a20 656e 2c2a 0d0a 4175 7468 6f72  ge: en,*..Author
07bba90: 697a 6174 696f 6e3a 2042 6173 6963 2063  ization: Basic c
07bbaa0: 6d39 xxxx xxxx xxxx xxxx 3d0d 0a         xxxx=..
(END)

And in other requests, we have not some other requests headers, but just an incomplete file. Here, the missing bits are from the end of the original file, the start is always correct.

Note that the main difference with apache is that nginx is sending data is quick big bursts to the sails app. On the contrary apache is streaming the request. This is because nginx does request buffering.

If someone has an idea from where to continue in skipper to dig out that upload problem !

If i replace the save method by this example, i see that the bits coming from nginx are written correctly, i have the full and correct file in the POSTed data, so the error is clearly somewhere in skipper request consumption

var body = "";
req.on('data', function (chunk) {
  body += chunk;
});
req.on('end', function () {
  console.log('POSTed: ' + body.length);
  console.log('POSTed: ' + body.slice(-400));
  res.writeHead(200);
  res.end('<html/>');
});
3

There are 3 answers

0
kiorky On BEST ANSWER

So, the solution i found is to hack a bodyparser which use formidable.

No more problem :).

For the record, it was a bit of a hack to switch the bodyparser in the middlewares:

config/http.js

module.exports.http = {
  middleware: {
    bodyParser: false,
    cbodyParser: require('../bodyParser')(
        {urls: [/\/admin\/edit_object/]}),
    order: [
     'startRequestTimer',
     'cookieParser',
     'session',
     'cbodyParser',
     'handleBodyParserError',
     'compress',
     'methodOverride',
     'poweredBy',
     '$custom',
     'router',
     'www',
     'favicon',
     '404',
     '500'
   ],    
  }
};

bodyparser.js:

/**
 * Module dependencies
  // Configure body parser components
 */
var _ = require('lodash');
var util = require('util');
var formidable = require('formidable');

function mime(req) {
  var str = req.headers['content-type'] || '';
  return str.split(';')[0];
}

function parseMultipart(req, res, next) {
  req.form = new formidable.IncomingForm();
  req.form.uploadDir = sails.config.data.__uploadData;
  req.form.maxFieldsSize = sails.config.maxsize;
  req.form.multiple = true;
  // res.setTimeout(0);
  req.form.parse(req, function(err, fields, files) {
    if (err)
        return next(err);
    else {
      req.files = files;
      req.fields = fields;
      req.body = extend(fields, files);
      next();
    }
  });
}

function extend(target) {
  var key, obj;
  for (var i = 1, l = arguments.length; i < l; i++) {
    if ((obj = arguments[i])) {
      for (key in obj)
        target[key] = obj[key];
    }
  }
  return target;
}

function disable_parser(opts, req, res)  {
    var matched = false;
    try {
        var method = null;
        try {method = req.method.toLowerCase();}
        catch(err){ /* */}
        if(method) {
            _(opts.urls).forEach(function(turl) {
                if (method === 'post' && req.url.match(turl)) {
                    // console.log("matched"+ req.url);
                    if(!matched) matched = true;
                };});
        }
    } catch(err) { debug(err);/* pass */ }
    return matched;
}

module.exports = function toParseHTTPBody(options) {
  options = options || {};
  var bodyparser = require('skipper')(options);
  // NAME of anynonymous func IS IMPORTANT (same as the middleware in config) !!!
  return function cbodyParser(req, res, next) {
    var err_hdler = function(err) {};
    if (disable_parser(options, req, res) && mime(req) == 'multipart/form-data') {
        return parseMultipart(req, res, next);
    } else {
        return bodyparser(req, res, next);
    }
  };
};

Indeed, sails let to think that we can override the bodyParser, but we cant as it will result in an anonymous function but the express router only map "named" function...

1
myusuf On

We faced a similar issue. I dont know if our solution will work for you or not, but here goes.

For very large files, the csrf gets left out of request packet. So we need to send the csrf in request header rather than request body. For that we changed the XMLHttpRequest a little bit.

/*
Putting csrf in Header as some large
files need this mechanism to upload
*/
(function() {
  var send = XMLHttpRequest.prototype.send,
  token = csrfToken;   //csrfToken is global
  XMLHttpRequest.prototype.send = function(data) {
    this.setRequestHeader('X-CSRF-Token', token);
    return send.apply(this, arguments);
  };
}());

From now, every request will have csrf in the header. This solved the problem for us. Hope this helps you too.

1
Maxim Dounin On

From the nginx debug log it seems that the problem is due to early return of response from the backend - note that in the last sendfile() call nginx was able to send only 2488810 out of 12584911 bytes it tried to:

...
2014/12/03 01:57:23 [debug] 39583#0: *1 chain writer in: 00000000011CC5D0
2014/12/03 01:57:23 [debug] 39583#0: *1 sendfile: @2776864 12584911
2014/12/03 01:57:23 [debug] 39583#0: *1 sendfile: 2488810, @2776864 2488810:12584911
2014/12/03 01:57:23 [debug] 39583#0: *1 chain writer out: 00000000011CC5D0
2014/12/03 01:57:23 [debug] 39583#0: *1 event timer del: 35: 1417568303245
2014/12/03 01:57:23 [debug] 39583#0: *1 event timer add: 35: 60000:1417568303254
2014/12/03 01:57:23 [debug] 39583#0: *1 http upstream request: "/admin/edit_object/6?"
2014/12/03 01:57:23 [debug] 39583#0: *1 http upstream process header
2014/12/03 01:57:23 [debug] 39583#0: *1 malloc: 00000000011CD000:262144
2014/12/03 01:57:23 [debug] 39583#0: *1 recv: fd:35 369 of 262144
2014/12/03 01:57:23 [debug] 39583#0: *1 http proxy status 200 "200 OK"
2014/12/03 01:57:23 [debug] 39583#0: *1 http proxy header: "X-Powered-By: Sails <sailsjs.org>"
...

And the backend returned an 200 OK answer. At this point nginx thinks that there is no reason to send the rest of the request body and stops sending it - this what causes incomplete uploaded files. Additionally, you have keepalive upstream connections configured, and you are hitting this bug - and this is why you see headers of an unrelated request.

Teaching the backend code to only send a response after the request is fully read, as in your test code, should resolve the problem.