Strange ECONNRESET error I cannot figure out


I do not know whether this is related to koa, is a problem in some other npm module, or is something else entirely, so I am going to start here.

On to the problem. I have a REST API written in koa v1. We run the node server in a Docker image. One of our endpoints starts an import and returns status 200 with the message "import started"; when the import finishes, we send a Slack message to notify us.

First I tested the server on my local machine, and everything worked (the endpoint did not throw any errors). Then I built the Docker image and ran the container locally, and everything worked there too. Then I deployed the image to our Mesos environment. The container runs and every endpoint works except the import endpoint. When I call it, after a few seconds (5 to 10) I get an ECONNRESET error, the running container gets killed, and a new instance is started. So the import is terminated.

At the beginning we assigned 128 MB of RAM to the Docker container, and that seemed to be enough. After the import error occurred, we thought the OOM killer might have killed the process. We checked dmesg and could not find any log entries related to OOM or to the process of the running container. Then we checked the RAM usage of the container locally (with htop) and found that it uses approx. 250+ MB, so we raised the memory in the Marathon config to 512 MB. That did not help; the same error occurred.

Because the error was not explicit enough, we installed the longjohn module to get a more detailed error message. That gave us a little more information, but not as much as we had hoped.

Error: read ECONNRESET
      at exports._errnoException (util.js:1026:11)
      at TCP.onread (net.js:569:26)
  ---------------------------------------------
      at Application.app.callback (/src/node_modules/koa/lib/application.js:130:45)
      at Application.app.listen (/src/node_modules/koa/lib/application.js:73:39)
      at Promise.then.result (/src/server.js:97:13)


  Error: read ECONNRESET
      at exports._errnoException (util.js:1026:11)
      at TCP.onread (net.js:569:26)

Line 97 of the server.js is:

 96:if(!module.parent) {
 97:    app.listen(port, (err) => {
 98:        if (err) {
 99:            console.error('Server error', err);
100:        }
101:        console.log('Listening on the port', port);
102:    });
103:}

So what exactly happens in the endpoint logic? We are using the Postgres npm module pg. We pass a pg.Pool to the context so that we can use it later in our models. We wrap each insert query in a promise and push the promises into an array. There are roughly 2700+ records. Then we call Promise.all on the array of promises, and in its then handler we send the message to Slack.

As you can see I do not know if the error is related to koa or pg or some other thing. What is more intriguing is that locally everything works (node server as well as in docker container), but on Mesos it does not. How can I find out what is wrong?

  • version of koa npm module: 1.2.0
  • version of pg npm module: 6.1.0
  • version of Postgres 9.5
  • version of Mesos: 1.0.1

There are 2 answers

0
daniyel On BEST ANSWER

Thanks to another developer, we found out what caused the error. While an import was running, it used all the connections in the pool.

When Marathon requested the service status during the import, the service tried to connect to the database to test the connection, and at that point the connection to the database was terminated. The service became unhealthy and Marathon restarted it. We refactored the import code, and we are now limiting the number of pool connections.
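
The refactored import code is not shown here, so the following is only a minimal sketch of one way to stop an import from claiming every pooled connection at once: run the inserts with a fixed number in flight instead of starting all 2700+ promises together. The helper name `runWithLimit` and its parameters are hypothetical, and it uses async/await, which assumes a newer Node version than the one this setup likely ran on.

```javascript
// Hypothetical helper: run worker(item) for every item with at most
// `limit` calls in flight at any time.
// Resolves with the results in input order; rejects on the first failure.
function runWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each lane keeps claiming the next unprocessed index until none are left.
  async function lane() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  // Never start more lanes than items, and always start at least one.
  const laneCount = Math.max(1, Math.min(limit, items.length));
  const lanes = [];
  for (let i = 0; i < laneCount; i++) {
    lanes.push(lane());
  }
  return Promise.all(lanes).then(() => results);
}
```

With pg this might be called as `runWithLimit(records, 5, (record) => pool.query(insertSql, valuesFor(record)))`, where `pool` is the existing pg.Pool and `insertSql` and `valuesFor` are placeholders for the real query. Keeping the limit below the pool size leaves a connection free for other callers, such as a health check. The pool size itself can be capped with the `max` option when the pool is created.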

0
Ted On

According to this GitHub issue, this is an error caused by tiny-lr.

It seems that downgrading to version 0.2.1 stops it, but this is usually a dependency of other packages you're using that you've got no control over. You might be able to filter out the error by displaying all errors except this, as such:

if (error.code !== 'ECONNRESET') { console.log(error) }

The issue is still open, and dates from Oct 27, 2016. Don't know if it will get fixed or not. But as far as feedback goes, it doesn't seem like a dangerous error, or to have any impact whatsoever. But heh, I'd rather fix mine too, if there was a way.
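
For illustration, here is a minimal sketch of where a filter like that could live: in an 'error' listener, so that ECONNRESET is skipped and everything else is still logged. The function name `ignoreConnReset` is hypothetical, and the plain EventEmitter stands in for whatever actually emits the error in your setup (a Koa app, for example, emits 'error' events).

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: build an 'error' listener that skips ECONNRESET
// and passes every other error to the given log function.
function ignoreConnReset(log) {
  return function onError(error) {
    if (error && error.code === 'ECONNRESET') {
      return; // deliberately ignored
    }
    log(error);
  };
}

// Stand-in emitter; in a real app this would be the object that emits 'error'.
const emitter = new EventEmitter();
emitter.on('error', ignoreConnReset(console.error));
```

Note that this only hides the message. It does not address whatever closed the connection.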