We're seeing odd behaviour in the MongoDB PHP driver (v1.3) when executing a slow query. The driver seems to keep opening connections when requests are slow, and I don't fully understand why. Maybe you guys have some suggestions here.
Here are some facts first:
- Website and database run on single Ubuntu 13.04 server
- Server is high-end 8 core 16GB RAM server
- MongoDB v2.2.4
- Website runs PHP 5.4
- Apache 2 as web server
- PHP Mongo driver 1.3.x (should be newest 1.3)
- Website uses Doctrine ODM
- Website has about 50 to 100 concurrent users at any moment
- Open files limit (ulimit -n / nofile) = 64000
Once a day a Memcache record expires and a slow query runs. This leads to PHP opening up to 800 connections to MongoDB (normally we have 10 open connections according to the logs). Our website is almost fully Memcached, so the database has no other significant load. With 800 connections open, page loads initially take 30 seconds, and later the site throws several types of MongoException (too many connections / socket exceptions).
It's an ugly query with a group by. To be perfectly clear, we understand this query is slow and idiotic, and we're removing it today. It's just not clear why it takes down the entire website. We use Doctrine as an abstraction layer, but according to the logs this is the actual query, run against a 200,000-document collection (3 fields per document: id/product/date):
{"group":true,"keys":{"product":1},"initial":{"count":0},"reduce":"function (obj, prev) { prev.count++; }","options":[],"db":"Orders","collection":"History"}
After the query is done, its result is written to Memcache for 24h, so all subsequent requests get it from Memcache instead of MongoDB. But the connection count still sticks around 800, the problem does not resolve itself, and after a while the website stops responding. It takes about 10 minutes for these 800 connections to be opened.
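The flow is the classic cache-aside pattern; here is a minimal sketch (the cache key and the `computeGroupResult()` helper are hypothetical). Every request that arrives during the expiry window falls through to MongoDB at once:

```php
<?php
// Sketch of the cache-aside flow described above. The key name and the
// computeGroupResult() helper are hypothetical; only the shape matters.
$memcache = new Memcache();
$memcache->connect('localhost', 11211);

$result = $memcache->get('orders.history.group');
if ($result === false) {
    // Cache miss: every concurrent request entering this branch runs the
    // slow group query simultaneously -- this is the stampede window.
    $result = computeGroupResult(); // the slow group command
    $memcache->set('orders.history.group', $result, 0, 86400); // cache for 24h
}
```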
It feels like a typical race condition, yet the query doesn't seem heavy enough to actually cause one on this server under this load.
Okay, so the questions are:
- Why does PHP keep opening so many connections?
- Why can't MongoDB handle this (it shouldn't be THAT big of a problem, right?)
- Any other suggestions of what we should do?
- Should I set timeouts on the connections and queries to solve this or is it something else?
The reason I ask is that our website is growing quickly and we expect far more traffic and MongoDB load in the future.
Thanks a lot in advance!
Given that you're invoking the `group` command instead of performing a basic read query, you may also be fighting against the JavaScript interpreter in MongoDB 2.2. It wasn't until 2.4 that the JavaScript interpreter was enhanced to support concurrent execution. If each of these group operations requires JS evaluation (at the very least for the `reduce` function), you're looking at widespread resource starvation.

I don't have any explanation for the "too many connections" exceptions. Even 800 concurrent connections fall well below MongoDB's limit of 20,000 (note: this limit is being removed in 2.6; see SERVER-8943).
One idea to refactor your application and avoid the `group` race condition would be to use a single document as a lock, so that only one PHP process recomputes the result and refills the cache. Using `findAndModify`, you could have a single document with some string `_id` (e.g. "Order.History group") and an `active` field. When a PHP process gets a cache miss and would need to recompute the result, it can first attempt to execute `findAndModify`, finding the appropriate `_id` where `active` is `false` and updating `active` to `true` in the same, atomic operation. Only after getting this lock document back should it proceed with the `group` command. Other PHP processes that can't find the lock document (because `active` will not be `false`) could be instructed to sleep for a bit, return stale data, or abort the web request.
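A minimal sketch of this lock pattern with the legacy 1.3 driver; the `Locks` collection name, the lock `_id`, and the `runSlowGroupQuery()` / `refillCache()` helpers are illustrative assumptions:

```php
<?php
// Sketch of the findAndModify lock described above, assuming the legacy
// PHP driver (PECL mongo 1.3). Collection name, lock _id, and helper
// functions are illustrative, not from the original post.
$client = new MongoClient('mongodb://localhost:27017');
$locks  = $client->selectDB('Orders')->selectCollection('Locks');

// Atomically flip active from false to true. If another process already
// holds the lock, no document matches and nothing is returned.
$lock = $locks->findAndModify(
    array('_id' => 'Order.History group', 'active' => false), // query
    array('$set' => array('active' => true))                  // update
);

if ($lock) {
    // We won the lock: run the group command, refill Memcache,
    // then release the lock.
    $result = runSlowGroupQuery(); // hypothetical helper
    refillCache($result);          // hypothetical helper
    $locks->update(
        array('_id' => 'Order.History group'),
        array('$set' => array('active' => false))
    );
} else {
    // Another process is recomputing: sleep briefly, serve stale data,
    // or abort, as suggested above.
    usleep(100000); // 100 ms
}
```

One caveat with any such lock document: if the process holding the lock dies before resetting `active`, the lock stays taken, so you may also want to store a timestamp and treat sufficiently old locks as expired.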