Large JSON response emitted from Celery not reaching Flask


I am working on a web application in which the user uploads a file and a long-running calculation is performed on it on the server. Once the calculation is done, the application displays the calculated results along with the original file.

This works for small files but fails for large ones. If the JSON response is small (tested with arrays of 3 and 11,726 objects), it is emitted to the browser, but a large response (507,072 objects) is not. The uploaded file in the failing case is ~3 MB, and since the JSON response contains all the data of the uploaded file, I assume it is of the same order.

On the Celery side

This is what I see for the small JSON response (array of 11,726 objects):

[2016-12-20 18:11:11,734: INFO/MainProcess] emitting event "response"     to all [/test]
[2016-12-20 18:11:13,151: INFO/MainProcess] Task  flask_application.restapi.background_thread[0457e7cb-f10b-408e-8f80-2bd95ce2c7b4] succeeded in 10.542778854s: None

And this for the large JSON response:

[2016-12-20 18:17:08,788: INFO/MainProcess] emitting event "response" to all [/test]
[2016-12-20 18:18:42,813: INFO/MainProcess] Task flask_application.restapi.background_thread[309e0ed6-94ec-4332-a2ed-b6d9211bc263] succeeded in 366.478926406s: None

On the Flask side

For the small JSON response, I see the full emitted output, which is eventually visible in the browser as well (the output below is truncated):

35,"Formula":"CHNO"},{"NA Corrected with  zero":345,"Name":"XYZ","Label":"A","Sample":"M_6_6","Truth Value":false,"NA Corrected":323,"Formula":"CHNO"},{"NA Corrected with zero":422,"Name":"XYZ","Label":"A","Sample":"M_6_5","Truth Value":false,"NA Corrected":233,"Formula":"CHNO"},{"NA Corrected with zero":494,"Name":"XYZ","Label":"A","Sample":"M_6_4","Truth Value":false,"NA Corrected":4946,"Formula":"CHNO"},{"NA Corrected with zero":458,"Name":"L-Proline","Label":"A","Sample":"M_6_3","Truth Value":false,"NA Corrected":4586,"Formula":"CHNO"},{"NA Corrected with zero":493,"Name":"XYZ","Label":"A","Sample":"M_6_2","Truth Value":false,"NA Corrected":4935,"Formula":"CHNO"},{"NA Corrected with zero":4565,"Name":"XYZ","Label":"A","Sample":"M_6_1","Truth Value":false,"NA Corrected":45656,"Formula":"CHNO"}]}] [in /Users/raaisa/.virtualenvs/nacorr/lib/python2.7/site-packages/engineio/socket.py:69]
20161220-18:11PM INFO: 441d88fffb814e1696aa25eb99e97336: Received packet PING data None [in /Users/raaisa/.virtualenvs/nacorr/lib/python2.7/site-packages/engineio/socket.py:44]

But for the large response there is nothing; it seems that Flask is not receiving anything from Celery:

20161220-18:16PM INFO: 441d88fffb814e1696aa25eb99e97336: Sending packet PONG data None [in /Users/raaisa/.virtualenvs/nacorr/lib/python2.7/site-packages/engineio/socket.py:69]
20161220-18:16PM INFO: 441d88fffb814e1696aa25eb99e97336: Received packet PING data None [in /Users/raaisa/.virtualenvs/nacorr/lib/python2.7/site-packages/engineio/socket.py:44]
20161220-18:16PM INFO: 441d88fffb814e1696aa25eb99e97336: Sending packet PONG data None [in /Users/raaisa/.virtualenvs/nacorr/lib/python2.7/site-packages/engineio/socket.py:69]
20161220-18:17PM INFO: 441d88fffb814e1696aa25eb99e97336: Received packet PING data None [in /Users/raaisa/.virtualenvs/nacorr/lib/python2.7/site-packages/engineio/socket.py:44]
20161220-18:17PM INFO: 441d88fffb814e1696aa25eb99e97336: Sending packet PONG data None [in /Users/raaisa/.virtualenvs/nacorr/lib/python2.7/site-packages/engineio/socket.py:69]
20161220-18:17PM INFO: 441d88fffb814e1696aa25eb99e97336: Received packet PING data None [in /Users/raaisa/.virtualenvs/nacorr/lib/python2.7/site-packages/engineio/socket.py:44]
20161220-18:17PM INFO: 441d88fffb814e1696aa25eb99e97336: Sending packet PONG data None [in /Users/raaisa/.virtualenvs/nacorr/lib/python2.7/site-packages/engineio/socket.py:69]
20161220-18:18PM INFO: 441d88fffb814e1696aa25eb99e97336: Received packet PING data None [in /Users/raaisa/.virtualenvs/nacorr/lib/python2.7/site-packages/engineio/socket.py:44]
20161220-18:18PM INFO: 441d88fffb814e1696aa25eb99e97336: Sending packet PONG data None [in /Users/raaisa/.virtualenvs/nacorr/lib/python2.7/site-packages/engineio/socket.py:69]
20161220-18:18PM INFO: 441d88fffb814e1696aa25eb99e97336: Received packet PING data None [in /Users/raaisa/.virtualenvs/nacorr/lib/python2.7/site-packages/engineio/socket.py:44]
20161220-18:18PM INFO: 441d88fffb814e1696aa25eb99e97336: Sending packet PONG data None [in /Users/raaisa/.virtualenvs/nacorr/lib/python2.7/site-packages/engineio/socket.py:69]
20161220-18:19PM INFO: 441d88fffb814e1696aa25eb99e97336: Received packet PING data None [in /Users/raaisa/.virtualenvs/nacorr/lib/python2.7/site-packages/engineio/socket.py:44]
20161220-18:19PM INFO: 441d88fffb814e1696aa25eb99e97336: Sending packet PONG data None [in /Users/raaisa/.virtualenvs/nacorr/lib/python2.7/site-packages/engineio/socket.py:69]
20161220-18:19PM INFO: 441d88fffb814e1696aa25eb99e97336: Received packet PING data None [in /Users/raaisa/.virtualenvs/nacorr/lib/python2.7/site-packages/engineio/socket.py:44]
20161220-18:19PM INFO: 441d88fffb814e1696aa25eb99e97336: Sending packet PONG data None [in /Users/raaisa/.virtualenvs/nacorr/lib/python2.7/site-packages/engineio/socket.py:69]

I am using Flask, Flask-SocketIO, Celery and Redis. I have followed this blog to implement WebSockets and the long-running background task. Here is the relevant code:

from flask_socketio import SocketIO

# app, socketio and celery are created elsewhere in the application.
# socketio_redis: URL of the Redis instance used as the Socket.IO message queue
app.config['SOCKETIO_REDIS_URL'] = 'redis://localhost:6379/0'

@socketio.on('nacorr', namespace='/test')
def test_connect(data):
    # Hand the heavy work off to Celery; the worker emits results back
    # through the Redis message queue.
    background_thread.delay(app.config['SOCKETIO_REDIS_URL'], data)

@celery.task
def background_thread(url, data):
    # A SocketIO instance bound only to the message queue: it does not serve
    # clients itself, it just publishes events for the Flask server to relay.
    local_socketio = SocketIO(message_queue=url)
    local_socketio.emit('response', {'data': 'background task starting ...'},
                        namespace='/test')
    # do something with data
    responsejson = some_func_resulting_in_json_obj()
    local_socketio.emit('response', responsejson, namespace='/test')

Any help is appreciated.

Update

The Redis key values of all the tasks are the same (only the task ids differ):

127.0.0.1:6379> GET "celery-task-meta-691c61e6-ef2a-413c-9226-fac0be352928" 
"{\"status\": \"SUCCESS\", \"traceback\": null, \"result\": null,  \"task_id\": \"691c61e6-ef2a-413c-9226-fac0be352928\", \"children\": []}"
127.0.0.1:6379> GET "celery-task-meta-f01e0818-dc80-42e2-8ab0-2e4240f0e972"
"{\"status\": \"SUCCESS\", \"traceback\": null, \"result\": null, \"task_id\": \"f01e0818-dc80-42e2-8ab0-2e4240f0e972\", \"children\": []}"
127.0.0.1:6379> GET "celery-task-meta-5adc0627-3288-4d21-8bf7-6ee5964e1683"
"{\"status\": \"SUCCESS\", \"traceback\": null, \"result\": null, \"task_id\": \"5adc0627-3288-4d21-8bf7-6ee5964e1683\", \"children\": []}"

I was able to work around this issue by dividing my response into chunks and emitting the smaller chunks, so there is probably a buffer-size issue somewhere.

1 Answer

Raaisa (accepted answer)

This problem was resolved by configuring the Redis pub/sub client output buffer limits and the Flask-SocketIO ping timeout.

Monitor redis and flask

I monitored Redis as suggested by Miguel, using the command PSUBSCRIBE * in redis-cli to watch Redis activity as a message queue.
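The same monitoring can also be scripted with redis-py instead of redis-cli; this is only a convenience sketch and was not part of the original setup:

import redis

# Subscribe to every channel, like PSUBSCRIBE * in redis-cli, and print both
# the celery event traffic and the flask-socketio emit messages.
r = redis.StrictRedis(host='localhost', port=6379, db=0)
p = r.pubsub()
p.psubscribe('*')
for message in p.listen():
    print(message)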

For the small dataset, this appeared in Redis when the button was clicked to start the long-running task:

ivery_tag\": \"433b78b8-7cb1-4665-96aa-a53fcc478c32\"}, \"content-  encoding\": \"utf-8\"}"
1) "pmessage"
2) "*"
3) "flask-socketio"
4) "(dp0\nS'skip_sid'\np1\nNsS'room'\np2\nNsS'namespace'\np3\nS'/test'\np4\nsS'event'\np5\nS'response'\np6\nsS'callback'\np7\nNsS'data'\np8\n(dp9\ng8\nS'background task starting ...'\np10\nssS'method'\np11\nS'emit'\np12\ns."

ected\np140697\nF45867838.165479094\nsVFormula\np140698\nVC5H9NO2\np140699\nsa(dp140700\nVNA Corrected with zero\np140701\nF49355484.84626437\nsVName\np140702\nVL-Proline\np140703\nsVLabel\np140704\nVC12 PARENT\np140705\nsVSample\np140706\nVSAMPLE_#SZ3G2FM_6_2\np140707\nsVTruth Value\np140708\nI00\nsVNA Corrected\np140709\nF49355484.84626437\nsVFormula\np140710\nVC5H9NO2\np140711\nsa(dp140712\nVNA Corrected with zero\np140713\nF45656465.639370896\nsVName\np140714\nVL-Proline\np140715\nsVLabel\np140716\nVC12 PARENT\np140717\nsVSample\np140718\nVSAMPLE_#SZ3G2FM_6_1\np140719\nsVTruth Value\np140720\nI00\nsVNA Corrected\np140721\nF45656465.639370896\nsVFormula\np140722\nVC5H9NO2\np140723\nsassS'method'\np140724\nS'emit'\np140725\ns."
1) "pmessage"
2) "*"
3) "flask-socketio"
4) "(dp0\nS'skip_sid'\np1\nNsS'room'\np2\nNsS'namespace'\np3\nS'/'\np4\nsS'event'\np5\nS'disconnect'\np6\nsS'callback'\np7\nNsS'data'\np8\n(dp9\ng8\nS'disconnect the socket'\np10\nssS'method'\np11\nS'emit'\np12\ns."
1) "pmessage"
2) "*"
3) "celery-task-meta-6b4da3b9-b887-4032-a0ce-5d19a38f75d3"

The Celery task runs properly and sends all the emit messages. The corresponding Redis key is "celery-task-meta-6b4da3b9-b887-4032-a0ce-5d19a38f75d3".

If the same task runs on a large file, which generates a large response, the process does not complete and redis-cli stops receiving messages after some time:

3) "flask-socketio"
4) "(dp0\nS'skip_sid'\np1\nNsS'room'\np2\nNsS'namespace'\np3\nS'/test'\np4\nsS'event'\np5\nS'response'\np6\nsS'callback'\np7\nNsS'data'\np8\n(dp9\ng8\nS'background task starting ...'\np10\nssS'method'\np11\nS'emit'\np12\ns."
1) "pmessage"
2) "*"
3) "/0.celeryev/worker.heartbeat"
4) "{\"body\": \"eyJzd19zeXMiOiAiRGFyd2luIiwgImNsb2NrIjogMTY1MCwgInRpbWVzdGFtcCI6IDE0ODI3ODE1OTQuODI5MzY0LCAiaG9zdG5hbWUiOiAiY2VsZXJ5QFJhYWlzYXMtTWFjQm9vay1Qcm8ubG9jYWwiLCAicGlkIjogNDI0MSwgInN3X3ZlciI6ICI0LjAuMiIsICJ1dGNvZmZzZXQiOiAtNiwgImxvYWRhdmciOiBbMi4zNywgMi45MSwgMi45N10sICJwcm9jZXNzZWQiOiA4LCAiYWN0aXZlIjogMSwgImZyZXEiOiAyLjAsICJ0eXBlIjogIndvcmtlci1oZWFydGJlYXQiLCAic3dfaWRlbnQiOiAicHktY2VsZXJ5In0=\", \"headers\": {\"hostname\": \"[email protected]\"}, \"content-type\": \"application/json\", \"properties\": {\"priority\": 0, \"body_encoding\": \"base64\", \"delivery_info\": {\"routing_key\": \"worker.heartbeat\", \"exchange\": \"celeryev\"}, \"delivery_mode\": 1, \"delivery_tag\": \"d96e8e9e-032e-41ae-8e9c-64cf7c6bc17c\"}, \"content-encoding\": \"utf-8\"}"
1) "pmessage"
2) "*"
3) "/0.celeryev/worker.heartbeat"
4) "{\"body\": \"eyJzd19zeXMiOiAiRGFyd2luIiwgImNsb2NrIjogMTY1MiwgInRpbWVzdGFtcCI6IDE0ODI3ODE2MDkuODA5MzAyLCAiaG9zdG5hbWUiOiAiY2VsZXJ5QFJhYWlzYXMtTWFjQm9vay1Qcm8ubG9jYWwiLCAicGlkIjogNDI0MSwgInN3X3ZlciI6ICI0LjAuMiIsICJ1dGNvZmZzZXQiOiAtNiwgImxvYWRhdmciOiBbMi4yOSwgMi44NiwgMi45NV0sICJwcm9jZXNzZWQiOiA4LCAiYWN0aXZlIjogMSwgImZyZXEiOiAyLjAsICJ0eXBlIjogIndvcmtlci1oZWFydGJlYXQiLCAic3dfaWRlbnQiOiAicHktY2VsZXJ5In0=\", \"headers\": {\"hostname\": \"[email protected]\"}, \"content-type\": \"application/json\", \"properties\": {\"priority\": 0, \"body_encoding\": \"base64\", \"delivery_info\": {\"routing_key\": \"worker.heartbeat\", \"exchange\": \"celeryev\"}, \"delivery_mode\": 1, \"delivery_tag\": \"3160b37f-273b-4023-9bb6-5d89ebfe97a7\"}, \"content-encoding\": \"utf-8\"}"
user-MacBook-Pro:WebTool user$

But if we set the following config on Redis: config set client-output-buffer-limit "pubsub 0 0 0", the large emit message shows up in Redis. Another issue then arises: the Flask WebSocket reports the error "Socket is closed", probably because of the time taken to emit the large response. Increasing the ping_timeout in SocketIO handles this, and the response is emitted to the frontend.
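A minimal sketch of both fixes, assuming the Flask app and Redis instance from the question; the 60-second timeout is an illustrative value, not one from the original post, and the buffer limit can equally be set in redis.conf or via redis-cli as shown above:

import redis
from flask import Flask
from flask_socketio import SocketIO

# Remove the pub/sub client output buffer limit so Redis does not drop the
# connection while the very large emit message sits in the output buffer
# (equivalent to: config set client-output-buffer-limit "pubsub 0 0 0").
r = redis.StrictRedis(host='localhost', port=6379, db=0)
r.config_set('client-output-buffer-limit', 'pubsub 0 0 0')

# Give clients more time before the connection is considered dead while the
# large response is being serialized and delivered (value is illustrative).
app = Flask(__name__)
socketio = SocketIO(app, message_queue='redis://localhost:6379/0',
                    ping_timeout=60)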

Divide the response into chunks:

The other solution to this problem is to divide the response into small chunks and emit them one by one. It worked well for the above case, as shown in the sketch below.
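A minimal sketch of the chunked emit, reworking the background_thread task from the question; the chunk size, the 'response_chunk' event name and the some_func_resulting_in_list_of_rows helper are illustrative, not from the original code:

from flask_socketio import SocketIO

CHUNK_SIZE = 1000  # illustrative; keep each emit well below the buffer limits

@celery.task  # celery is the application's Celery instance, as in the question
def background_thread(url, data):
    local_socketio = SocketIO(message_queue=url)
    # Hypothetical helper returning the full list of result objects.
    rows = some_func_resulting_in_list_of_rows(data)
    for start in range(0, len(rows), CHUNK_SIZE):
        chunk = rows[start:start + CHUNK_SIZE]
        # Each emit stays small, so it passes through the Redis pub/sub buffer
        # and the WebSocket without hitting size limits; the client appends the
        # chunks and renders once 'done' is true.
        local_socketio.emit('response_chunk',
                            {'rows': chunk,
                             'done': start + CHUNK_SIZE >= len(rows)},
                            namespace='/test')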