I have a rather large .csv file (up to 1 million lines) that I want to generate and send when a browser requests it.
The current code I have is (except that I don't actually generate the same data):
import random
import tornado.ioloop
import tornado.web

class CSVHandler(tornado.web.RequestHandler):
    def get(self):
        self.set_header('Content-Type', 'text/csv')
        self.set_header('Content-Disposition', 'attachment; filename=dump.csv')
        self.write('lineNumber,measure\r\n')  # File header
        for line in range(0, 1000000):
            self.write(','.join([str(line), str(random.random())]) + '\r\n')  # mock data

app = tornado.web.Application([(r"/csv", CSVHandler)])
app.listen(8080)
tornado.ioloop.IOLoop.current().start()
The problems I have with the method above are:
- The web browser doesn't start downloading chunks as they are sent. It hangs while the web server seems to prepare the whole content.
- The web server is blocked while it processes this request and makes other clients hang.
By default, all data is buffered in memory until the end of the request so that it can be replaced with an error page if an exception occurs. To send a response incrementally, your handler must be asynchronous (so it can be interleaved with both the writing of the response and other requests on the IOLoop) and use the RequestHandler.flush() method.
Note that "being asynchronous" is not the same as "using the @tornado.web.asynchronous decorator"; in this case I recommend using @tornado.gen.coroutine instead of @asynchronous. This allows you to simply use the yield operator with every flush: self.flush() starts the process of writing the data to the network, and yield waits until that data has reached the kernel. This lets other handlers run and also helps manage memory consumption (by limiting how far ahead of the client's download speed you can get). Flushing after every line of a CSV file is a little expensive, so you may want to only flush after every 100 or 1000 lines.
Note that if there is an exception once the download has started, there is no way to show an error page to the client; you can only cut the download off partway through. Try to validate the request and do everything that is likely to fail before the first call to flush().
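As a rough illustration of the approach described above, here is a minimal sketch of the handler rewritten as a coroutine that flushes once per batch of lines. The names StreamingCSVHandler, csv_chunks, LINES_PER_FLUSH and main are hypothetical and chosen only for this example, and the batch size of 1000 is an assumption to tune, not a recommendation from Tornado.

```python
import random

import tornado.gen
import tornado.ioloop
import tornado.web

LINES_PER_FLUSH = 1000  # hypothetical batch size; tune for your workload


def csv_chunks(total_lines, lines_per_chunk=LINES_PER_FLUSH):
    """Yield the CSV body as strings of at most `lines_per_chunk` rows each."""
    rows = []
    for line in range(total_lines):
        rows.append('%d,%s\r\n' % (line, random.random()))  # mock data
        if len(rows) == lines_per_chunk:
            yield ''.join(rows)
            rows = []
    if rows:  # final partial chunk
        yield ''.join(rows)


class StreamingCSVHandler(tornado.web.RequestHandler):
    @tornado.gen.coroutine
    def get(self):
        # Validate the request and do anything likely to fail here,
        # before the first flush; afterwards no error page can be sent.
        self.set_header('Content-Type', 'text/csv')
        self.set_header('Content-Disposition', 'attachment; filename=dump.csv')
        self.write('lineNumber,measure\r\n')  # File header
        for chunk in csv_chunks(1000000):
            self.write(chunk)
            # flush() starts the network write; yielding its result lets
            # the IOLoop serve other requests until the data is handed off.
            yield self.flush()


def main():
    app = tornado.web.Application([(r"/csv", StreamingCSVHandler)])
    app.listen(8080)
    tornado.ioloop.IOLoop.current().start()
```

Calling main() should serve the file at /csv on port 8080, as in the question. Keeping the row generation in a plain generator separates the batching logic from the handler, so the flush interval can be changed in one place.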