Generate large file and send it


I have a rather large .csv file (up to 1 million lines) that I want to generate and send when a browser requests it.

The current code I have is (except that I don't actually generate the same data):

import random

import tornado.ioloop
import tornado.web


class CSVHandler(tornado.web.RequestHandler):
    def get(self):
        self.set_header('Content-Type', 'text/csv')
        self.set_header('Content-Disposition', 'attachment; filename=dump.csv')
        self.write('lineNumber,measure\r\n')  # file header
        for line in range(1000000):
            self.write(','.join([str(line), str(random.random())]) + '\r\n')  # mock data

app = tornado.web.Application([(r"/csv", CSVHandler)])
app.listen(8080)
tornado.ioloop.IOLoop.current().start()

The problems I have with the method above are:

  • The web browser doesn't directly start downloading chunks that are sent. It hangs while the webserver seems to prepare the whole content.
  • The web server is blocked while it processes this request and makes other clients hang.

There are 2 answers

Answer by Ben Darnell (best answer)

By default, all data is buffered in memory until the end of the request so that it can be replaced with an error page if an exception occurs. To send a response incrementally, your handler must be asynchronous (so it can be interleaved with both the writing of the response and other requests on the IOLoop) and use the RequestHandler.flush() method.

Note that "being asynchronous" is not the same as "using the @tornado.web.asynchronous decorator"; in this case I recommend using @tornado.gen.coroutine instead of @asynchronous. This allows you to simply use the yield operator with every flush:

class CSVHandler(tornado.web.RequestHandler):
    @tornado.gen.coroutine
    def get(self):
        self.set_header('Content-Type', 'text/csv')
        self.set_header('Content-Disposition', 'attachment; filename=dump.csv')
        self.write('lineNumber,measure\r\n')  # file header
        for line in range(1000000):
            self.write(','.join([str(line), str(random.random())]) + '\r\n')  # mock data
            yield self.flush()

self.flush() starts the process of writing the data to the network, and yield waits until that data has reached the kernel. This lets other handlers run and also helps manage memory consumption (by limiting how far ahead of the client's download speed you can get). Flushing after every line of a CSV file is a little expensive, so you may want to only flush after every 100 or 1000 lines.
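The "flush every 100 or 1000 lines" idea can be sketched independently of Tornado as a generator that groups CSV rows into chunks; the handler would then `write()` one chunk and `yield self.flush()` once per chunk. The `csv_chunks` helper and its mock data are my own illustration, not part of either answer:

```python
import csv
import io
import random

def csv_chunks(n_rows, batch_size=1000):
    """Yield CSV text in batches of batch_size rows, so a handler can
    write() one chunk and flush() once per batch instead of per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator='\r\n')
    writer.writerow(['lineNumber', 'measure'])  # file header
    for line in range(n_rows):
        writer.writerow([line, random.random()])  # mock data
        if (line + 1) % batch_size == 0:
            yield buf.getvalue()
            buf.seek(0)
            buf.truncate(0)
    if buf.getvalue():  # trailing partial batch (plus header for tiny files)
        yield buf.getvalue()
```

Inside the coroutine the loop body then becomes `self.write(chunk)` followed by `yield self.flush()` for each chunk, which amortizes the flush cost over the whole batch.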

Note that if there is an exception once the download has started, there is no way to show an error page to the client; you can only cut the download off partway through. Try to validate the request and do everything that is likely to fail before the first call to flush().
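One way to follow that advice is to run every step that can plausibly fail (parsing parameters, executing the query) before sending the first byte of the body. The sketch below uses SQLite and a generic `send` callback purely for illustration; in a real handler, `send` would be `self.write` with periodic flushes:

```python
import sqlite3

def prepare_then_stream(conn, query, send):
    """Hypothetical sketch: do everything likely to fail *before* the
    first byte of the response, so an error can still become a clean
    error page instead of a truncated download."""
    cursor = conn.execute(query)  # may raise here: bad SQL, missing table
    send('lineNumber,measure\r\n')  # header: first byte of the body
    for line_number, measure in cursor:  # from here on, an error can
        send('%d,%s\r\n' % (line_number, measure))  # only truncate output
```

If `conn.execute` raises, nothing has been sent yet, so the caller can still return a normal error response.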

Answer by CodingPenguins
  • For your first problem, you need to flush() the given chunks to the output buffer.

    From the documentation:

    RequestHandler.write(chunk)

    Writes the given chunk to the output buffer.

    To write the output to the network, use the flush() method below.

  • Concerning your application hang, you're serving the request from the main thread, so everything will wait for your operation to finish. You should instead be using Tornado's iostream for this operation. From the tornado.iostream documentation:

    tornado.iostream — Convenient wrappers for non-blocking sockets

    Utility classes to write to and read from non-blocking files and sockets.