How to know when a PSGI nonblocking streaming writer is ready for more data in a PSGI compatible way?


I'm writing a PSGI middleware and currently running on the Twiggy server. The middleware deals with large (>2GB) dynamically created files and utilises the asynchronous streaming ability of Twiggy/AnyEvent.

The PSGI Specification says very briefly in regards to streaming responses:

... the responder MUST return yet another object which implements write and close methods. ...

Digging through the Twiggy code, it uses AnyEvent::Handle::push_write to implement the above write method. This will eat all your RAM, though, if you keep feeding it large amounts of data faster than it can be written out to the network.

Of course AnyEvent::Handle has methods and callbacks to deal with buffer size (i.e. an on_drain event handler to signal when the write buffer is empty, and wbuf_max to limit the write buffer size).
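For illustration, this is roughly how those AnyEvent::Handle options look in isolation. It is a server-specific fragment, not runnable as-is: `$socket` and `$next_chunk` are assumed placeholders for a connected socket and a chunk generator.

```perl
use AnyEvent;
use AnyEvent::Handle;

my ($socket, $next_chunk);    # assumed: connected socket, chunk-generating sub

# AnyEvent::Handle's own flow-control knobs -- not portable PSGI.
my $handle = AnyEvent::Handle->new(
    fh       => $socket,
    wbuf_max => 1024 * 1024,       # error out if the write buffer exceeds 1MB
    on_drain => sub {
        my ($h) = @_;
        # Write buffer is empty: safe to queue the next chunk.
        $h->push_write($next_chunk->());
    },
    on_error => sub {
        my ($h, $fatal, $msg) = @_;
        $h->destroy;
    },
);
```

This is exactly the portability problem described above: the code works only where the PSGI writer is backed by AnyEvent::Handle.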

However, using these features would be very server-specific and would limit the portability of a PSGI application. The PSGI spec doesn't seem to cover an API for controlling/monitoring asynchronous write streams, or for accessing the underlying filehandle/descriptor for manual checking.

How do others address memory usage/buffering, or know when an asynchronous write is complete, in a way that is 'compatible' across PSGI web servers? Any pointers would be great.

1 Answer

drclaw:

As a follow-up, I thought I would post a simplified version of how I got around my issue, in case it helps someone else.

Using the {handle} element of the writer object (which holds the underlying AnyEvent::Handle), I manually set the on_drain and on_error callbacks.

The on_drain callback is called when the write buffer is empty, so in that handler I tell my data generation code to continue producing data.

When the data generation callback is called, the data is written to the response and the data generation is disabled/paused again.

The cycle continues when the on_drain handler once again enables the data generation.

This keeps the memory usage of the writer in check; it now uses minimal memory to process the large streaming responses. I still seem to have some slow memory leaks, but those are probably buried deep in my code elsewhere.

sub call {
    my ($self, $env) = @_;

    #URL/path matching here

    my $myAsyncObject;    #Complicated async object setup

    return sub {
        #Boilerplate for streaming response
        my $responder  = shift;
        my $resCode    = 200;
        my $resHeaders = [...];
        my $writer     = $responder->([$resCode, $resHeaders]);

        #Setup callback and start data generation
        $myAsyncObject->setCallback(sub {
            my $myData = shift;

            $writer->write($myData);    #Write the data

            $myAsyncObject->pause;      #Tell generation code to pause
        });

        $writer->{handle}->on_drain(    #Setup on_drain handler
            sub {
                $myAsyncObject->continue;    #Tell generation code to continue
            }
        );

        #Error handlers here...
    };
}
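The pause/resume cycle above can be distilled into a small, server-independent toy. All names here are hypothetical stand-ins (the toy's `resume` plays the role of the answer's `continue`), and the "drain" events are simulated by a plain loop rather than a real event loop:

```perl
use strict;
use warnings;

# Toy model of the pause/resume cycle: the generator emits one chunk,
# pauses itself, and only produces more when the simulated buffer drains.
package ToyGenerator;
sub new { bless { paused => 0, next => 0, cb => undef }, shift }
sub set_callback { my ($self, $cb) = @_; $self->{cb} = $cb }
sub pause  { $_[0]{paused} = 1 }
sub resume {                        # like the answer's ->continue
    my $self = shift;
    $self->{paused} = 0;
    return if $self->{next} >= 3;   # only three chunks in this toy
    $self->{cb}->("chunk" . $self->{next}++);
}

package main;
my @written;                        # stand-in for the network socket
my $gen = ToyGenerator->new;

$gen->set_callback(sub {
    my $data = shift;
    push @written, $data;           # stand-in for $writer->write($data)
    $gen->pause;                    # stop generating until buffer drains
});

# Simulated event loop: each iteration pretends the write buffer just
# drained (i.e. on_drain fired), so generation resumes for one chunk.
$gen->resume for 1 .. 3;

print join(",", @written), "\n";    # prints chunk0,chunk1,chunk2
```

The key property is that at most one chunk is ever "in flight", which is why memory stays bounded regardless of the response size.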