Timeout Notification for Asynchronous Request

I am sending SPARQL queries as asynchronous requests to a SPARQL endpoint (currently DBpedia) using the dotNetRDF library. While simpler queries usually work, more complex queries sometimes result in timeouts.

I am looking for a way to handle the timeouts by capturing some event when they occur.

I am sending my queries by using one of the asynchronous QueryWithResultSet overloads of the SparqlRemoteEndpoint class.

As described for SparqlResultsCallback, the state object will be replaced with an AsyncError instance if the asynchronous request failed. This does indicate that there was a timeout, but it seems to do so only 10 minutes after the request was sent. When my timeout is, for example, 30 seconds, I would like to know 30 seconds later whether the request was successful. (35 seconds is OK, too, but you get the idea.)

Here is a sample application that sends two requests, the first of which is very simple and likely to succeed within the timeout (here set to 120 seconds), while the second one is rather complex and may easily fail on DBpedia:

using System;
using System.Collections.Concurrent;

using VDS.RDF;
using VDS.RDF.Query;

public class TestTimeout
{
    private static string FormatResults(SparqlResultSet results, object state)
    {
        var result = new System.Text.StringBuilder();

        result.AppendLine(DateTime.Now.ToLongTimeString());

        var asyncError = state as AsyncError;
        if (asyncError != null) {
            result.AppendLine(asyncError.State.ToString());
            result.AppendLine(asyncError.Error.ToString());
        } else {
            result.AppendLine(state.ToString());
        }

        if (results == null) {
            result.AppendLine("results == null");
        } else {
            result.AppendLine("results.Count == " + results.Count.ToString());
        }

        return result.ToString();
    }

    public static void Main(string[] args)
    {
        Console.WriteLine("Launched ...");
        Console.WriteLine(DateTime.Now.ToLongTimeString());

        var output = new BlockingCollection<string>();

        var ep = new SparqlRemoteEndpoint(new Uri("http://dbpedia.org/sparql"));
        ep.Timeout = 120;

        Console.WriteLine("Server == " + ep.Uri.AbsoluteUri);
        Console.WriteLine("HTTP Method == " + ep.HttpMode);
        Console.WriteLine("Timeout == " + ep.Timeout.ToString());

        string query = "SELECT DISTINCT ?a\n"
            + "WHERE {\n"
            + "  ?a <http://www.w3.org/2000/01/rdf-schema#label> ?b.\n"
            + "}\n"
            + "LIMIT 10\n";

        ep.QueryWithResultSet(query,
            (results, state) => {
                output.Add(FormatResults(results, state));
            },
            "Query 1");

        query = "SELECT DISTINCT ?v5 ?v8\n"
            + "WHERE {\n"
            + "  {\n"
            + "    SELECT DISTINCT ?v5\n"
            + "    WHERE {\n"
            + "      ?v6 ?v5 ?v7.\n"
            + "      FILTER(regex(str(?v5), \"[/#]c[^/#]*$\", \"i\")).\n"
            + "    }\n"
            + "    OFFSET 0\n"
            + "    LIMIT 20\n"
            + "  }.\n"
            + "  OPTIONAL {\n"
            + "    ?v5 <http://www.w3.org/2000/01/rdf-schema#label> ?v8.\n"
            + "    FILTER(lang(?v8) = \"en\").\n"
            + "  }.\n"
            + "}\n"
            + "ORDER BY str(?v5)\n";

        ep.QueryWithResultSet(query,
            (results, state) => {
                output.Add(FormatResults(results, state));
            },
            "Query 2");

        Console.WriteLine("Queries sent.");
        Console.WriteLine(DateTime.Now.ToLongTimeString());
        Console.WriteLine();

        string result = output.Take();
        Console.WriteLine(result);

        result = output.Take();
        Console.WriteLine(result);

        Console.ReadLine();
    }
}

When I run this, I reproducibly get an output like the following:

13:13:23
Server == http://dbpedia.org/sparql
HTTP Method == GET
Timeout == 120
Queries sent.
13:13:25

13:13:25
Query 1
results.Count == 10

13:23:25
Query 2
VDS.RDF.Query.RdfQueryException: A HTTP error occurred while making an asynchronous query, see inner exception for details ---> System.Net.WebException: The remote server returned an error: (504) Gateway Timeout.
   at System.Net.HttpWebRequest.EndGetResponse(IAsyncResult asyncResult)
   at VDS.RDF.Query.SparqlRemoteEndpoint.<>c__DisplayClass13.<QueryWithResultSet>b__11(IAsyncResult innerResult)
   --- End of inner exception stack trace ---
results == null

Obviously, the exact times will be different, but the crucial point is that the error message based on the second query is received approximately 10 minutes after the request was sent, nowhere near the 2 minutes set for the timeout.

Am I using dotNetRDF incorrectly here, or is it intentional that I have to run an additional timer to measure the timeout myself and react on my own if no response has been received in the meantime?

Answer by RobV (best answer):

No, you are not using dotNetRDF incorrectly. Rather, there appears to be a bug: the timeouts set on an endpoint are not honoured when running queries asynchronously. This has been filed as CORE-393.

By the way, even with this bug fixed you won't necessarily get a hard timeout at the value you set. The value you set for the Timeout property of the SparqlRemoteEndpoint instance is used to set the Timeout property of the underlying .NET HttpWebRequest. The documentation for HttpWebRequest.Timeout states the following:

Gets or sets the time-out value in milliseconds for the GetResponse and GetRequestStream methods.

So you could wait up to the timeout to make the connection to POST the query, and then up to the timeout again to start receiving a response. Once you start receiving a response, the timeout becomes irrelevant and is not respected by the code that processes the response.

Therefore, if you want a hard timeout you are better off implementing it yourself. Longer term this may be something we can add to dotNetRDF, but it is more complex to implement than simply fixing the bug about the timeout not being honoured for the HTTP request.
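One way to implement such a hard timeout yourself is to race the callback against a timer. The following is only a minimal sketch, not part of dotNetRDF: the class HardTimeout and the method WithHardTimeout are hypothetical names, and the "queries" in Main are simulated with Task.Delay so that the example runs without a network connection. It assumes a .NET version that has Task.Delay and async/await (4.5 or later).

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class HardTimeout
{
    // Hypothetical helper: starts a callback-based operation and completes
    // with the callback's value, or throws TimeoutException if the callback
    // has not fired within the given time.
    public static async Task<T> WithHardTimeout<T>(
        Action<Action<T>> start, TimeSpan timeout)
    {
        var tcs = new TaskCompletionSource<T>();

        // TrySetResult (rather than SetResult) means a callback that
        // arrives after the timeout is simply ignored.
        start(result => tcs.TrySetResult(result));

        using (var cts = new CancellationTokenSource())
        {
            Task timer = Task.Delay(timeout, cts.Token);
            Task first = await Task.WhenAny(tcs.Task, timer);
            if (first == timer)
                throw new TimeoutException("No callback within " + timeout);

            cts.Cancel(); // the callback won, so stop the timer
            return await tcs.Task;
        }
    }

    public static void Main()
    {
        // Stand-in for a fast query: the callback fires after 50 ms.
        // With dotNetRDF, the lambda body would instead be something like
        //   ep.QueryWithResultSet(query,
        //       (results, state) => done(Tuple.Create(results, state)),
        //       "Query 1");
        string fast = WithHardTimeout<string>(
            done => Task.Delay(50).ContinueWith(_ => done("Query 1")),
            TimeSpan.FromSeconds(2)).Result;
        Console.WriteLine(fast + " finished in time");

        // Stand-in for a slow query: the callback would fire after 5 s,
        // but we only wait 200 ms.
        try
        {
            WithHardTimeout<string>(
                done => Task.Delay(5000).ContinueWith(_ => done("Query 2")),
                TimeSpan.FromMilliseconds(200)).Wait();
        }
        catch (AggregateException ex)
        {
            if (!(ex.InnerException is TimeoutException)) throw;
            Console.WriteLine("Query 2 timed out");
        }
    }
}
```

Note that this only stops waiting. The underlying HTTP request is not aborted and may still complete (or fail) later, at which point its callback is ignored.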