Hangfire .NET Core ExpirationManager Error Bogging Down Application and Connection Pool


This started out of the blue about two days ago, for no apparent reason, and it has completely bogged down my PG instance.
I keep getting the recurring Hangfire error message below, which essentially says that it cannot connect to the PG instance because there are no more available connections.

Things I have tried so far:

  • I added this global filter to my startup, but it seems to be ignored, as you can see from the attempt number in the error message below: GlobalJobFilters.Filters.Add(new AutomaticRetryAttribute { Attempts = 0 });
  • I have attempted to restart the PG instance, but all the queries from Hangfire come right back.
  • I have attempted to restart the pods hosting the Hangfire environment.
  • I have cleared the failed jobs through the Hangfire API.
  • I reset the PG DB by truncating all of its tables and resetting the LastUpdatedId.
  • If I stop the applications and restart the DB instance, the constant calls seem to go away, which leads me to believe that Hangfire is the problem and not the database, but I am not 100% sure.
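
For context, here is a minimal sketch of how a setup like the one described above is commonly wired together. It is not the actual configuration from this application: the Startup layout, the connection string values, and the pool size of 20 are all hypothetical, and it assumes the Hangfire.AspNetCore and Hangfire.PostgreSql packages with the string-based UsePostgreSqlStorage overload that was current in 2020.

```csharp
using Hangfire;
using Hangfire.PostgreSql;
using Microsoft.Extensions.DependencyInjection;

public class Startup
{
    // Hypothetical connection string. "Maximum Pool Size" is an Npgsql
    // connection string keyword that caps how many pooled connections
    // this process can hold open against the PG instance.
    private const string HangfireConnection =
        "Host=localhost;Database=hangfire;Username=hangfire;Password=secret;" +
        "Maximum Pool Size=20";

    public void ConfigureServices(IServiceCollection services)
    {
        // Register Hangfire with PostgreSQL storage and start a server
        // inside this application.
        services.AddHangfire(config => config.UsePostgreSqlStorage(HangfireConnection));
        services.AddHangfireServer();

        // The global filter from the first bullet. AutomaticRetryAttribute is
        // a job filter, so it governs retries of background jobs. The retries
        // in the error messages are reported by
        // Hangfire.Server.AutomaticRetryProcess, which wraps server processes
        // such as the expiration manager, so this setting may simply not apply
        // to them (an assumption based on the stack trace, not verified here).
        GlobalJobFilters.Filters.Add(new AutomaticRetryAttribute { Attempts = 0 });
    }
}
```

Because every pod running a Hangfire server opens its own pool, the total number of connections against the database is roughly the per-process pool size multiplied by the number of pods, which is worth comparing with the max_connections setting of the PG instance.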

Here is the constant recurring error message:

Error occurred during execution of 'SQL Records Expiration Manager' process. Execution will be retried (attempt #150) in 00:05:00 seconds.
Npgsql.NpgsqlException: Exception while reading from stream ---> System.IO.IOException: Unable to read data from the transport connection: Connection timed out. ---> System.Net.Sockets.SocketException: Connection timed out
   at System.Net.Sockets.Socket.Receive(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags)
   at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
   --- End of inner exception stack trace ---
   at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
   at Npgsql.ReadBuffer.<Ensure>d__27.MoveNext()
   --- End of inner exception stack trace ---
   at Npgsql.ReadBuffer.<Ensure>d__27.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Npgsql.NpgsqlConnector.<DoReadMessage>d__147.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
   at System.Runtime.CompilerServices.ValueTaskAwaiter`1.GetResult()
   at Npgsql.NpgsqlConnector.<ReadMessage>d__146.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
   at System.Runtime.CompilerServices.ValueTaskAwaiter`1.GetResult()
   at Npgsql.NpgsqlConnector.<ReadExpecting>d__153`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
   at System.Runtime.CompilerServices.ValueTaskAwaiter`1.GetResult()
   at Npgsql.NpgsqlDataReader.<NextResult>d__32.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Npgsql.NpgsqlDataReader.NextResult()
   at Npgsql.NpgsqlCommand.<Execute>d__71.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
   at System.Runtime.CompilerServices.ValueTaskAwaiter`1.GetResult()
   at Npgsql.NpgsqlCommand.<ExecuteNonQuery>d__84.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Npgsql.NpgsqlCommand.ExecuteNonQuery()
   at Dapper.SqlMapper.ExecuteCommand(IDbConnection cnn, CommandDefinition& command, Action`2 paramReader)
   at Dapper.SqlMapper.ExecuteImpl(IDbConnection cnn, CommandDefinition& command)
   at Dapper.SqlMapper.Execute(IDbConnection cnn, String sql, Object param, IDbTransaction transaction, Nullable`1 commandTimeout, Nullable`1 commandType)
   at Hangfire.PostgreSql.ExpirationManager.Execute(CancellationToken cancellationToken)
   at Hangfire.PostgreSql.ExpirationManager.Execute(BackgroundProcessContext context)
   at Hangfire.Server.AutomaticRetryProcess.Execute(BackgroundProcessContext context)

Second Error Message (New as of 7/14/2020):

Error occurred during execution of 'RecurringJobScheduler' process. Execution will be retried (attempt #10) in 00:01:38 seconds.
System.TimeoutException: SetRangeInHash experienced timeout while trying to execute transaction
   at Hangfire.PostgreSql.PostgreSqlConnection.SetRangeInHash(String key, IEnumerable`1 keyValuePairs)
   at Hangfire.Server.RecurringJobScheduler.TryScheduleJob(JobStorage storage, IStorageConnection connection, String recurringJobId, IReadOnlyDictionary`2 recurringJob)
   at Hangfire.Server.RecurringJobScheduler.Execute(BackgroundProcessContext context)
   at Hangfire.Server.AutomaticRetryProcess.Execute(BackgroundProcessContext context)
