KEDA ScaledJob Caching / Not Starting New Jobs


KEDA has been rock solid for us, but we've run into some very strange issues with scaling jobs after an initial scale-up.

We deploy multiple versions of our ScaledJobs, each listening to its own Redis queue. All of them are configured in the same namespace, each with a unique version-specific name.

Configuration looks like this:

  Max Replica Count:  8
  Min Replica Count:  0
  Polling Interval:   15
  Rollout:
  Scaling Strategy:
  Successful Jobs History Limit:  0
  Triggers:
    Metadata:
      Enable TLS:         true
      Host:               [IP Address]
      List Length:        1
      List Name:          [List Name]
      Password From Env:  CELERY_PASS
      Port:               6378
    Type:                 redis

When we submit items to the queue, some of them trigger scaling just fine, but subsequent submissions sometimes do not. The most suspicious part is that the operator logs show the count of running jobs but report 0 pending, even though the Redis list clearly has items in it.
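One way to confirm what the scaler should be seeing is to read the list length using the same connection settings as the trigger. This is a minimal sketch, assuming the redis-py package is installed; `make_client` and `pending_items` are hypothetical helper names, and the host and list name are the same placeholders as in the configuration above.

```python
import os


def make_client(host: str, port: int = 6378):
    """Build a redis-py client that mirrors the trigger metadata:
    TLS enabled, password read from the CELERY_PASS environment variable."""
    import redis  # third-party package (assumption: pip install redis)

    return redis.Redis(
        host=host,
        port=port,
        password=os.environ.get("CELERY_PASS"),
        ssl=True,
    )


def pending_items(client, list_name: str) -> int:
    """Return the current length of the Redis list (LLEN)."""
    return int(client.llen(list_name))
```

Calling `pending_items(make_client("[IP Address]"), "[List Name]")` at the moment scaling stalls gives a number to compare against the `List Length: 1` target.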

2023-10-27T18:24:10Z    INFO    scaleexecutor   Scaling Jobs    {"scaledJob.Name": "[Scaled Job Name]", "scaledJob.Namespace": "[Namespace]", "Number of running Jobs": 2}
2023-10-27T18:24:10Z    INFO    scaleexecutor   Scaling Jobs    {"scaledJob.Name": "[Scaled Job Name]", "scaledJob.Namespace": "[Namespace]", "Number of pending Jobs ": 0}
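For context, the counts in these log lines appear to refer to Kubernetes Jobs rather than queue items, and the KEDA documentation describes the number of new Jobs as a function of queue length and running Jobs. The sketch below is a simplified restatement of the documented `default` and `accurate` scaling strategies, not KEDA's actual code; the function names are hypothetical and the clamp to zero is an assumption.

```python
import math


def max_scale(queue_length: int, target_per_job: int, max_replica_count: int) -> int:
    """Jobs the queue asks for, capped at maxReplicaCount."""
    return min(max_replica_count, math.ceil(queue_length / target_per_job))


def jobs_to_create_default(queue_length: int, running_jobs: int,
                           target_per_job: int = 1, max_replica_count: int = 8) -> int:
    """'default' strategy as documented: maxScale - runningJobCount.
    Clamping at zero is an assumption of this sketch."""
    ms = max_scale(queue_length, target_per_job, max_replica_count)
    return max(0, ms - running_jobs)


def jobs_to_create_accurate(queue_length: int, running_jobs: int, pending_jobs: int,
                            target_per_job: int = 1, max_replica_count: int = 8) -> int:
    """'accurate' strategy as documented: running Jobs only limit the result
    when the total would exceed maxReplicaCount."""
    ms = max_scale(queue_length, target_per_job, max_replica_count)
    if ms + running_jobs > max_replica_count:
        return max(0, max_replica_count - running_jobs)
    return max(0, ms - pending_jobs)
```

Under these assumptions, 2 running Jobs and 2 items left in the list give `jobs_to_create_default(2, 2) == 0` but `jobs_to_create_accurate(2, 2, 0) == 2`. If the workers remove items from the list as soon as they start, the empty `Scaling Strategy:` field in the configuration above (which would mean the default strategy) could account for new items not triggering new Jobs while earlier ones are still running.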

Is there some undocumented throttling / caching / timeout that might be causing this?
