Diagnosing Redis DB failure (OOM-killed)


How can I debug an OOM kill? I have a bun-redis database with hundreds of WebSocket connections storing real-time data directly in the DB, with a data eviction policy of 10 days.

A process of this unit has been killed by the OOM killer.
Killing process 77506 (HeapHelper) with signal SIGKILL.
Failed with result 'oom-kill'.
Consumed 1d 22h 10min 6.378s CPU time.

~ ❯ free -h
               total        used        free      shared  buff/cache   available
Mem:            27Gi       7.4Gi        18Gi        11Mi       1.1Gi        19Gi
127.0.0.1:6379> MEMORY STATS
 1) "peak.allocated"
 2) (integer) 4547468208
 3) "total.allocated"
 4) (integer) 4547247816
 5) "startup.allocated"
 6) (integer) 1069176
 7) "replication.backlog"
 8) (integer) 0
 9) "clients.slaves"
10) (integer) 0
11) "clients.normal"
12) (integer) 1928
13) "cluster.links"
14) (integer) 0
15) "aof.buffer"
16) (integer) 0
17) "lua.caches"
18) (integer) 0
19) "functions.caches"
20) (integer) 184
21) "db.0"
22) 1) "overhead.hashtable.main"
    2) (integer) 241912
    3) "overhead.hashtable.expires"
    4) (integer) 88
23) "overhead.total"
24) (integer) 1313288
25) "keys.count"
26) (integer) 4408
27) "keys.bytes-per-key"
28) (integer) 1031347
29) "dataset.bytes"
30) (integer) 4545934528
31) "dataset.percentage"
32) "99.99462890625"
33) "peak.percentage"
34) "99.99514770507813"
35) "allocator.allocated"
36) (integer) 4547537240
37) "allocator.active"
38) (integer) 4547805184
39) "allocator.resident"
40) (integer) 4695977984
41) "allocator-fragmentation.ratio"
42) "1.000058889389038"
43) "allocator-fragmentation.bytes"
44) (integer) 267944
45) "allocator-rss.ratio"
46) "1.0325812101364136"
47) "allocator-rss.bytes"
48) (integer) 148172800
49) "rss-overhead.ratio"
50) "1.0022660493850708"
51) "rss-overhead.bytes"
52) (integer) 10641408
53) "fragmentation"
54) "1.0350526571273804"
55) "fragmentation.bytes"
56) (integer) 159392128

It seems that my peak memory allocation capped at about 4.5 GB, and I have plenty of memory in reserve. After the process was OOM-killed, I started the Redis DB again, and this is what I see:

5:09:46.782 * <search> Loading event starts
5:09:46.782 * Loading RDB produced by version 255.255.255
5:09:46.782 * RDB age 53411 seconds
5:09:46.782 * RDB memory usage when created 23946.19 Mb
5:09:53.947 * Done loading RDB, keys loaded: 4408, keys expired: 0.
5:09:53.947 # <search> Skip background reindex scan, redis version contains loaded event.
5:09:53.947 * <search> Loading event ends
5:09:53.947 * DB loaded from disk: 7.166 seconds

How can I start to diagnose this?
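To compare the figures above on one scale, here is a minimal sketch that converts them to GiB. The constants are copied from the output in this post. Two assumptions: the "Mb" in the RDB log line means MiB, and the "Gi" in `free -h` means GiB. The function name `summarize` is only illustrative. If the `MEMORY STATS` output was captured after the restart, `peak.allocated` would describe only the new process, so the RDB figure may be the better hint at what memory usage looked like before the kill.

```python
# Figures copied verbatim from the output in this post.
PEAK_ALLOCATED_BYTES = 4_547_468_208  # MEMORY STATS "peak.allocated"
RDB_USED_MEM_MB = 23946.19            # log: "RDB memory usage when created"
TOTAL_RAM_GIB = 27                    # `free -h`, "total" column

MIB = 1024 ** 2
GIB = 1024 ** 3


def summarize():
    """Put the three figures on a common GiB scale.

    Assumption: the log's "Mb" is MiB and free's "Gi" is GiB.
    """
    peak_gib = PEAK_ALLOCATED_BYTES / GIB
    rdb_gib = RDB_USED_MEM_MB * MIB / GIB
    return {
        "peak_gib": peak_gib,
        "rdb_gib": rdb_gib,
        # Share of total RAM in use by Redis when the RDB snapshot was written.
        "rdb_share_of_ram": rdb_gib / TOTAL_RAM_GIB,
        # How much larger the snapshot-time usage was than the reported peak.
        "rdb_to_peak_ratio": rdb_gib / peak_gib,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
```

Under those assumptions, the memory usage recorded in the RDB file is several times larger than the peak reported by `MEMORY STATS` and takes up most of the machine's RAM, which is the discrepancy I am trying to understand.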
