How to force Neo4j garbage collection?


ADDENDUM TO ADDENDUM: I turned garbage collection logging on and now I get line after line of the same warning. First, the log shows my looped dbms.clearQueryCaches() calls in a series of 50 or so.

2022-03-21 02:20:16.045+0000 INFO [o.n.k.i.p.Procedures] Called dbms.clearQueryCaches(): Query caches successfully cleared of 30 queries.
2022-03-21 02:20:44.349+0000 INFO [o.n.k.i.p.Procedures] Called dbms.clearQueryCaches(): Query caches successfully cleared of 30 queries.
2022-03-21 02:20:51.228+0000 INFO [o.n.k.i.p.Procedures] Called dbms.clearQueryCaches(): Query caches successfully cleared of 27 queries.
2022-03-21 02:20:58.766+0000 INFO [o.n.k.i.p.Procedures] Called dbms.clearQueryCaches(): Query caches successfully cleared of 30 queries.

Then they become interspersed with garbage collection warnings:

2022-03-21 02:27:44.144+0000 INFO [o.n.k.i.p.Procedures] Called dbms.clearQueryCaches(): Query caches successfully cleared of 30 queries.
2022-03-21 02:27:46.335+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=115, gcTime=1809, gcCount=3}
2022-03-21 02:27:46.436+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=1682, gcTime=16, gcCount=1}
2022-03-21 02:27:48.321+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=1572, gcTime=13, gcCount=2}
2022-03-21 02:27:52.973+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=1594, gcTime=1668, gcCount=3}
2022-03-21 02:27:57.070+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=1694, gcTime=1733, gcCount=4}
2022-03-21 02:27:57.283+0000 INFO [o.n.k.i.p.Procedures] Called dbms.clearQueryCaches(): Query caches successfully cleared of 30 queries.
2022-03-21 02:27:59.375+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=1594, gcTime=14, gcCount=1}
2022-03-21 02:28:01.506+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=1771, gcTime=16, gcCount=1}
2022-03-21 02:28:03.655+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=249, gcTime=2060, gcCount=3}
2022-03-21 02:28:05.715+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=3760, gcTime=1985, gcCount=3}
2022-03-21 02:28:09.396+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=3464, gcTime=1798, gcCount=3}
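To see how much time these pauses add up to, the warning lines can be tallied with a small script. The following is only a rough sketch in Python: the function name `summarize_pauses` and the summary keys are made up for illustration, and the regular expression simply mirrors the line format shown above. The log itself does not state the units of `pauseTime` and `gcTime` (they are presumably milliseconds), so the sketch leaves them unlabeled.

```python
import re

# Pattern mirrors the WARN lines shown above; field names are taken from the log.
PAUSE_RE = re.compile(
    r"^(?P<ts>\S+ \S+) WARN \[o\.n\.k\.i\.c\.VmPauseMonitorComponent\] "
    r"Detected VM stop-the-world pause: "
    r"\{pauseTime=(?P<pause>\d+), gcTime=(?P<gc>\d+), gcCount=(?P<count>\d+)\}"
)


def summarize_pauses(lines):
    """Tally the stop-the-world warnings found in an iterable of log lines.

    Hypothetical helper, not part of Neo4j. Units are whatever the log uses
    (presumably milliseconds, which is an assumption).
    """
    pauses = []
    gc_total = 0
    gc_count = 0
    for line in lines:
        match = PAUSE_RE.match(line.strip())
        if match is None:
            continue  # skip INFO lines such as the clearQueryCaches entries
        pauses.append(int(match.group("pause")))
        gc_total += int(match.group("gc"))
        gc_count += int(match.group("count"))
    return {
        "warnings": len(pauses),
        "total_pause": sum(pauses),
        "max_pause": max(pauses, default=0),
        "total_gc": gc_total,
        "gc_count": gc_count,
    }


# Three lines copied from the log excerpt above.
sample = [
    "2022-03-21 02:27:44.144+0000 INFO [o.n.k.i.p.Procedures] Called dbms.clearQueryCaches(): Query caches successfully cleared of 30 queries.",
    "2022-03-21 02:27:46.335+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=115, gcTime=1809, gcCount=3}",
    "2022-03-21 02:27:46.436+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=1682, gcTime=16, gcCount=1}",
]
summary = summarize_pauses(sample)
print(summary)
```

Running it over the whole debug log instead of the three sample lines would show whether the pauses grow steadily as the loop progresses, which is what a heap filling up would look like.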

It looks to me like there is no way to turn off garbage collection: when it runs is essentially determined by the heap size, since growing the heap to the maximum size set in the conf file requires a full GC cycle.

I came across a section of the Neo4j manual that seems to say calling System.gc() could force garbage collection, but it then advises: "Do not explicitly trigger "stop the world" events by calling System.gc()." I also found a compelling article explaining why that is.

So I am left with increasing the heap. My heap is at the default size right now, so I'll have to experiment with that next and see whether increasing it gets rid of the errors, though I suspect it will just increase the number of loops I get through before running into the same issue.

I found this 6-year-old post that seems to describe the same issue, so I will study it more. Michael Hunger's response is "The root cause is that the code creates hundreds of Neo4j instances each with a full config, and then doesn't clearly shuts them down. And that off-heap page-cache is only released at JVM shutdown." Seems unfortunate.

Then I looked up how to configure off-heap transaction state, but of course that is only available in 3.5 and above (I am running 3.4.12).

ADDENDUM:

I have inserted this query at the end of each iteration of the for loop:

CALL dbms.clearQueryCaches

Now I can run about 100 queries in a row before getting the error, which is much improved (it was about 20 before). I was reading the documentation, and it looks like both buffering and garbage collection can cause issues in this type of scenario. Each query on its own fits comfortably within my Java heap space, but when I run them in a for loop I hit the heap space error. Do you know what CALL dbms.clearQueryCaches actually does, and what other calls like it exist to force clearing of buffers or garbage collection? I wish there were more documentation on these for non-Java developers. (I understand I can increase the heap space, but as each of my queries is actually quite small I'd love to understand what's causing the issue in the first place.) TIA!!
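The loop described above can be sketched as follows. This is only an illustration in Python rather than the PHP actually used: `run_cypher` is a hypothetical stand-in for whatever client call sends one statement to Neo4j, and `run_scripts` is a made-up name. The only piece taken from the post is the `CALL dbms.clearQueryCaches()` statement issued after every script.

```python
from typing import Callable, Iterable, List

# The cache-clearing statement from the post, sent after each script.
CLEAR_CACHES = "CALL dbms.clearQueryCaches()"


def run_scripts(scripts: Iterable[str], run_cypher: Callable[[str], None]) -> int:
    """Run each script, then clear the query caches.

    `run_cypher` is a hypothetical callable that sends one statement to
    Neo4j; here it is injected so the loop can be exercised without a server.
    Returns the number of scripts that completed.
    """
    completed = 0
    for script in scripts:
        run_cypher(script)        # the actual work for this iteration
        run_cypher(CLEAR_CACHES)  # clear cached query plans before the next one
        completed += 1
    return completed


# Usage with a stub that just records what would have been sent.
sent: List[str] = []
done = run_scripts(["MATCH (n) RETURN count(n)", "RETURN 1"], sent.append)
print(done, sent)
```

Note that this only clears cached query plans. It does not by itself trigger a garbage collection, which may be why the loop gets further but still fails eventually.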

==================ORIGINAL POST===================

I am running Cypher scripts that are each short enough not to run into memory issues on their own. I thought, great, I'll run them in a PHP loop and get my data in there quickly, but then I get memory issues at around the 20th iteration. (Unfortunately I have to do this via a web browser because I am using pre-existing scripts that rely on session data.)

PHP Fatal error:  Uncaught Neoxygen\NeoClient\Exception\Neo4jException: Neo4j Exception with code "Neo.DatabaseError.General.UnknownError" and message "Java heap space" in /var/www/html/vendor/graphaware/neo4j-php-client/src/Extension/AbstractExtension.php:89
Stack trace:
#0 /var/www/html/vendor/graphaware/neo4j-php-client/src/Extension/AbstractExtension.php(76): Neoxygen\NeoClient\Extension\AbstractExtension->checkResponseErrors()
#1 /var/www/html/vendor/graphaware/neo4j-php-client/src/Extension/NeoClientCoreExtension.php(94): Neoxygen\NeoClient\Extension\AbstractExtension->handleHttpResponse()
#2 [internal function]: Neoxygen\NeoClient\Extension\NeoClientCoreExtension->sendCypherQuery()
#3 /var/www/html/vendor/graphaware/neo4j-php-client/src/Extension/ExtensionManager.php(49): call_user_func_array()
#4 /var/www/html/vendor/graphaware/neo4j-php-client/src/Client.php(127): Neoxygen\NeoClient\Extension\ExtensionManager->execute()
#5 myscript.php(1907): Neoxygen\NeoClient\Client->__call()
#6 /var/www/html/_/save in /var/www/html/vendor/graphaware/neo4j-php-client/src/Extension/AbstractExtension.php on line 89

At first I thought it might have to do with the PHP client, so I took the scripts and placed them as multi-statement queries straight into the neo4j shell, and I got the same issue, this time at the 44th iteration.

ServiceUnavailable: WebSocket connection failure. Due to security constraints in your web browser, the reason for the failure is not available to this Neo4j Driver. Please use your browsers development console to determine the root cause of the failure. Common reasons include the database being unavailable, using the wrong connection URL or temporary network problems. If you have enabled encryption, ensure your browser is configured to trust the certificate Neo4j is configured to use. WebSocket `readyState` is: 3

The logs read:

WARN  The client is unauthorized due to authentication failure.

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "neo4j.VmPauseMonitor-1"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "neo4j.Scheduler-1"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "qtp1629265201-511"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Silent channel reaper-1"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "qtp1629265201-100"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "CustomProcedureStorage"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "metrics-csv-reporter-1-thread-1"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "neo4j.BoltNetworkIO-5"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "neo4j.BoltNetworkIO-4"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "neo4j.BoltNetworkIO-2"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "neo4j.BoltNetworkIO-3"

ERROR [io.netty.util.concurrent.DefaultPromise.rejectedExecution] Failed to submit a listener notification task. Event loop shut down? event executor terminated
java.util.concurrent.RejectedExecutionException: event executor terminated

When this happens I have to stop and start the database, and the script that was running during the crash goes through without a hitch after that.

So it appears to be a cumulative memory issue. All this to ask: does garbage not get collected after each query in client-to-Neo4j situations and in multi-statement situations? Is there a way to force Neo4j to "clear the cache" between queries?

I am clearly in over my head and would be super grateful for any pointers. TIA!!!


There is 1 answer

RosarioB

To clear the query cache in Neo4j 3.x you can use CALL dbms.clearQueryCaches(); in newer releases the procedure is named db.clearQueryCaches().

I don't know whether this will solve your problem. If it does not, you could either change your script or increase Neo4j's heap memory.