Python Daemon Process Memory Management

I'm currently writing a Python daemon process that monitors a log file in real time and updates entries in a PostgreSQL database based on what it finds. The process only cares about a unique key that appears in the log file and the most recent value it has seen for that key.

I'm using a polling approach and process a new batch every 10 seconds. To avoid extraneous updates to the database, I reduce each batch by storing only the key and its most recent value in a dict. Depending on how much activity there has been in the last 10 seconds, this dict can vary from 10 to 1000 unique entries. The dict then gets "processed" and the results are sent to the database.
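A minimal sketch of that batching step, for illustration only. The `key=value` line format and the name `collapse_batch` are assumptions, not the actual log format or code:

```python
def collapse_batch(lines):
    """Keep only the most recent value seen for each key in one batch.

    Assumes each log line looks like "key=value" (a hypothetical format).
    """
    latest = {}
    for line in lines:
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        latest[key] = value  # a later line overwrites an earlier one
    return latest


# One 10-second batch: three lines collapse to two database updates.
batch = collapse_batch(["a=1", "b=2", "a=3"])
```

Building a fresh dict per batch like this means the previous batch's dict becomes unreferenced as soon as the name is rebound.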

My main concern revolves around memory management and the dict over time (days, weeks, etc.). Since this is a daemon process that's constantly running, memory usage grows with the size of the dict but never shrinks appropriately. I've tried resetting the dict using a standard dereference, and the dict.clear() method after processing a batch, but noticed no change in memory usage (FreeBSD/top). Forcing a gc.collect() does seem to recover some memory, but usually only around 50%.

Do you guys have any advice on how I should proceed? Is there something more I could be doing in my process? Feel free to chime in if you see a different road around the issue :)

There is 1 answer

Erik Cederstrand On

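When you clear() the dict or del the objects referenced by it, you only remove references; the contained objects are freed once nothing else references them. In CPython, objects whose reference count drops to zero are released right away, while objects caught in reference cycles stay in memory until the cyclic garbage collector runs, which is why gc.collect() recovers some memory, as you have seen. Garbage collection isn't run explicitly on a del or clear(). Note also that memory freed inside the interpreter isn't necessarily returned to the operating system, so the figure shown by top may not drop.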
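A small sketch of why gc.collect() can recover memory that del or clear() alone doesn't, assuming CPython: objects in a reference cycle are only reclaimed by the cyclic collector. The `Node` class and the function names are made up for illustration:

```python
import gc


class Node:
    """Hypothetical object that can point at another Node."""

    def __init__(self):
        self.ref = None


def make_cycle():
    # a and b reference each other, so their reference counts never
    # reach zero even after the local names go out of scope.
    a, b = Node(), Node()
    a.ref, b.ref = b, a


def count_cyclic_garbage():
    """Return how many unreachable objects gc.collect() finds after make_cycle()."""
    was_enabled = gc.isenabled()
    gc.disable()          # keep automatic collection from interfering
    try:
        gc.collect()      # start from a clean slate
        make_cycle()
        return gc.collect()
    finally:
        if was_enabled:
            gc.enable()


found = count_cyclic_garbage()
print("unreachable objects found:", found)
```

The exact count depends on the interpreter version, but it includes at least the two Node instances. If the daemon's entries don't form cycles, plain reference counting already frees them and gc.collect() has little left to do.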

I found this similar question for you: https://stackoverflow.com/questions/996437/memory-management-and-python-how-much-do-you-need-to-know. In short, if you aren't running low on memory, you really don't need to worry a lot about this. FreeBSD itself does a good job handling virtual memory, so even if you have a huge amount of stale objects in your Python program, your machine probably won't be swapping to the disk.