I am trying to run an instance of Weaviate, but I am running into an issue with memory consumption. Weaviate runs in a Docker container with 16GB of memory, which, according to the documentation, should be enough for well over 1M records (I am using 384-dim vectors, just like in the example).
The application connected to Weaviate is constantly inserting and querying data. Memory usage keeps climbing until the process runs out of memory and the Docker container dies. At that point only around 20k records have been inserted.
Is this a problem with garbage collection never happening?
UPDATE:
The Weaviate version in question is 1.10.1, with no modules enabled. Incoming records already have vectors, so no vectorizer is used. The application searches for records similar to the incoming record using metadata (where) filters and nearVector, then inserts the incoming record (a rough sketch of this loop is shown below). I will be upgrading to 1.12.1 to see if this helps at all, but in the meantime here are some of the suggested memory measurements.
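Roughly, the search-then-insert cycle looks like the following sketch. It uses the Python client (v3 API) purely for illustration; the class name `Record`, the property names, and the certainty threshold are placeholders, not the real application code or schema.

```python
import weaviate

# Connect to the Weaviate instance (URL is an assumption; adjust as needed).
client = weaviate.Client("http://localhost:8080")

def process_incoming(record: dict, vector: list) -> None:
    """Search for similar existing records, then insert the incoming one."""
    # 1. nearVector search on the incoming 384-dim vector, combined with a
    #    metadata (where) filter. "Record", the property names and the
    #    certainty threshold are illustrative placeholders.
    similar = (
        client.query
        .get("Record", ["title", "source"])
        .with_where({
            "path": ["source"],
            "operator": "Equal",
            # use valueString instead if the property has the `string` data type
            "valueText": record["source"],
        })
        .with_near_vector({"vector": vector, "certainty": 0.9})
        .with_limit(5)
        .do()
    )
    # ... application-specific handling of `similar` goes here ...

    # 2. Insert the incoming record with its precomputed vector
    #    (no vectorizer module is configured).
    client.data_object.create(
        data_object={"title": record["title"], "source": record["source"]},
        class_name="Record",
        vector=vector,
    )
```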
7k records:
docker stats memory usage: 2.56GB / 16GB
gc 1859 @750.550s 0%: 0.33+33+0.058 ms clock, 26+1.2/599/1458+4.6 ms cpu, 2105->2107->1102 MB, 2159 MB goal, 80P
gc 1860 @754.322s 0%: 0.17+34+0.094 ms clock, 13+1.0/644/1460+7.5 ms cpu, 2150->2152->1126 MB, 2205 MB goal, 80P
gc 1861 @758.598s 0%: 0.39+35+0.085 ms clock, 31+1.4/649/1439+6.8 ms cpu, 2197->2199->1151 MB, 2253 MB goal, 80P
11k records:
docker stats memory usage: 5.46GB / 16GB
gc 1899 @991.964s 0%: 1.0+65+0.055 ms clock, 87+9.9/1238/3188+4.4 ms cpu, 4936->4939->2589 MB, 5062 MB goal, 80P
gc 1900 @999.496s 0%: 0.17+58+0.067 ms clock, 13+2.8/1117/3063+5.3 ms cpu, 5049->5052->2649 MB, 5178 MB goal, 80P
gc 1901 @1008.717s 0%: 0.38+65+0.072 ms clock, 30+2.7/1242/3360+5.7 ms cpu, 5167->5170->2710 MB, 5299 MB goal, 80P
17k records:
docker stats memory usage: 11.25GB / 16GB
gc 1932 @1392.757s 0%: 0.37+110+0.019 ms clock, 30+4.6/2130/6034+1.5 ms cpu, 10426->10432->5476 MB, 10694 MB goal, 80P
gc 1933 @1409.740s 0%: 0.14+108+0.052 ms clock, 11+0/2075/5666+4.2 ms cpu, 10679->10683->5609 MB, 10952 MB goal, 80P
gc 1934 @1427.611s 0%: 0.31+116+0.10 ms clock, 25+4.6/2249/6427+8.2 ms cpu, 10937->10942->5745 MB, 11218 MB goal, 80P
20k records:
docker stats memory usage: 15.22GB / 16GB
gc 1946 @1658.985s 0%: 0.13+136+0.077 ms clock, 10+1.1/2673/7618+6.1 ms cpu, 14495->14504->7600 MB, 14866 MB goal, 80P
gc 1947 @1681.090s 0%: 0.28+148+0.045 ms clock, 23+0/2866/8142+3.6 ms cpu, 14821->14829->7785 MB, 15201 MB goal, 80P
GC forced
gc 16 @1700.012s 0%: 0.11+2.0+0.055 ms clock, 8.8+0/20/5.3+4.4 ms cpu, 3->3->3 MB, 7MB goal, 80P
gc 1948 @1703.901s 0%: 0.41+147+0.044 ms clock, 33+0/2870/8153+3.5 ms cpu, 15181->15186->7973 MB, 15570 MB goal, 80P
gc 1949 @1728.327s 0%: 0.29+156+0.048 ms clock, 23+18/3028/8519+3.9 ms cpu, 15548->15553->8168 MB, 15946 MB goal, 80P
pprof
flat flat% sum% cum cum%
7438.24MB 96.88% 96.88% 7438.74MB 96.88% github.com/semi-technologies/weaviate/adapters/repos/db/inverted.(*Searcher).docPointersInvertedNoFrequency.func1
130.83MB 1.70% 98.58% 7594.13MB 98.91% github.com/semi-technologies/weaviate/adapters/repos/db/inverted.(*Searcher).DocIDs
1MB 0.013% 98.59% 40.55MB 0.53% github.com/semi-technologies/weaviate/adapters/repos/vector/hnsw.(*hnsw).Add
0 0% 98.59% 65.83MB 0.86% github.com/go-openapi/runtime/middleware.NewOperationExecutor.func1
UPDATE 2:
Problem still exists after upgrading to 1.12.1
Since you mention it already crashes at around 20k records, it should not be running OOM at all. Even at 1M records, 16GB of memory should be plenty, so there must be another cause that we can spot.
First we need some information about your setup:
The latest release at the time of writing is v1.12.1. Please make sure you are on the latest version, to rule out that you are running into an issue that has already been fixed.

Profiling
Please update your original post with the profiling results.
To investigate memory issues we need some profiles. We can take those from the outside (what does the OS see?) and from the inside (what does the Go runtime see?). There is typically a difference between the two, because memory that has been freed up by the GC may not have been released to the OS yet.
Preparations for Profiling
Set the env var GODEBUG to the value gctrace=1. This will make Go's garbage collector verbose and log any GC activity to the console: when it runs, what the heap looked like before and after, and the next goal size.
Expose port 6060 of the Weaviate container. This will allow generating debug reports from Go's profiler from within. (A compose sketch covering both preparations follows below.)
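As an illustration only (not from the original post), both preparations could look roughly like this with Docker Compose. The service name, image tag, and port mappings are assumptions; adapt them to your own setup.

```yaml
# Hypothetical docker-compose.yml excerpt -- service name, image tag and
# port numbers are assumptions, adjust to your own setup.
version: '3.4'
services:
  weaviate:
    image: semitechnologies/weaviate:1.12.1
    ports:
      - "8080:8080"   # regular Weaviate API
      - "6060:6060"   # Go profiler endpoint used by `go tool pprof`
    environment:
      GODEBUG: 'gctrace=1'   # verbose garbage-collector logging on stdout
      # ... keep your existing Weaviate environment variables here ...
```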
Profiling cycle
1. Immediately after startup (when the APIs are ready), run docker stats to print the initial memory usage of the entire Docker setup. This tells us how much memory each container uses before any data is imported. Please add the result to your question.
2. Start importing.
3. At regular intervals (e.g. if you anticipate a crash at 20k elements, I would start at 10k elements imported and take a snapshot every 3k elements), save the output of docker stats (and the GC trace lines from the container log) into separate files, so we see the OS' perspective. The closer you can get to the moment of the crash with those profiles, the more meaning they will have. (A small helper for automating these snapshots is sketched after this list.)
4. (Optional) If the previous steps confirm that the heap was indeed used up entirely, e.g. close to 16GB of heap usage, the interesting question becomes: "What was on the heap for it to run out so early?" This can be answered with the Go pprof tool and the port 6060 we exposed earlier. For this you need to install a local Go runtime. Alternatively, you can run the command from within a Docker container that has a Go runtime if you don't want to install Go on your host machine; in that case make sure that container can reach the Weaviate container, e.g. by putting them on the same Docker network. From the Go runtime, run go tool pprof -top http://localhost:6060/debug/pprof/heap. As with step 3, the closer to the crash you run this command, the more meaning it will have. (Note: my example assumes you are running this from the host machine and that port 6060 is mapped to the Weaviate container's port 6060. If you are running it from another container inside a Docker network, adjust the hostname accordingly, e.g. http://weaviate:6060/..., etc.)
Once you have obtained all these profiles and edited them into your original post, I'm happy to edit this answer with some notes on how to interpret them.
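As a purely illustrative helper for steps 3 and 4 (not part of Weaviate or the original post), a small script can take the snapshots automatically. File names, the interval, and the assumption that port 6060 is reachable on localhost are placeholders to adjust; a saved profile can later be inspected with go tool pprof -top heap-0001.pb.gz.

```python
"""Snapshot `docker stats` and the Go heap profile at a fixed interval
while the import runs. Illustrative sketch only; adjust paths, interval,
and the pprof URL to your setup."""
import subprocess
import time
import urllib.request

PPROF_HEAP_URL = "http://localhost:6060/debug/pprof/heap"  # assumes port 6060 is mapped to the host
INTERVAL_SECONDS = 60  # tighten this as the expected crash point approaches

snapshot = 0
while True:
    snapshot += 1

    # Outside view: one-shot `docker stats` output for all containers.
    stats = subprocess.run(
        ["docker", "stats", "--no-stream"],
        capture_output=True, text=True, check=True,
    ).stdout
    with open(f"docker-stats-{snapshot:04d}.txt", "w") as f:
        f.write(stats)

    # Inside view: raw heap profile from Go's pprof endpoint, readable
    # later with `go tool pprof -top <file>`.
    with urllib.request.urlopen(PPROF_HEAP_URL) as resp:
        with open(f"heap-{snapshot:04d}.pb.gz", "wb") as out:
            out.write(resp.read())

    time.sleep(INTERVAL_SECONDS)
```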
In summary, you should provide the following artifacts:
docker stats output from right after startup, before importing
docker stats output taken at regular intervals during the import, as close to the crash as possible
a pprof heap profile, also taken as close to the crash as possible