I am using CuPy to create two very large sparse matrices. I grab one column from one of the matrices and compute its dot product with the other sparse matrix.
I need to keep these matrices in GPU memory; they take about 1.6 GB. Running my function 1,000 times takes 20 seconds, which is fine, but running it 10,000 times takes 600 seconds, which is much slower than I expected. I was hoping it would scale linearly and take 200 seconds. After about 1,000 calls the memory seems to have issues, or at least the function's performance drops off heavily. Oddly, when this happens the "3D" utilization in Task Manager kicks in: it goes from 0% utilization when the function is running well to 80-100% when it is running slowly. I'm not sure if this is relevant. To try to help with this, I run the function 1,000 times and then do this:
cp.get_default_memory_pool().free_all_blocks()
cp.get_default_pinned_memory_pool().free_all_blocks()
gc.collect()
in order to stop the memory from overflowing and essentially reset the system to the state it was in before my function was called. However, what actually happens to my memory usage is shown in the attached image (GPU memory usage). You can see the dedicated GPU memory resets to the level I expect for storing the two large sparse matrices, and so does the shared memory usage. But when I call the function another 1,000 times, the dedicated GPU memory goes back to its previous level while the shared memory keeps increasing, and this happens with every subsequent 1,000 calls. Why does this happen, and what can I do to stop it?
I have tried different numbers of calls before resetting (100 calls, or even 10, instead of 1,000), but nothing seems to help.
For more context, this is the function I am running:
def calculate_top_matches(user_id, user_filter_data, user_feature_matrix,
                          user_preference_matrix_transposed, user_id_to_index,
                          user_index_to_id, user_filters, top_n=125):
    # Restrict candidates to users passing this user's filters
    candidate_indices = apply_filters(user_id, user_filters, user_filter_data)
    candidate_feature_matrix = user_feature_matrix[candidate_indices, :]
    # Similarity = candidate features . this user's preference column
    similarity_scores = (candidate_feature_matrix
                         .dot(user_preference_matrix_transposed
                              .getcol(user_id_to_index.get(user_id)))).T.tocsr()
    del candidate_feature_matrix
    top_n = min(top_n, len(candidate_indices))
    # Partition the nonzero scores so the top_n largest land at the end
    similarity_scores_top_n = cp.argpartition(similarity_scores.data, -top_n)
    top_indices_data = similarity_scores_top_n[-top_n:]
    del similarity_scores_top_n
    # Map the indices from the data array back to the original column indices
    top_indices = similarity_scores.indices[top_indices_data].get()
    del similarity_scores
    del top_indices_data
    top_user_ids = [user_index_to_id[candidate_indices[i]] for i in top_indices]
    return top_user_ids
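To make the top-n step above easier to follow, here is a small CPU-side sketch of the same pattern using scipy.sparse, whose CSR layout CuPy's cupyx.scipy.sparse mirrors. The scores and top_n value are made up for illustration; the point is how argpartition over the CSR .data array is mapped back to column indices via .indices:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy 1-row sparse score vector (stand-in for similarity_scores);
# the values and top_n here are illustrative only.
scores = csr_matrix(np.array([[0.0, 0.9, 0.0, 0.3, 0.7, 0.0, 0.5]]))
top_n = 3

# argpartition over the stored nonzero values: after partitioning,
# the positions of the top_n largest values sit at the end.
order = np.argpartition(scores.data, -top_n)
top_data_positions = order[-top_n:]

# .indices maps positions in .data back to column indices of the row
top_columns = scores.indices[top_data_positions]
print(sorted(top_columns.tolist()))  # -> [1, 4, 6]
```

Note that this partitions only the stored nonzeros (scores.data), so the column indices returned are positions of the largest nonzero scores, exactly as in the function above.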