Slow write to HDF5 file


I have a problem of the following nature: I have an HDF5 file managed through pandas (PyTables).

# create empty files, then open them as HDF5 stores
with open('vector.h5', 'wb') as f:
    pass

vector = pd.HDFStore('vector.h5', mode='r+')

with open('compare.h5', 'wb') as f:
    pass

compare = pd.HDFStore('compare.h5', mode='r+')
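As a side note, the empty `open()`/close step should not be needed: `pd.HDFStore` with `mode='w'` creates (or truncates) the file itself. A minimal sketch (the file name `vector.h5` is taken from the question, the demo key is made up):

```python
import pandas as pd

# mode='w' creates or truncates the file in one step, so there is no need
# to pre-create an empty file with open(..., 'wb')
vector = pd.HDFStore('vector.h5', mode='w')
vector['demo'] = pd.Series([1.0, 2.0, 3.0])  # sample write
roundtrip = vector['demo']                   # read it back
vector.close()
```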

Into it, I iteratively write rows of decreasing length, from 20000 elements down to 1.

for idx_row in range(length):
    array = []
    array_title = []
    for idx_column in range(idx_row, length):
        # cosine similarity between the first 512 components of the two vectors
        array.append(compare_vector_cosine(vector[keys[idx_row]][:512],
                                           vector[keys[idx_column]][:512]))
        array_title.append(keys[idx_column])
    # one HDFStore key per row
    compare[keys[idx_row]] = pd.Series(data=array, index=array_title)
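Each assignment to an `HDFStore` key carries fixed PyTables overhead (node creation, metadata), so ~20000 separate keys add up. A sketch of one way to avoid the per-key cost: collect the ragged rows into a single long-format DataFrame and write it once. The names `keys`, `vectors`, and `compare_vector_cosine` are small stand-ins for the question's objects, not the original data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
keys = [f'img_{i}' for i in range(5)]              # toy key list
vectors = {k: rng.normal(size=512) for k in keys}  # toy embeddings

def compare_vector_cosine(a, b):
    # cosine similarity of two 1-D vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# build all (row, column, similarity) triples in memory first
rows = []
for i, row_key in enumerate(keys):
    for col_key in keys[i:]:
        rows.append((row_key, col_key,
                     compare_vector_cosine(vectors[row_key], vectors[col_key])))

table = pd.DataFrame(rows, columns=['row', 'column', 'cosine'])

# one HDFStore write instead of one per key
with pd.HDFStore('compare.h5', mode='w') as compare:
    compare['similarity'] = table
```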

For some reason, writing to the file is slow: the file is created, but it grows in size only slowly.

As far as I can tell, reading from the other vector file is not the slow part.

Why is this happening? What parameters should I add or change?

However, when writing to another file, where the vector has a fixed length, the write is fast:

for idx, img in enumerate(IMAGE_TITLE):
    # concatenate the embedding parts into one fixed-length vector
    vector[img] = pd.Series(np.concatenate([embedding_main['embedding'],
                                            embedding_second,
                                            embedding_main['pose'],
                                            np.array([embedding_main['gender']])],
                                           axis=0))

There are 0 answers