I have 4 tables with the schema (app, text_id, title, text). I'd like to compute the cosine similarity between all possible text pairs (title and text concatenated) and eventually store the results in a CSV file with the fields (app1, app2, text_id1, text1, text_id2, text2, cosine_similarity).
Since there are a lot of possible combinations, it needs to run efficiently. What is the most common approach here? I'd appreciate any pointers.
Edit: Although the provided reference might touch on my problem, I still can't figure out how to approach this. Could someone provide more details on the strategy to accomplish this task? In addition to the calculated cosine similarity, I also need the corresponding text pairs in the output.
The following is a minimal example to calculate the pairwise cosine similarities between a set of documents (assuming you have successfully retrieved the title and text from your database).
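A sketch of such an example, using only the Python standard library so it runs without extra dependencies. The sample `rows`, the whitespace tokenizer, and the helper names (`tfidf_vectors`, `cosine`, `pairwise_similarities`, `write_csv`) are illustrative assumptions, not part of your schema or of any library. It weights terms with a simple smoothed TF-IDF, which is one common choice rather than the only one.

```python
import csv
import io
import math
from collections import Counter
from itertools import combinations

# Hypothetical rows, shaped like the tables in the question:
# (app, text_id, title, text)
rows = [
    ("app_a", 1, "Photo editor", "Edit and crop photos quickly"),
    ("app_a", 2, "Photo filters", "Apply filters to photos"),
    ("app_b", 1, "Budget planner", "Track monthly spending"),
]


def tokenize(s):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return s.lower().split()


def tfidf_vectors(docs):
    # Build one sparse {term: weight} dict per document.
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    # Cosine similarity of two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_similarities(rows):
    # Concatenate title and text, then yield one record per unordered pair.
    docs = [f"{title} {text}" for _, _, title, text in rows]
    vecs = tfidf_vectors(docs)
    for i, j in combinations(range(len(rows)), 2):
        app1, id1 = rows[i][0], rows[i][1]
        app2, id2 = rows[j][0], rows[j][1]
        yield (app1, app2, id1, docs[i], id2, docs[j], cosine(vecs[i], vecs[j]))


def write_csv(rows, out):
    # Write the header from the question, then stream the pairs.
    writer = csv.writer(out)
    writer.writerow(
        ["app1", "app2", "text_id1", "text1", "text_id2", "text2", "cosine_similarity"]
    )
    writer.writerows(pairwise_similarities(rows))


# Write to an in-memory buffer here; pass an open file object for a real CSV.
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

Because `pairwise_similarities` is a generator, the pairs are streamed to the CSV writer instead of being held in memory. The number of pairs still grows as n(n-1)/2, so for large tables a vectorized implementation (for example scikit-learn's `TfidfVectorizer` together with `cosine_similarity`) is likely to be considerably faster than this pure-Python loop.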