Avoiding commutative duplicates in MongoDB


Background

I am using a MongoDB database to build a medical application that stores drug information. The database has a few collections, and one of them holds pairwise drug interactions. The data provided to me is on a per-drug basis. That is, if drug A interacts with B, C, and D, I get the pairs (A,B), (A,C), and (A,D). However, the same information appears again when I parse the input data for drugs B, C, and D, in the form (B,A), (C,A), and (D,A).

Of course, the corresponding medical information is identical (i.e., A interacting with B produces the same reactions as B interacting with A).
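Because the interaction is symmetric, each pair can be treated as unordered. Below is a minimal Java sketch of deduplicating while parsing, before anything reaches MongoDB. It assumes the parsed input is available as a map from each drug to the drugs it interacts with; the class and method names are hypothetical. The pair type stores its two names in a fixed order, so equals and hashCode ignore argument order and a set keeps each pair only once.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

public class InteractionPairs {

    // Hypothetical unordered pair: (a, b) and (b, a) compare equal and hash the same.
    static final class DrugPair {
        final String first;  // lexicographically smaller name
        final String second; // lexicographically larger name

        DrugPair(String a, String b) {
            // Store the names in a fixed order so equality does not depend on argument order.
            if (a.compareTo(b) <= 0) {
                first = a;
                second = b;
            } else {
                first = b;
                second = a;
            }
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof DrugPair)) return false;
            DrugPair p = (DrugPair) o;
            return first.equals(p.first) && second.equals(p.second);
        }

        @Override
        public int hashCode() {
            return Objects.hash(first, second);
        }

        @Override
        public String toString() {
            return "(" + first + ", " + second + ")";
        }
    }

    // Expands per-drug interaction lists into pairs, keeping each unordered pair once.
    static Set<DrugPair> uniquePairs(Map<String, List<String>> interactionsByDrug) {
        Set<DrugPair> pairs = new LinkedHashSet<>();
        for (Map.Entry<String, List<String>> e : interactionsByDrug.entrySet()) {
            for (String other : e.getValue()) {
                // Adding the mirrored pair (B, A) after (A, B) leaves the set unchanged.
                pairs.add(new DrugPair(e.getKey(), other));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        Map<String, List<String>> input = new LinkedHashMap<>();
        input.put("A", Arrays.asList("B", "C", "D"));
        input.put("B", Arrays.asList("A"));
        input.put("C", Arrays.asList("A"));
        input.put("D", Arrays.asList("A"));
        System.out.println(uniquePairs(input)); // [(A, B), (A, C), (A, D)]
    }
}
```

Each surviving pair would then be inserted once. This only helps if all input for a run is parsed by the same process; it does not protect against duplicates written by another loader or a later run.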

Collection structure

Each document in the collection has three fields: name1, name2, and description.

While creating and populating the collection for the first time, is there a way to index it so that (name1, name2) is treated as a duplicate of (name2, name1), since both have the same description? I would like to avoid inserting such duplicates into the collection.

P.S. I am using the MongoDB Java Driver 3.8 with MongoDB 4.0.3.

P.P.S. A sample document and the collection's index information are below:

{
    "_id" : ObjectId("5be9eaeedb9c7a2836cdd48c"),
    "name1" : "Lepirudin",
    "name2" : "St. John's Wort",
    "description" : "The metabolism of Lepirudin can be increased when combined with St. John's Wort."
}

I have an ascending index on name1 and name2 and a text index on description. The document above is inserted while processing Lepirudin. I would like to avoid inserting the following document for St. John's Wort:

{
    "_id" : ObjectId("5be9eaeedb9c7a2836cdd49e"),
    "name1" : "St. John's Wort",
    "name2" : "Lepirudin",
    "description" : "The metabolism of Lepirudin can be increased when combined with St. John's Wort."
}
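One possible database-side approach, sketched here as an assumption rather than a confirmed answer: normalize every pair before inserting by always writing the lexicographically smaller name into name1. With that convention both documents above get name1 = "Lepirudin" and name2 = "St. John's Wort", so a unique compound index on { name1: 1, name2: 1 } would reject the second insert. The helper below (a hypothetical name) only builds the fields as a plain Map; since org.bson.Document implements Map<String, Object>, the result could be wrapped with new Document(map) before calling insertOne.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CanonicalInteraction {

    // Hypothetical helper: builds the interaction fields with the two drug names in a
    // fixed order (String.compareTo, i.e. case-sensitive), so (A, B) and (B, A)
    // produce identical name1/name2 values. Any fixed order works if every writer uses it.
    static Map<String, Object> canonicalDoc(String drugA, String drugB, String description) {
        boolean inOrder = drugA.compareTo(drugB) <= 0;
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name1", inOrder ? drugA : drugB);
        doc.put("name2", inOrder ? drugB : drugA);
        doc.put("description", description);
        return doc;
    }

    public static void main(String[] args) {
        String desc = "The metabolism of Lepirudin can be increased when combined with St. John's Wort.";
        Map<String, Object> fromLepirudin = canonicalDoc("Lepirudin", "St. John's Wort", desc);
        Map<String, Object> fromWort = canonicalDoc("St. John's Wort", "Lepirudin", desc);
        System.out.println(fromLepirudin.equals(fromWort)); // true
        System.out.println(fromWort.get("name1"));          // Lepirudin
    }
}
```

With the 3.x Java driver, a unique index can be created with collection.createIndex(Indexes.ascending("name1", "name2"), new IndexOptions().unique(true)). The existing non-unique index on those fields would need to be replaced, and the index build fails if the collection already holds duplicate (name1, name2) values. A mirrored insert then fails with a MongoWriteException carrying a duplicate key error (code 11000), which the loader could catch and skip.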