Background
I am using a MongoDB database to build a medical application that stores drug information. The database has a few collections, and one of them holds pairwise drug interactions. The data provided to me is on a per-drug basis. That is, if A is a drug that interacts with B, C, and D, I get the pairs (A,B), (A,C), and (A,D). However, I get the same information again when parsing the input data for drugs B, C, and D, in the form of (B,A), (C,A), and (D,A).
Of course, the corresponding medical information is identical (i.e., A interacting with B produces the same reactions as B interacting with A).
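To illustrate why the per-drug input doubles every interaction, here is a small standalone sketch. It assumes a hypothetical input shape (a map from each drug to the list of drugs it interacts with) and uses plain Java collections rather than the MongoDB driver. Putting each pair in lexicographic order is one way to make (A,B) and (B,A) compare equal.

```java
import java.util.*;

public class SymmetricPairs {
    // Returns the pair in lexicographic order so that (A, B) and (B, A)
    // produce the same key.
    static List<String> canonical(String a, String b) {
        return a.compareTo(b) <= 0 ? Arrays.asList(a, b) : Arrays.asList(b, a);
    }

    // Expands hypothetical per-drug input into (drug, other) pairs,
    // one pair per listed interaction, as the source data is described.
    static List<List<String>> expand(Map<String, List<String>> perDrug) {
        List<List<String>> pairs = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : perDrug.entrySet()) {
            for (String other : e.getValue()) {
                pairs.add(Arrays.asList(e.getKey(), other));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        Map<String, List<String>> perDrug = new LinkedHashMap<>();
        perDrug.put("A", Arrays.asList("B", "C", "D"));
        perDrug.put("B", Arrays.asList("A"));
        perDrug.put("C", Arrays.asList("A"));
        perDrug.put("D", Arrays.asList("A"));

        List<List<String>> raw = expand(perDrug);
        Set<List<String>> unique = new LinkedHashSet<>();
        for (List<String> p : raw) {
            // Both orientations collapse to the same canonical list.
            unique.add(canonical(p.get(0), p.get(1)));
        }
        System.out.println(raw.size() + " raw pairs, " + unique.size() + " unique");
        // prints "6 raw pairs, 3 unique"
    }
}
```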
Collection structure
The collection is structured so that each document has three fields: name1, name2, and description.
While creating and populating the collection for the first time, is there a way to index it so that (name1, name2) is treated as a duplicate of (name2, name1), since both will have the same description? I would like to avoid inserting such duplicates into the collection.
P.S. I am using the MongoDB Java Driver 3.8 with MongoDB 4.0.3.
P.P.S. A sample document and the collection's index information are below:
{
"_id" : ObjectId("5be9eaeedb9c7a2836cdd48c"),
"name1" : "Lepirudin",
"name2" : "St. John's Wort",
"description" : "The metabolism of Lepirudin can be increased when combined with St. John's Wort."
}
I have an ascending index on name1 and name2 and a text index on description. The document above is inserted while processing Lepirudin. I would like to avoid inserting the following document while processing St. John's Wort:
{
"_id" : ObjectId("5be9eaeedb9c7a2836cdd49e"),
"name1" : "St. John's Wort",
"name2" : "Lepirudin",
"description" : "The metabolism of Lepirudin can be increased when combined with St. John's Wort."
}
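One possible approach, sketched under assumptions: store every pair with name1 and name2 in lexicographic order before inserting, and enforce uniqueness on (name1, name2). The class below is a hypothetical in-memory stand-in for such a unique index, not the MongoDB driver API; it applies the idea to the two sample documents. In MongoDB itself, the analogous step would be making the (name1, name2) index unique (for example via `new IndexOptions().unique(true)` in the Java driver), in which case the server rejects the second, normalized document with a duplicate key error instead of returning `false`.

```java
import java.util.*;

public class InteractionStore {
    // Hypothetical stand-in for a unique compound index on (name1, name2).
    private final Set<List<String>> uniqueIndex = new HashSet<>();
    private final List<Map<String, String>> documents = new ArrayList<>();

    // Builds a document whose name1/name2 are stored in lexicographic order.
    static Map<String, String> normalized(String nameA, String nameB, String description) {
        boolean inOrder = nameA.compareTo(nameB) <= 0;
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("name1", inOrder ? nameA : nameB);
        doc.put("name2", inOrder ? nameB : nameA);
        doc.put("description", description);
        return doc;
    }

    // Returns false (where MongoDB would raise a duplicate key error)
    // if the normalized pair is already present.
    boolean insert(String nameA, String nameB, String description) {
        Map<String, String> doc = normalized(nameA, nameB, description);
        if (!uniqueIndex.add(Arrays.asList(doc.get("name1"), doc.get("name2")))) {
            return false;
        }
        documents.add(doc);
        return true;
    }

    int size() {
        return documents.size();
    }

    public static void main(String[] args) {
        String desc = "The metabolism of Lepirudin can be increased when combined with St. John's Wort.";
        InteractionStore store = new InteractionStore();
        System.out.println(store.insert("Lepirudin", "St. John's Wort", desc)); // true
        System.out.println(store.insert("St. John's Wort", "Lepirudin", desc)); // false
        System.out.println(store.size()); // 1
    }
}
```

`String.compareTo` is case-sensitive, so drug names would need consistent casing for both orientations to normalize to the same pair.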