I am trying to write a small Python script that calculates the Tanimoto similarity index between a molecule of interest and a database of molecules. I am using pybel.
The database, in .smi format, has the chemical structure of each molecule (as SMILES) in the first column and its name in the second, and looks like this:
C[C@]12CC[C@H](C1(C)C)CC2=O (-)-CAMPHOR
CC1=CC[C@H](C(=C)C)C[C@@H]1O (-)-CARVEOL
CC1=CC[C@H](CC1=O)C(=C)C (-)-CARVONE
O=CC[C@@H](C)CCC=C(C)C (-)-CITRONELLAL
OCC[C@@H](C)CCC=C(C)C (-)-CITRONELLOL
C[C@@H]1CC[C@@H](C(=C)C)C[C@H]1O (-)-DIHYDROCARVEOL
C[C@@]12CC[C@@H](C1)C(C2=O)(C)C (-)-Fenchone
C[C@@H]1CC[C@H]([C@@H](C1)O)C(C)C (-)-MENTHOL
C[C@@H]1CC[C@H](C(=O)C1)C(C)C (-)-MENTHONE
C[C@@H]1CCCCCCCCCCCCC(=O)C1 (-)-MUSCONE
CC(=C)[C@H]1CCC(=CC1)C=O (-)-PERILLALDEHYDE
.
.
.
This version of the script works as I expect:
from openbabel import pybel
targetmol = next(pybel.readfile("smi", "/path/to/sample.smi"))
targetfp = targetmol.calcfp()  # fingerprint of the sample
for mol in pybel.readfile("smi", "/path/to/db.smi"):
    fp = mol.calcfp()  # fingerprint of the database molecule
    tan = fp | targetfp  # Tanimoto index via the "|" operator
    if tan >= 0.8:
        print(tan)
Output:
1.0
1.0
0.9285714285714286
0.8571428571428571
1.0
1.0
0.9285714285714286
0.8571428571428571
.
.
.
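For reference, the Tanimoto index of two fingerprints is the number of bits set in both divided by the number of bits set in either. A minimal pure-Python sketch of that ratio, assuming each fingerprint is represented as a set of set-bit positions (the function name tanimoto and the example bit sets are illustrative only and are not part of pybel):

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto index of two fingerprints given as sets of set-bit positions."""
    union = bits_a | bits_b
    if not union:
        # Assumed convention for two empty fingerprints: no similarity.
        return 0.0
    return len(bits_a & bits_b) / len(union)


# Hypothetical fingerprints: 13 shared bits out of 14 distinct bits overall.
sample_bits = set(range(13))
db_bits = set(range(14))
score = tanimoto(sample_bits, db_bits)
print(score >= 0.8)
```

Identical fingerprints give 1.0 and fingerprints with no bits in common give 0.0, which is why a cutoff such as 0.8 keeps only close matches.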
To give meaning to these numbers, I need to print the molecule name next to each Tanimoto index. I tried this:
from openbabel import pybel
targetmol = next(pybel.readfile("smi", "/path/to/sample.smi"))
targetfp = targetmol.calcfp()
for mol in pybel.readfile("smi", "/path/to/db.smi"):
    fp = mol.calcfp()
    tan = (fp | targetfp, mol.title)
    if tan>=0.8:
        print(tan, title)
As the title says, I get the following error:
Traceback (most recent call last):
  File "test3.py", line 15, in <module>
    if tan>=0.8:
TypeError: '>=' not supported between instances of 'tuple' and 'float'
My guess is that Python cannot apply the if tan>=0.8 comparison to a string, but I really do not know how to overcome this problem since, as you can guess, I am very new to programming.
Any hints on how to correct this piece of code will be appreciated. Thank you for your time.
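You just have to change the condition to:
    if tan[0] >= 0.8:
The comma inside tan = (fp | targetfp, mol.title) is the syntax for a tuple, which is basically an immutable array, so you access its elements by index, as with lists. Here tan[0] is the Tanimoto index and tan[1] is the molecule name.
Also note that print(tan, title) will raise a NameError because title is never defined; print tan[0] and tan[1] instead.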
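An alternative to indexing is to unpack the tuple into two names and compare only the score. The sketch below runs without Open Babel: the list scored stands in for the (score, name) tuples the loop would build, its scores are made-up illustration values rather than real Tanimoto results, and filter_hits is a hypothetical helper name.

```python
# Stand-in for the (score, name) tuples built in the loop.
# The scores are made-up illustration values, not real Tanimoto results.
scored = [
    (1.0, "(-)-CAMPHOR"),
    (0.42, "(-)-CARVEOL"),
    (0.85, "(-)-Fenchone"),
]


def filter_hits(pairs, threshold=0.8):
    """Return (name, score) for every pair whose score reaches the threshold."""
    hits = []
    for tan in pairs:
        score, name = tan  # unpack the tuple instead of comparing it whole
        if score >= threshold:
            hits.append((name, score))
    return hits


hits = filter_hits(scored)
for name, score in hits:
    print(name, score)
```

In the original script the same idea amounts to keeping tan = fp | targetfp as a plain number, testing tan >= 0.8, and then calling print(tan, mol.title).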