I have two columns of SMILES codes in my dataset: 'II_Chemical.structure..SMILES.format.' and 'Chemical.structure..SMILES.format.'.
I want to calculate the Tanimoto similarity between these two columns and store the results in a new column named 'tanimoto similarity score'.
I am a management student and have no background at all in chemistry.
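For context, no chemistry is needed to understand the score itself. A fingerprint such as RDKit's RDKFingerprint turns a molecule into a fixed-length string of bits, where each set bit records that some structural fragment (hashed to that position) is present. The Tanimoto similarity of two fingerprints A and B is the number of bits set in both divided by the number of bits set in at least one, |A ∩ B| / |A ∪ B|. It ranges from 0 (no shared bits) to 1 (identical fingerprints). The sketch below is plain Python without RDKit and represents each fingerprint as a set of "on" bit positions. The function name tanimoto and the example bit sets are hypothetical and chosen only for illustration.

```python
def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of 'on' bit positions."""
    union = bits_a | bits_b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = bits_a & bits_b
    return len(shared) / len(union)


# Hypothetical fingerprints, not computed from real molecules.
fp1 = {1, 4, 7, 9}
fp2 = {1, 4, 8}
print(tanimoto(fp1, fp2))  # 2 shared bits / 5 bits in total -> 0.4
```

RDKit's DataStructs.TanimotoSimilarity computes the same ratio directly on its own fingerprint objects, so this function is not needed in practice; it only shows what the number means.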
I am using Python, and I ran into some errors while coding.
The errors I encountered are as follows:
ex1) [22:06:24] SMILES Parse Error: syntax error while parsing: [La;v3].[La;v3].[#8]-#6=O.[#8]-#6=O.[#8]-#6=O
ex2) Explicit valence for atom # 10 O, 3, is greater than permitted.
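The two errors have different causes, and in both cases Chem.MolFromSmiles prints the message and returns None instead of a molecule (it does not raise an exception). ex1 is a syntax error: the string appears to use SMARTS-style query notation (";v3" inside a bracket, "#8"/"#6" atomic-number notation) that the SMILES parser does not accept, so no parser setting will read it and correct SMILES would have to come from the data source. ex2 is a sanitization (chemistry check) failure: the string parses, but atom #10, an oxygen, has a valence of 3, above the 2 that RDKit allows for a neutral oxygen; this is common for metal complexes and salts. A minimal sketch, assuming it is acceptable to fingerprint such molecules without full sanitization: the hypothetical helper parse_smiles tries a normal parse first, then falls back to parsing with sanitize=False and computing only the properties fingerprinting needs.

```python
from rdkit import Chem, RDLogger

# Silence RDKit's console messages; failures are reported by returning None.
RDLogger.DisableLog("rdApp.*")


def parse_smiles(smiles):
    """Return an RDKit Mol, or None if the string cannot be used (hypothetical helper).

    A strict parse (sanitize=True, the default) is tried first. If that fails,
    the string is re-parsed without sanitization so molecules with unusual
    valences (error ex2) can still be fingerprinted. Genuine syntax errors
    (error ex1) still return None.
    """
    if not isinstance(smiles, str) or not smiles.strip():
        return None  # empty cell or NaN in the spreadsheet
    mol = Chem.MolFromSmiles(smiles)
    if mol is not None:
        return mol  # parsed and passed all chemistry checks
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    if mol is None:
        return None  # not valid SMILES syntax at all
    mol.UpdatePropertyCache(strict=False)  # compute valences without raising
    Chem.GetSymmSSSR(mol)  # ring information used by fingerprinting
    return mol


ex1 = "[La;v3].[La;v3].[#8]-#6=O.[#8]-#6=O.[#8]-#6=O"
print(parse_smiles(ex1))  # None: syntax error
print(parse_smiles("CO(C)C") is not None)  # oxygen with valence 3, parsed leniently
```

Molecules that only parse through the fallback have not passed RDKit's chemistry checks, so their similarity scores deserve some caution; recording which rows needed the fallback is one way to keep track of them.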
I would greatly appreciate your assistance in resolving these errors and guiding me through the necessary steps in Python to achieve this. Thank you in advance.
This is the code I was running when I encountered the errors above:
import pandas as pd
from rdkit import Chem, DataStructs
from itertools import combinations

# df is the DataFrame already loaded from my dataset (loading code not shown).
smiles_list = df['II_Chemical.structure..SMILES.format.'].tolist()
similarities = []

# Compare every pair of SMILES strings within this one column.
for smiles1, smiles2 in combinations(smiles_list, 2):
    mol1 = Chem.MolFromSmiles(smiles1)  # returns None if parsing fails
    mol2 = Chem.MolFromSmiles(smiles2)
    if mol1 is not None and mol2 is not None:
        fingerprint1 = Chem.RDKFingerprint(mol1)
        fingerprint2 = Chem.RDKFingerprint(mol2)
        tanimoto_similarity = DataStructs.TanimotoSimilarity(
            fingerprint1, fingerprint2)
        similarities.append((smiles1, smiles2, tanimoto_similarity))

result_df = pd.DataFrame(similarities, columns=[
    'SMILES1', 'SMILES2', 'Tanimoto score'])
result_df.to_csv('Tanimoto_similarity_results.csv', index=False)
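The code above uses combinations, which compares every pair of SMILES strings within the single column 'II_Chemical.structure..SMILES.format.' (n rows give n(n-1)/2 pairs) and never reads the second column. The stated goal is a row-by-row comparison between the two columns. A minimal sketch of that, assuming the data is already in a DataFrame named df, the column names are exactly as given in the question, and the same default RDKFingerprint is wanted. The helper names tanimoto_from_smiles and add_tanimoto_column are hypothetical. Rows where either string fails to parse (as in ex1 and ex2) get NaN instead of being dropped, so the new column stays aligned with the original rows.

```python
import numpy as np
import pandas as pd
from rdkit import Chem, DataStructs, RDLogger

RDLogger.DisableLog("rdApp.*")  # hide parse messages; failures become NaN below


def tanimoto_from_smiles(smiles_a, smiles_b):
    """Tanimoto similarity of two SMILES strings, or NaN if either cannot be parsed."""
    if not isinstance(smiles_a, str) or not isinstance(smiles_b, str):
        return np.nan  # empty cell in the spreadsheet
    mol_a = Chem.MolFromSmiles(smiles_a)
    mol_b = Chem.MolFromSmiles(smiles_b)
    if mol_a is None or mol_b is None:
        return np.nan  # syntax error (ex1) or valence error (ex2)
    fp_a = Chem.RDKFingerprint(mol_a)
    fp_b = Chem.RDKFingerprint(mol_b)
    return DataStructs.TanimotoSimilarity(fp_a, fp_b)


def add_tanimoto_column(df, col_a, col_b, out_col="tanimoto similarity score"):
    """Compare the two SMILES columns row by row and store the scores in out_col."""
    df[out_col] = [tanimoto_from_smiles(a, b) for a, b in zip(df[col_a], df[col_b])]
    return df


if __name__ == "__main__":
    # Tiny hypothetical dataset standing in for the real file.
    df = pd.DataFrame({
        "II_Chemical.structure..SMILES.format.": [
            "CCO", "c1ccccc1", "[La;v3].[La;v3].[#8]-#6=O.[#8]-#6=O.[#8]-#6=O"],
        "Chemical.structure..SMILES.format.": ["CCO", "Cc1ccccc1", "CCO"],
    })
    add_tanimoto_column(df, "II_Chemical.structure..SMILES.format.",
                        "Chemical.structure..SMILES.format.")
    print(df)
```

After calling add_tanimoto_column on the real df, df.to_csv('Tanimoto_similarity_results.csv', index=False) saves the whole dataset with the new column. Returning NaN rather than skipping a row keeps every score on the same row as the pair it was computed from, and the NaN rows show exactly which SMILES need fixing.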