Calculating Tanimoto similarity from SMILES codes


I have two different SMILES codes in the columns II_Chemical.structure..SMILES.format. and Chemical.structure..SMILES.format. in my dataset. I want to calculate the Tanimoto similarity between these two columns and create a new column named tanimoto similarity score to store the results. I am a management student and have no background at all in chemistry. I am using Python, and I encountered some errors during the coding process.
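
For context, the Tanimoto similarity of two binary fingerprints is the number of bits set in both divided by the number of bits set in either. Below is a minimal sketch of that formula using plain Python sets, independent of RDKit. The function name `tanimoto` and the bit positions are made up for illustration and do not come from my dataset.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity of two sets of 'on' bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Both fingerprints are empty; returning 0.0 is a convention chosen here.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical fingerprints, written as the positions of their set bits
fp1 = {1, 4, 7, 9}
fp2 = {1, 4, 8}

score = tanimoto(fp1, fp2)
print(score)  # 2 shared bits / 5 distinct bits = 0.4
```

A score of 1.0 means the two fingerprints are identical, and 0.0 means they share no set bits.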

The errors I encountered are as follows.

ex1) [22:06:24] SMILES Parse Error: syntax error while parsing: [La;v3].[La;v3].[#8]-#6=O.[#8]-#6=O.[#8]-#6=O
ex2) Explicit valence for atom # 10 O, 3, is greater than permitted.

I would greatly appreciate your assistance in resolving these errors and guiding me through the necessary steps in Python to achieve this. Thank you in advance.

I used the following code before encountering the above errors.

import pandas as pd
from rdkit import Chem, DataStructs
from itertools import combinations

# df is assumed to be the dataset, already loaded as a pandas DataFrame
smiles_list = df['II_Chemical.structure..SMILES.format.'].tolist()

similarities = []

# Compare every pair of SMILES strings within this one column
for smiles1, smiles2 in combinations(smiles_list, 2):
    mol1 = Chem.MolFromSmiles(smiles1)
    mol2 = Chem.MolFromSmiles(smiles2)

    # MolFromSmiles returns None when a SMILES string cannot be parsed,
    # so only score pairs where both molecules were built successfully
    if mol1 is not None and mol2 is not None:
        fingerprint1 = Chem.RDKFingerprint(mol1)
        fingerprint2 = Chem.RDKFingerprint(mol2)
        tanimoto_similarity = DataStructs.TanimotoSimilarity(
            fingerprint1, fingerprint2)
        similarities.append((smiles1, smiles2, tanimoto_similarity))

result_df = pd.DataFrame(similarities, columns=[
                         'SMILES1', 'SMILES2', 'Tanimoto score'])
result_df.to_csv('Tanimoto_similarity_results.csv', index=False)
