Converting SMILES to chemical name or IUPAC name using RDKit or another Python module


Is there a way to convert SMILES to either chemical name or IUPAC name using RDKit or other python modules?

I couldn't find anything very helpful in other posts.

Thank you very much!


There are 3 answers

Oliver Scott (best answer)

As far as I am aware, this is not possible using RDKit, and I do not know of any Python modules with this ability. If you are OK with using a web service, you could use the NCI resolver.

Here is a naive implementation of a function to retrieve an IUPAC name from a SMILES string:

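from urllib.parse import quote

import requests


CACTUS = "https://cactus.nci.nih.gov/chemical/structure/{0}/{1}"


def smiles_to_iupac(smiles):
    rep = "iupac_name"
    # Percent-encode the SMILES so characters such as '#' do not break the URL
    url = CACTUS.format(quote(smiles), rep)
    response = requests.get(url)
    # Raise an exception if the resolver returns an HTTP error status
    response.raise_for_status()
    return response.text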


print(smiles_to_iupac('c1ccccc1'))
print(smiles_to_iupac('CC(=O)OC1=CC=CC=C1C(=O)O'))

[Out]:
BENZENE
2-acetyloxybenzoic acid

You could easily extend it to return other representations, although the function is not fast, since every call makes an HTTP request.
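As a rough sketch of that extension, the function above could take the representation as a parameter. The helper names `build_cactus_url` and `resolve`, the `timeout` parameter, and the `REPRESENTATIONS` tuple are assumptions made for illustration; the representation names listed are ones I believe the resolver accepts, so check its documentation before relying on them. This version uses the standard library's `urllib` rather than `requests`, only to avoid a third-party dependency.

```python
from urllib.parse import quote
from urllib.request import urlopen

CACTUS = "https://cactus.nci.nih.gov/chemical/structure/{0}/{1}"

# Representation names assumed to be accepted by the resolver.
REPRESENTATIONS = ("iupac_name", "stdinchikey", "formula")


def build_cactus_url(smiles, representation):
    """Return the resolver URL for one SMILES string and one representation."""
    if representation not in REPRESENTATIONS:
        raise ValueError("unsupported representation: " + representation)
    # Percent-encode characters such as '#' that would otherwise break the URL.
    return CACTUS.format(quote(smiles), representation)


def resolve(smiles, representation="iupac_name", timeout=10):
    """Fetch one representation as text (needs network access)."""
    url = build_cactus_url(smiles, representation)
    # urlopen raises urllib.error.HTTPError when the resolver returns an error status.
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

Calling `resolve('c1ccccc1', 'formula')` would then request the molecular formula instead of the name. Keeping the URL construction in its own function makes it possible to check the encoding without touching the network.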

Another solution is to use PubChem. You can use the API with the python package pubchempy. Bear in mind this may return multiple compounds.

import pubchempy


# Use the SMILES you provided
smiles = 'O=C(NCc1ccc(C(F)(F)F)cc1)[C@@H]1Cc2[nH]cnc2CN1Cc1ccc([N+](=O)[O-])cc1'
compounds = pubchempy.get_compounds(smiles, namespace='smiles')
match = compounds[0]
print(match.iupac_name)

[Out]:
(6S)-5-[(4-nitrophenyl)methyl]-N-[[4-(trifluoromethyl)phenyl]methyl]-3,4,6,7-tetrahydroimidazo[4,5-c]pyridine-6-carboxamide
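
Since the lookup may return several compounds, or entries without a name, taking `compounds[0]` blindly can fail or hide other matches. Below is a minimal sketch of a more defensive approach. The helpers `summarize_matches` and `smiles_to_names` are hypothetical names, and the stand-in objects at the bottom are made-up data used only to show the filtering, not real PubChem output.

```python
from types import SimpleNamespace


def summarize_matches(compounds):
    """Return (cid, iupac_name) pairs for every match that has a name."""
    return [(c.cid, c.iupac_name) for c in compounds
            if getattr(c, "iupac_name", None)]


def smiles_to_names(smiles):
    """Look up a SMILES string on PubChem (needs pubchempy and network access)."""
    import pubchempy  # imported here so summarize_matches works without it
    compounds = pubchempy.get_compounds(smiles, namespace='smiles')
    return summarize_matches(compounds)


# Stand-in objects with the two attributes used above (hypothetical data).
fake_matches = [
    SimpleNamespace(cid=1, iupac_name="name-a"),
    SimpleNamespace(cid=None, iupac_name=None),  # a match without a name
]
print(summarize_matches(fake_matches))  # [(1, 'name-a')]
print(summarize_matches([]))            # []
```

An empty list then signals that no named match was found, which the caller can handle explicitly instead of catching an `IndexError`.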
Time Step

I have been working on some code with RDKit to solve the same problem. Here is a solution that runs in Google Colab, with the SMILES strings of the molecules in a list:

!pip install pubchempy #required in google colab
import pubchempy

name = [
"COC1=C(C=C2C(=C1)CC(C2=O)CC3CCN(CC3)CC4=CC=CC=C4)OC",
"C1CCNC2=C(C1)C=CC(=C2)C(=O)CCC3CCN(CC3)CC4=CC=CC=C4",
"C[N+]1(CCC(CC1)CC2CC3=CC(=C(C=C3C2=O)OC)OC)CC4=CC=CC=C4",
"C[N+]1(CCC(CC1)CCC(=O)C2=CC3=C(CCCCN3)C=C2)CC4=CC=CC=C4",
"C[N+]1(CCC(CC1)C(C2CC3=CC(=C(C=C3C2=O)OC)OC)O)CC4=CC=CC=C4",
"C[N+]1(CC[C@@]23C=C[C@@H](CC2OC4=C(C=CC(=C34)C1)OC)O)C",
"CC1=C[C@H]2CC3=C(C=CC(=O)N3)[C@@]4(C1)[C@@H]2CCCN4"]

for i in name:
  compounds = pubchempy.get_compounds(i, namespace='smiles')
  match = compounds[0]
  print(match.iupac_name)

The output here:

Someone shared a Google Colab notebook for testing code. I am leaving the link for those who are not very familiar with the site; as of the date of this post it is relatively new to me, as I have only been using Google Colab for a couple of days: https://colab.research.google.com/drive/1fM88FMcTJytuMQwpDm392v2bmNQbZYNg#scrollTo=iNWdZyjEqoyO

A caveat: I am more of a chemist than a programmer. TimeStep

Rag

I recently managed this conversion using pubchempy. Here is the code to try.

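import pubchempy as pcp

# "inif.txt" holds one SMILES string per line
with open("inif.txt", "r") as smiles_file:
    i = 1  # running counter for the printed rows
    for line in smiles_file:
        smiles = line.strip()  # drop the trailing newline
        if not smiles:
            continue  # skip blank lines
        compounds = pcp.get_compounds(smiles, namespace='smiles')
        match = compounds[0]
        print(i, '$$$', 'the CID is', match.cid, '$$$',
              'The IUPAC name is', match.iupac_name, '$$$',
              'for the SMILES', smiles)
        i += 1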